Exploring Data Science Salaries Using R and the tidyverse

ggplot2 is a powerful R package for creating data visualizations that are both informative and aesthetically pleasing. In this article, we will explore the basics of ggplot2 and demonstrate its capabilities through several examples, working toward an analysis of data science salaries by location.

One of the key features of ggplot2 is its ability to create simple yet effective line plots. To create a basic line plot, you can use the following code:

ggplot(data, aes(x = variable1, y = variable2)) +
  geom_line()

This code creates a line plot with variable1 on the x-axis and variable2 on the y-axis.

Another common type of graph is the box plot. Box plots are useful for comparing distributions of data across different groups or categories. To create a box plot using ggplot2, you can use the following code:

ggplot(data, aes(x = group, y = variable)) +
  geom_boxplot()

This code draws one box of variable for each level of group.


The same pattern applies directly to real variables. For example, plotting salary against location produces one box of salaries per location:

ggplot(data, aes(x = location, y = salary)) +
  geom_boxplot()

This code draws one box of salary values for each location.

To make the groups easier to distinguish, you can color the boxes by category by mapping the fill aesthetic inside aes():

ggplot(data, aes(x = location, y = salary, fill = location)) +
  geom_boxplot()

This fills each location's box with its own color and adds a matching legend.

It is also common to filter the data before plotting. Note that dplyr's filter() is not a ggplot2 layer, so it cannot be added to a plot with the + operator; instead, filter the data first and pipe the result into ggplot():

data |>
  filter(variable2 > 10) |>
  ggplot(aes(x = variable1, y = variable2)) +
  geom_line()

This creates a line plot that includes only the rows where variable2 is greater than 10.

The same pattern works for grouped box plots. When some groups contain only a handful of observations, their boxes are not meaningful, so a useful step is to keep only the locations that occur at least 10 times, again filtering before the plot:

data |>
  group_by(location) |>
  filter(n() >= 10) |>
  ungroup() |>
  ggplot(aes(x = location, y = salary)) +
  geom_boxplot()

This draws one box per location, using only the locations that appear at least 10 times in the data.

Another common task is sorting the groups on the x-axis. The reorder() function reorders a factor by a summary statistic of another variable; note that the summary function is passed via the FUN argument rather than called directly:

ggplot(data, aes(x = reorder(location, salary, FUN = median), y = salary)) +
  geom_boxplot()

This sorts the locations by their median salaries and plots them on the x-axis in that order.

Putting these pieces together, we can drop the sparsely populated locations, sort the remaining ones by median salary, and draw the box plot in a single pipeline:

data |>
  group_by(location) |>
  filter(n() >= 10) |>
  ungroup() |>
  ggplot(aes(x = reorder(location, salary, FUN = median), y = salary)) +
  geom_boxplot()

This draws one box per location, restricted to locations with at least 10 observations and ordered from lowest to highest median salary.

Creating visualizations with ggplot2 can be a powerful way to communicate insights and trends in your data. By using a combination of simple plots, filters, and more advanced visualization options, you can create high-quality graphs that are both informative and aesthetically pleasing.

When working with real-world data, it's often useful to add additional details to your visualizations, such as titles, labels, and annotations. These elements can help to clarify the meaning of the data and make the graph more interpretable.
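As an illustrative sketch, the labs() and theme() calls below are standard ggplot2, but the specific titles and the toy data set are invented for this example:

```r
library(ggplot2)

# Toy data standing in for the salary data set (invented values)
data <- data.frame(
  location = rep(c("CA", "NY", "TX"), each = 20),
  salary   = c(rnorm(20, 150, 10), rnorm(20, 140, 10), rnorm(20, 120, 10))
)

p <- ggplot(data, aes(x = location, y = salary, fill = location)) +
  geom_boxplot() +
  labs(
    title = "Data Science Salary by State (in thousands)",  # main title
    x = NULL,                                               # drop the x-axis label
    y = "Salary"
  ) +
  theme(legend.position = "none")  # the fill legend just repeats the x-axis

p
```

Setting x = NULL and suppressing the legend removes redundant ink, since the x-axis and the fill colors encode the same variable.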

Beyond the geom layers, ggplot2 also provides statistical layers for common summary tasks. For example, the stat_summary() function can overlay summary statistics such as means or medians on a plot. (Base ggplot2 has no stat_pvalue function; for annotating plots with p-values from statistical tests, extension packages such as ggpubr provide dedicated helpers.)
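A minimal sketch of stat_summary(), overlaying each group's mean on a box plot; the toy data below is invented for illustration:

```r
library(ggplot2)

# Invented toy data for illustration
data <- data.frame(
  location = rep(c("CA", "NY"), each = 15),
  salary   = c(rnorm(15, 150, 10), rnorm(15, 130, 10))
)

p <- ggplot(data, aes(x = location, y = salary)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point",   # mark each group's mean
               shape = 18, size = 3, colour = "red")
p
```

Because a box plot shows the median, adding the mean as a separate point is a quick way to reveal skew within each group.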

When using ggplot2, it's essential to consider the aesthetics of your graph, including factors such as color, shape, size, and position. By choosing a consistent aesthetic, you can create a visually appealing graph that is easy to understand.
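For instance, a discrete fill scale and a minimal theme can be set once and reused across plots. The scale and theme names below are standard ggplot2; the data is invented:

```r
library(ggplot2)

data <- data.frame(
  location = rep(c("CA", "NY", "TX"), each = 10),
  salary   = runif(30, 100, 200)
)

p <- ggplot(data, aes(x = location, y = salary, fill = location)) +
  geom_boxplot() +
  scale_fill_viridis_d() +  # consistent, colorblind-friendly categorical palette
  theme_minimal()           # clean background with no chart junk
p
```

Applying the same scale and theme to every figure in a report keeps the visual language consistent for readers.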

Finally, when working with real-world data, it's always a good idea to test and validate your visualizations to ensure they are accurate and reliable. This can involve checking the assumptions underlying your model, verifying the results of statistical tests, and iterating on your visualization based on feedback from stakeholders.

By following these tips and techniques, you can use ggplot2 to create high-quality visualizations that effectively communicate insights and trends in your data. Whether you're working with simple plots or more advanced visualization options, ggplot2 provides a powerful toolset for creating informative and aesthetically pleasing graphs.

In conclusion, ggplot2 is a powerful tool for creating high-quality, informative, and aesthetically pleasing graphs in R. By combining simple plots, data manipulation steps, and more advanced visualization options, you can build graphics that effectively communicate the insights and trends in your data to your stakeholders.

"WEBVTTKind: captionsLanguage: enthank you so much Channon for the opportunity to speak on your Channel today so in this video I will introduce you to the functions of the Tidy verse in our programming more precisely we will first download a data set online then we will import this data set using the read r package of the Tidy verse afterwards we will manipulate this data set using deer and then last but not least we will visualize this data set using the famous GG two package in R so without too much talk let's jump right into the AR code as a very first step we have to install and load the packages of the tidyverse as you can see in lines three and four of the code I have installed these packages already so for that reason I'm just going to load them with line four of the code so after running this line of code all functions of the Tidy verse are imported in our studio and we can now use them in our session as the next step we have to download the data set that we want to use in our example and in this video I will use a data set downloaded from kegle which is about data science jobs and salaries in 2024 you will find a link to this data set in the description of this video however on this page you can download the data set as you can see here and then you have to store the downloaded file in a file location path as you can see here here so in this case I have located the file in the directory Dropbox yok data sets and the file is called data science job listings now if you want to import this data set to our studio we first have to specify the file location path and as you have seen this data set is located in this folder so for that reason I specify a data object contain this file location path as a character string in a new data object so after running six of the code you can see at the top right of our studio that a new data object called my path is created and this data object contains our file location path in the next step we will use the function read 
uncore CSV which is provided by the read r package read R is one package of the Tidy work which is used for importing data into our programming and we combine this function with the stry function which is provided by the stringr package that's another package of the tidyverse and the stry function combines our path that we have specified before in line six of the code with the file name of our data set and then I'm storing the output of this in a new data object that is called my data so so after running lines eight and nine of the code you can see that at the top right of our studio a new data set called my data is appearing we can also have a look at this data set by clicking on the my data name of our data set at the top right of our studio so after clicking here you can see that a new window is opened which is showing the structure of our data so as you can see our data contains 500 rows and the column names position job title which is the name of the job company name the location of the job the salary range a date where the salary range was estimated the logo of the company and a link to the job as well as a company rating now in this analysis we are interested in only two of those columns namely location so where the chob is located as well as the salary and for that reason we perform a First Data manipulation step using the deer package of the tidyverse and from this package we apply the select function as you can see in lines 12 and 13 of the code and we apply this function to our data set we do this by using a pipe operator so the pipe operator specifies that the function that is used afterwards is applied to to the data that is specified before the pipe operator and then we are storing the output of this in a new data object which is again called my data and since this data object has the same name we are overwriting our previous data set so in other words our data set my data is updated after applying the select function and as you can see within the 
select function we specify that we want to keep only the column names location and salary so after running lines 11 to 13 of the code our data set is updated and you can see that by clicking on the data set at the top right once again then this window is opened again and you can see our updated data set which contains only the location and salary columns in the next step I also want to rename the column names of our data set because I want to work with lowercase letter as column names and for that reason I specify that I want to rename the column name location with an uppercase L to the new column name location with a lowercase L and the same for salary and to do this I used the rename function which is also provided by the deer package and as you can see the remaining syntax is similar to the syntax in lines 11 to13 the only difference is that we use a different function in which we have to specify different arguments so after running L 15 to 17 of the code our data set is updated once again so now you can see that the column names are in lowercase as a next step I want to replace certain values in our salary column by an a because if you have a closer look at this column you can see that most of the values are salary ranges however some of these values are per hour and I want to remove all these values from our data set because I want to work only with the rows where we have yearly salary estimates and not only salary estimates per hour and for that reason I specify that I want to replace all values in our column salary which contain the words per hour or the character string per hour by NA a and we do that by applying the Str detect function as you can see here so first we are detecting each value in salary which contains this character string and then we apply the if else function to specify that every time when this is detected we replace this value by an A and then we specify that we want to assign these values to the salary column so in other words we are 
overwriting or updating the salary column and we can do that using the mutate function which is also provided by the deer package as you can see in line 20 so after running lines 19 and 20 of the code our data is updated once again so as you can see now the values where we had hourly estimates instead of yearly estimates have been replaced by NA in the next step I also want to extract the numbers of these ranges and I want to calculate the mean value of these ranges because I want to convert the salary column to a numeric column and we can do that as you can see in lines 22 to 25 of the code so in these lines of code I first specify the mutate function so the mutate function is used once again to update our salary column as we already did in line 20 of the code however within the mutate function we specify this time that we want to use the extract all function we use this function to extract only the values in our ranges then in the next step we use the map function to convert these range values into numeric values and then we use the map double function in combination with the mean function to calculate the mean value of this range and then we replace the previous values in salary by this new output so after running these lines of code you can see that our data is updated once again and now we have converted the salary column into a numeric column which contains the mean values of our ranges now in the next step I also want to remove the rows which contain na values in the salary column and we can do that using the drop na function as you can see in NS 27 and 28 of the code once again we use a similar syntax so we replace our previous data object my data and we use the pipe operator to apply the drop na a function to our data object my data so after running these lines of code our data is updated once again and now you can see that we have removed all rows with enable values and you can also see that our data set now contains less rows because before it contained 
500 rows and now only 381 rows are remaining so we removed 119 rows from our data set so until now we have modified the salary column however I also want to modify the location column because as you can see currently the location column contains the city and the state of the city and I want to analyze our data based on the states only and for that reason I want to remove the cities from our location column and we can do that as you can see in lines 30 and 31 of the code so once again we use the mutate function however this time we replace the location column and we specify using the sub function that we want to keep only the value after the comma in this column so in this case only the states so after running these lines of code you can see that our data is updated once again and now the location column contains only the states if you have a closer look at this column you can see that we also have values which tell us that these people work remotely and you also have other rows which contain the value United St States so for some these rows the specific location was not specified in the data set and I also want to remove these rows because I want to analyze only the rows in our data set where we actually know the location remote is also fine I also want to compare remote workers with people that work in certain States but I want to remove these rows which contain United States so as a next step I use the filter function and the filter function can be used to filter certain ropes in our data set and as I have explained before I want to keep only those locations which are unequal to the character string United States so after running lenses 33 and 34 of the code our data set is updated once again and as you can see there are no United States values anymore you can also see that now our data contains only 370 rows instead of the 500 rows in the beginning by the way if you would like to learn more about our programming data manipulation deep lier and the Tidy verse you 
might take a look at the courses at statistics Globe because soon we will start a new course on data manipulation in our programming using deer and the ti you will find a link to this course in the description of this video and now let's move on with this video so at this point we can move on from the data manipulation to the data visualization part and for data visualization we use the GG blot 2 package which is a very popular package and in my opinion the best package for data visualization that exists and in The Following part of this video we will draw different plots using the GG BL 2 package so one advantage of the GG BL 2 package is that it integrates seamlessly with other tidyverse packages such as deer so as you can see we can use it in combination with the pipe operator that we already used for our data manipulation tasks before and in this case we specify that we want to draw the values in our data set my data then we use the pipe operator and then we apply certain chg blot 2 functions such as GG blot a and Chom density and we apply these functions as the deer functions before by specifying the pipe operator and the ggplot functions are added on top of each other using the plus operator so the ggplot 2 package follows a so-called layout approach and in this case I want to draw a density plot of our celer column for that reason I specify within the GG blot function that the X Val values should be equal to our salary column and in order to draw a density blot I add the geom density layer on top so after running lens 36 to 38 of the code you can see at the bottom right of our studio that the density blot is appearing which is showing the distribution of the ceries in our data set and as you can see the peak of our density is at around 100,000 US dollar however you can also see that there's a smaller Peak at the right side of the density which shows that there's a group of very highly paid people you have already seen in the data manipulation part of this 
video that the syntax of the Tidy verse is always very similar and for that reason we can also use a similar syntax to draw different types of plots using the GG plot 2 package so for instance in the next example I want to draw a box plot of our data and the only thing that I have to change is that I use the Gom boxplot function instead of the Gom density function so lines 36 and 37 of the code are exactly the same as lines 40 and 41 so after running these lines of code you can see that another graph is appearing at the bottom right which is this time showing a box plot of our entire celery color now we can modify our code to create more advanced Graphics so in the next step I want to draw a grouped box plot and I want to group these boxes based on our location column and we can do that as you can see in lines 44 to 47 of the code and the only thing that I change in this part of the code is that I specify the fill argument and I specify that this argument should be equal to our location column and we can can run this code to see how this changes our graph at the bottom right so after running these lines of code you can see that a much more detailed graph is appearing you can also click the zoom button to enlarge this graph so after clicking this you can see that we have created a graph which contains a separate box for each of the states in our data set however you can also see that some boxes are very small so these boxes contain only one observation because for some states we have only very few observations and for that reason in the next step I want to perform an additional data manipulation step before drawing our graph more precisely I want to filter all locations where we have at least 10 observations so what I want to show you here is that you can also combine data manipulation steps with data visualization steps so as you can see this part of the code is exactly the same as this part of the code and this part is exactly the same as this part of the code 
however in lines 50 to 52 we specify additional data manipulation steps and we do this all in the same pipe so in these lines I first group our data by the location column then I count the number of occurrences in each location and I filter that I want to keep only those locations where we have at least 10 observations and then afterwards I ungroup our data and then I specify the GG plot functions that we already used before so after running Lines 49 to 55 the code you can see that another box plot is appearing at the bottom right we can enlarge this box plot clicking by clicking on the zoom button and then you can see that we have filtered for those locations where we have at least 10 observations so we have less boxes in our group box plot and you can also see that these boxes are more meaningful because they contain more observations now this graph is still difficult to read because we still have many different states available and for that reason I also want to sort these sta to visualize a sorted box blot and we can do that as you can see in lines 57 to 64 of the code and this code is basically exactly the same as the previous code that we have used in the previous example however there's only one additional line which is line 61 and in this line we use the mutate function and the fact reorder function to reorder the location column based on the median in the salary color so if we do that you can see that our graph is updated and as you can see now we have ordered our graph by median values so in this graph you can see that the highest median salary is paid in DC and the lowest median salary is paid in MD of course our data is limited so I'm not sure if this reflects the real salary distributions in the entire population but based on this data set we can draw this conclusion now in the last step of this video I also want to modify our graph because I want to add a title to our graph I want to remove the label on the x-axis I want to remove the legend title and 
I want to remove the values on the y-axis and once again this is exactly the same as the previous example and then we add further layers to this graph to modify it so after running these lines of code you can see that our graph is updated once again so after clicking on the zoom button you can see that now we have added a main title data science salary by States in thousands we have removed the y- AIS values we have removed the axis label on the x-axis and we have removed the title of the legend so now you could of course dive deeper into the functions of the ggplot 2 package and the Tidy verse to modify these data and the graph in even more detail but I hope that I was already able to demonstrate the strength of the tidyverse in our programming thanks again for the opportunity to speak on this channel and make sure to check out the video description of this video because there you will find a link to the more comprehensive course at statistics globe on the topic data manipulation in our programming using deer and the tidyverse you will find the code of this video and you will find further information about this topic thanks again and see you soon bye-byethank you so much Channon for the opportunity to speak on your Channel today so in this video I will introduce you to the functions of the Tidy verse in our programming more precisely we will first download a data set online then we will import this data set using the read r package of the Tidy verse afterwards we will manipulate this data set using deer and then last but not least we will visualize this data set using the famous GG two package in R so without too much talk let's jump right into the AR code as a very first step we have to install and load the packages of the tidyverse as you can see in lines three and four of the code I have installed these packages already so for that reason I'm just going to load them with line four of the code so after running this line of code all functions of the Tidy verse 
are imported in our studio and we can now use them in our session as the next step we have to download the data set that we want to use in our example and in this video I will use a data set downloaded from kegle which is about data science jobs and salaries in 2024 you will find a link to this data set in the description of this video however on this page you can download the data set as you can see here and then you have to store the downloaded file in a file location path as you can see here here so in this case I have located the file in the directory Dropbox yok data sets and the file is called data science job listings now if you want to import this data set to our studio we first have to specify the file location path and as you have seen this data set is located in this folder so for that reason I specify a data object contain this file location path as a character string in a new data object so after running six of the code you can see at the top right of our studio that a new data object called my path is created and this data object contains our file location path in the next step we will use the function read uncore CSV which is provided by the read r package read R is one package of the Tidy work which is used for importing data into our programming and we combine this function with the stry function which is provided by the stringr package that's another package of the tidyverse and the stry function combines our path that we have specified before in line six of the code with the file name of our data set and then I'm storing the output of this in a new data object that is called my data so so after running lines eight and nine of the code you can see that at the top right of our studio a new data set called my data is appearing we can also have a look at this data set by clicking on the my data name of our data set at the top right of our studio so after clicking here you can see that a new window is opened which is showing the structure of our data 
so as you can see our data contains 500 rows and the column names position job title which is the name of the job company name the location of the job the salary range a date where the salary range was estimated the logo of the company and a link to the job as well as a company rating now in this analysis we are interested in only two of those columns namely location so where the chob is located as well as the salary and for that reason we perform a First Data manipulation step using the deer package of the tidyverse and from this package we apply the select function as you can see in lines 12 and 13 of the code and we apply this function to our data set we do this by using a pipe operator so the pipe operator specifies that the function that is used afterwards is applied to to the data that is specified before the pipe operator and then we are storing the output of this in a new data object which is again called my data and since this data object has the same name we are overwriting our previous data set so in other words our data set my data is updated after applying the select function and as you can see within the select function we specify that we want to keep only the column names location and salary so after running lines 11 to 13 of the code our data set is updated and you can see that by clicking on the data set at the top right once again then this window is opened again and you can see our updated data set which contains only the location and salary columns in the next step I also want to rename the column names of our data set because I want to work with lowercase letter as column names and for that reason I specify that I want to rename the column name location with an uppercase L to the new column name location with a lowercase L and the same for salary and to do this I used the rename function which is also provided by the deer package and as you can see the remaining syntax is similar to the syntax in lines 11 to13 the only difference is that we use 
a different function in which we have to specify different arguments so after running L 15 to 17 of the code our data set is updated once again so now you can see that the column names are in lowercase as a next step I want to replace certain values in our salary column by an a because if you have a closer look at this column you can see that most of the values are salary ranges however some of these values are per hour and I want to remove all these values from our data set because I want to work only with the rows where we have yearly salary estimates and not only salary estimates per hour and for that reason I specify that I want to replace all values in our column salary which contain the words per hour or the character string per hour by NA a and we do that by applying the Str detect function as you can see here so first we are detecting each value in salary which contains this character string and then we apply the if else function to specify that every time when this is detected we replace this value by an A and then we specify that we want to assign these values to the salary column so in other words we are overwriting or updating the salary column and we can do that using the mutate function which is also provided by the deer package as you can see in line 20 so after running lines 19 and 20 of the code our data is updated once again so as you can see now the values where we had hourly estimates instead of yearly estimates have been replaced by NA in the next step I also want to extract the numbers of these ranges and I want to calculate the mean value of these ranges because I want to convert the salary column to a numeric column and we can do that as you can see in lines 22 to 25 of the code so in these lines of code I first specify the mutate function so the mutate function is used once again to update our salary column as we already did in line 20 of the code however within the mutate function we specify this time that we want to use the extract all 
function we use this function to extract only the values in our ranges then in the next step we use the map function to convert these range values into numeric values and then we use the map double function in combination with the mean function to calculate the mean value of this range and then we replace the previous values in salary by this new output so after running these lines of code you can see that our data is updated once again and now we have converted the salary column into a numeric column which contains the mean values of our ranges now in the next step I also want to remove the rows which contain na values in the salary column and we can do that using the drop na function as you can see in NS 27 and 28 of the code once again we use a similar syntax so we replace our previous data object my data and we use the pipe operator to apply the drop na a function to our data object my data so after running these lines of code our data is updated once again and now you can see that we have removed all rows with enable values and you can also see that our data set now contains less rows because before it contained 500 rows and now only 381 rows are remaining so we removed 119 rows from our data set so until now we have modified the salary column however I also want to modify the location column because as you can see currently the location column contains the city and the state of the city and I want to analyze our data based on the states only and for that reason I want to remove the cities from our location column and we can do that as you can see in lines 30 and 31 of the code so once again we use the mutate function however this time we replace the location column and we specify using the sub function that we want to keep only the value after the comma in this column so in this case only the states so after running these lines of code you can see that our data is updated once again and now the location column contains only the states if you have a closer 
If you have a closer look at this column, you can see that we also have values which tell us that these people work remotely, and there are other rows which contain the value United States. For some of these rows, the specific location was not specified in the data set, and I also want to remove these rows, because I want to analyze only the rows in our data set where we actually know the location. Remote is fine, since I also want to compare remote workers with people that work in certain states, but I want to remove the rows which contain United States. So as a next step, I use the filter function, which can be used to filter certain rows in our data set, and as I explained before, I want to keep only those locations which are unequal to the character string United States. After running lines 33 and 34 of the code, our data set is updated once again, and as you can see, there are no United States values anymore. You can also see that our data now contains only 370 rows instead of the 500 rows we had in the beginning.

By the way, if you would like to learn more about R programming, data manipulation, dplyr, and the tidyverse, you might take a look at the courses at Statistics Globe, because soon we will start a new course on data manipulation in R programming using dplyr and the tidyverse. You will find a link to this course in the description of this video. And now, let's move on with this video.

At this point, we can move on from the data manipulation to the data visualization part. For data visualization, we use the ggplot2 package, which is a very popular package and, in my opinion, the best package for data visualization that exists. In the following part of this video, we will draw different plots using the ggplot2 package. One advantage of the ggplot2 package is that it integrates seamlessly with other tidyverse packages such as dplyr, so we can use it in combination with the pipe operator that we already used for our data manipulation tasks before.
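The filter() step described above can be sketched as follows; the synthetic data frame and column name are assumptions for illustration.

```r
# Hypothetical sketch: remove rows whose location is just "United States"
library(dplyr)

my_data <- data.frame(location = c("TX", "United States", "Remote", "NY"))

my_data <- my_data |>
  filter(location != "United States")  # keep only rows with a known location
```

Rows with "Remote" survive this filter, since remote workers are part of the comparison; only the unspecified "United States" rows are dropped.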
In this case, we specify that we want to draw the values in our data set my_data, then we use the pipe operator, and then we apply certain ggplot2 functions such as ggplot and geom_density. We apply these functions like the dplyr functions before, by specifying the pipe operator, and the ggplot2 functions themselves are added on top of each other using the plus operator, because the ggplot2 package follows a so-called layered approach. In this case, I want to draw a density plot of our salary column; for that reason, I specify within the ggplot function that the x values should be equal to our salary column, and in order to draw a density plot, I add the geom_density layer on top. After running lines 36 to 38 of the code, you can see at the bottom right of RStudio that a density plot is appearing which shows the distribution of the salaries in our data set. As you can see, the peak of our density is at around 100,000 US dollars; however, you can also see that there is a smaller peak at the right side of the density, which shows that there is a group of very highly paid people.

You have already seen in the data manipulation part of this video that the syntax of the tidyverse is always very similar, and for that reason, we can also use a similar syntax to draw different types of plots using the ggplot2 package. For instance, in the next example I want to draw a box plot of our data, and the only thing that I have to change is that I use the geom_boxplot function instead of the geom_density function; lines 36 and 37 of the code are exactly the same as lines 40 and 41. After running these lines of code, you can see that another graph is appearing at the bottom right, which is this time showing a box plot of our entire salary column.

Now we can modify our code to create more advanced graphics. In the next step, I want to draw a grouped box plot, and I want to group these boxes based on our location column; we can do that as you can see in lines 44 to 47 of the code.
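The density plot and the simple box plot described above might be sketched like this; the data frame here is synthetic, since the real data set is not shown.

```r
# Hypothetical sketch of the two basic plots described above
library(ggplot2)

my_data <- data.frame(
  salary   = c(65, 105, 80, 95, 120, 70),
  location = c("TX", "NY", "TX", "NY", "DC", "DC")
)

# density plot of the salary column (lines 36-38 in the video)
p_density <- my_data |>
  ggplot(aes(x = salary)) +
  geom_density()

# box plot of the entire salary column (lines 40-41)
p_box <- my_data |>
  ggplot(aes(y = salary)) +
  geom_boxplot()
```

Note the switch from the pipe operator to the plus operator once we are inside the ggplot2 call: layers are added with `+`, not `|>`.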
The only thing that I change in this part of the code is that I specify the fill argument, and I specify that this argument should be equal to our location column. We can run this code to see how this changes our graph at the bottom right. After running these lines of code, you can see that a much more detailed graph is appearing; you can also click the zoom button to enlarge this graph. After clicking it, you can see that we have created a graph which contains a separate box for each of the states in our data set. However, you can also see that some boxes are very small; these boxes contain only one observation, because for some states we have only very few observations. For that reason, in the next step I want to perform an additional data manipulation step before drawing our graph; more precisely, I want to filter for all locations where we have at least 10 observations. What I want to show you here is that you can also combine data manipulation steps with data visualization steps. Most of this code is exactly the same as in the previous example; however, in lines 50 to 52, we specify additional data manipulation steps, and we do this all in the same pipe. In these lines, I first group our data by the location column, then I count the number of occurrences in each location and filter to keep only those locations where we have at least 10 observations; afterwards, I ungroup our data, and then I specify the ggplot2 functions that we already used before. After running lines 49 to 55 of the code, you can see that another box plot is appearing at the bottom right. We can enlarge this box plot by clicking on the zoom button, and then you can see that we have kept only those locations where we have at least 10 observations, so we have fewer boxes in our grouped box plot, and you can also see that these boxes are more meaningful, because they contain more observations.
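One way to express the combined manipulation-plus-visualization pipe described above (grouping, filtering by group size, ungrouping, then plotting) is sketched below; the synthetic data and the exact filtering expression are assumptions, and the code in the video may use a slightly different counting step.

```r
# Hypothetical sketch: keep only locations with at least 10 observations,
# then pipe straight into ggplot2
library(dplyr)
library(ggplot2)

set.seed(1)
my_data <- data.frame(
  location = rep(c("TX", "NY", "MD"), times = c(12, 11, 2)),
  salary   = runif(25, min = 60, max = 150)
)

p <- my_data |>
  group_by(location) |>
  filter(n() >= 10) |>   # drop sparsely populated states
  ungroup() |>
  ggplot(aes(x = location, y = salary, fill = location)) +
  geom_boxplot()
```

Here the two MD rows are removed before plotting, so every remaining box summarizes at least 10 observations.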
Now, this graph is still difficult to read, because we still have many different states available, and for that reason I also want to sort these states to visualize a sorted box plot. We can do that as you can see in lines 57 to 64 of the code. This code is basically the same as the code in the previous example; however, there is one additional line, which is line 61. In this line, we use the mutate function and the fct_reorder function to reorder the location column based on the median of the salary column. If we do that, you can see that our graph is updated, and now we have ordered our graph by median values. In this graph, you can see that the highest median salary is paid in DC, and the lowest median salary is paid in MD. Of course, our data is limited, so I am not sure if this reflects the real salary distributions in the entire population, but based on this data set we can draw this conclusion.

Now, in the last step of this video, I also want to modify our graph: I want to add a title to our graph, I want to remove the label on the x-axis, I want to remove the legend title, and I want to remove the values on the y-axis. Once again, this is exactly the same as the previous example, and then we add further layers to this graph to modify it. After running these lines of code, you can see that our graph is updated once again. After clicking on the zoom button, you can see that we have added a main title, Data Science Salary by States in Thousands, we have removed the y-axis values, we have removed the axis label on the x-axis, and we have removed the title of the legend. Now you could of course dive deeper into the functions of the ggplot2 package and the tidyverse to modify the data and the graph in even more detail, but I hope that I was already able to demonstrate the strengths of the tidyverse in R programming.
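Putting the reordering and the final theming steps together, a sketch could look like the following; the title text, the theme elements used, and the synthetic data are assumptions based on the description above.

```r
# Hypothetical sketch: sort the boxes by median salary, add a title,
# and strip the x-axis label, y-axis values, and legend title
library(dplyr)
library(forcats)
library(ggplot2)

my_data <- data.frame(
  location = rep(c("TX", "NY", "DC"), each = 5),
  salary   = c(70, 72, 75, 78, 80,      # TX
               90, 92, 95, 98, 100,     # NY
               115, 118, 120, 123, 125) # DC
)

p <- my_data |>
  mutate(location = fct_reorder(location, salary, .fun = median)) |>
  ggplot(aes(x = location, y = salary, fill = location)) +
  geom_boxplot() +
  ggtitle("Data Science Salary by States (in Thousands)") +
  theme(axis.title.x = element_blank(),  # no x-axis label
        axis.text.y  = element_blank(),  # no y-axis values
        legend.title = element_blank())  # no legend title
```

fct_reorder sorts the factor levels by the summary statistic you pass via .fun, so the boxes appear from lowest to highest median salary.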
Thanks again for the opportunity to speak on this channel, and make sure to check out the video description, because there you will find a link to the more comprehensive course at Statistics Globe on the topic of data manipulation in R programming using dplyr and the tidyverse. You will also find the code of this video and further information about this topic. Thanks again, and see you soon. Bye-bye!