Welcome to the First Course in the Data Visualization with ggplot2 Series
I'm Rick Scavetta, and I'll be your instructor for this series. I've been training scientists on how to better understand and visualize their data since 2012, and I'm excited to bring my experience to DataCamp.

So what is data viz? Data visualization is an essential skill for data scientists that combines statistics and design in meaningful and appropriate ways. On the one hand, it's a form of graphical data analysis, emphasizing accurate representation and interpretation of data. On the other hand, data viz relies on good design choices, not only to make our plots attractive but also to aid both the understanding and communication of results. On top of that, there is an element of creativity, since at its heart data viz is a form of visual communication.
Understanding the Distinction Between Exploratory and Explanatory Visualizations
As a data scientist, it's essential that you can quickly explore data, but you'll also be tasked with explaining your results to stakeholders. Good design begins with thinking about the audience, and sometimes that just means ourselves. There are two types of visualizations: exploratory and explanatory. Exploratory visualizations are easily generated, data-heavy, and intended for a small specialist audience, such as yourself and your colleagues. Their primary purpose is graphical data analysis. Explanatory visualizations, on the other hand, are labor-intensive, data-specific, and intended for a broader audience, such as in publications or presentations. They are part of the communications process.
A Real-World Example: Understanding the Relationship Between Brain and Body Weights
Let's take a look at this data set, which contains the average brain and body weights of 62 land mammals. To understand the relationship here, the most obvious first step is to make a scatter plot like this one. Two mammals, the African and Asian elephants, have both very large brain and body weights, leading to a positive skew on both axes. Here, applying a linear model is a poor choice, since a few extreme values have a large influence. A log transformation of both variables allows for a better fit. So although we began with a rough exploratory plot that informed us about our data and led us to a meaningful result, in the end we'd probably want a cleaned-up explanatory plot.
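The exploratory steps described here can be sketched in a few lines of R. This is only an illustration under an assumption: it uses the `mammals` data set from the MASS package (62 land mammals, with `body` in kg and `brain` in g), which may or may not be the exact data shown in the video. The object names `p_raw`, `p_log`, `fit_raw` and `fit_log` are made up for this sketch.

```r
library(ggplot2)

# Assumption: MASS::mammals stands in for the data set in the narration.
# It has 62 rows with columns `body` (kg) and `brain` (g).
data(mammals, package = "MASS")

# Rough exploratory scatter plot on the raw scale, with a linear model overlaid.
# The two elephants sit far from the other points and dominate the fit.
p_raw <- ggplot(mammals, aes(x = body, y = brain)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Same plot with log10 scales on both axes. Scale transformations are applied
# before the statistics, so the line is now fit to the log-transformed values.
p_log <- p_raw + scale_x_log10() + scale_y_log10()

print(p_raw)
print(p_log)

# The corresponding models, fit directly for comparison with the plots.
fit_raw <- lm(brain ~ body, data = mammals)
fit_log <- lm(log10(brain) ~ log10(body), data = mammals)
r2_raw <- summary(fit_raw)$r.squared
r2_log <- summary(fit_log)$r.squared
```

Comparing the two plots side by side is the point of the exercise: on the raw scale most species are squeezed into one corner, while on the log scale the points spread out along the fitted line.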
A Classic Example from Francis Anscombe
Here's a classic example from Francis Anscombe, first published in 1973. When we imagine a linear model as presented in this anonymous plot, we imagine that we are describing data that looks something like this. But this same model could describe a very different set of data, such as a parabolic relationship, which calls for a different model, or data in which an extreme value has a large effect, which becomes clear when the outlier is removed. Sometimes the model may describe a relationship where in fact there is none at all, because some extreme values may be incorrect. If we relied solely on the numeric output without plotting our data, we'd have missed distinct and interesting underlying trends.
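Anscombe's four data sets ship with base R as the `anscombe` data frame (columns `x1` to `x4` and `y1` to `y4`), so the point can be checked directly. The following is a minimal sketch, not the code used in the video; the helper names `fits`, `coefs`, `long` and `p` are chosen here for illustration. Each fit comes out with an intercept of roughly 3 and a slope of roughly 0.5, yet the faceted plot shows four very different patterns.

```r
library(ggplot2)

# Fit the same simple linear model, y ~ x, to each of the four data sets.
fits <- lapply(1:4, function(i) {
  d <- data.frame(x = anscombe[[paste0("x", i)]],
                  y = anscombe[[paste0("y", i)]])
  lm(y ~ x, data = d)
})

# One row per data set: the numeric output is nearly identical across all four.
coefs <- t(sapply(fits, coef))
colnames(coefs) <- c("intercept", "slope")
print(round(coefs, 2))

# Reshape to long format so each data set gets its own panel.
long <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(set = paste("Set", i),
             x = anscombe[[paste0("x", i)]],
             y = anscombe[[paste0("y", i)]])
}))

# Plotting reveals what the coefficients hide.
p <- ggplot(long, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ set)
print(p)
```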
The Creative Process of Data Visualization
We can see that data viz is rooted in statistics and graphical data analysis, but it's also a creative process that involves some amount of trial and error. Alright, enough examples, let's get our fingers moving with ggplot2! In this series, we'll cover the basics of ggplot2 and how to create visualizations that are both informative and aesthetically pleasing. Let's get started.