Intro to statistics with R - Histograms and distributions in R

A Histogram: A Simple yet Informative Graph in Statistics

A histogram is one of the simplest graphs we use in statistics, yet it is very useful and very informative. It is a type of graph used to display a distribution. You might ask why we start with histograms when there are lots of fancier graphics to choose from. The answer is that a histogram helps us overcome our natural tendency to rely on summary statistics.

That tendency is natural. As an analogy, think of stereotyping: it is easy to stereotype individuals who are part of a group, because you can rely on the summary statistics of that group to make an inference about an individual. As we know, we shouldn't do that if we want to get to know the individuals within a group. We want to look at the entire distribution, at everybody within the group, before we calculate summary statistics or jump to summary conclusions about the group.
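As a rough illustration of why summary statistics alone can mislead, the sketch below builds two simulated samples that share the same mean and standard deviation but have very different shapes. The data and the names bell and two_groups are made up for this example; they do not come from the lecture.

```r
# Two simulated samples (hypothetical data, for illustration only)
set.seed(1)
n <- 1000
bell       <- rnorm(n)                                           # one bell-shaped group
two_groups <- c(rnorm(n / 2, mean = -2, sd = 0.5),
                rnorm(n / 2, mean =  2, sd = 0.5))               # two separate clusters

# Standardize both so they have mean 0 and standard deviation 1
bell       <- as.numeric(scale(bell))
two_groups <- as.numeric(scale(two_groups))

# The summary statistics agree ...
c(mean = mean(bell), sd = sd(bell))
c(mean = mean(two_groups), sd = sd(two_groups))

# ... but the histograms show two very different distributions
hist(bell, main = "One bell-shaped group", xlab = "Standardized value")
hist(two_groups, main = "Two clusters", xlab = "Standardized value")
```

Only the histograms reveal that the second sample is really two clusters with almost nobody near the mean.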

To illustrate, here is a quick example of a normal distribution plotted in a histogram. The histograms in this segment and throughout the course always take the same form. On the x-axis, we plot the variable of interest, in this case body temperature measured in degrees Fahrenheit. On the y-axis, we have frequency - the number of people in the sample who have that particular body temperature. The same data is plotted in Celsius a little further on.
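A minimal sketch of this kind of plot in R, assuming simulated data: the lecture's actual sample is not available here, so temp_f is a made-up vector whose mean (98.8) and standard deviation (0.7) are chosen only to resemble a group that "runs a little hot".

```r
# Simulated body temperatures in degrees Fahrenheit (hypothetical, not the lecture's data)
set.seed(1)
temp_f <- rnorm(200, mean = 98.8, sd = 0.7)

# x-axis: the variable being plotted; y-axis: frequency (number of people per bin)
h <- hist(temp_f,
          main = "Body temperature",
          xlab = "Temperature (degrees Fahrenheit)",
          ylab = "Frequency")

# The bar heights are counts, so they add up to the sample size
sum(h$counts) == length(temp_f)
```

hist() also returns the bin boundaries (h$breaks) and the counts (h$counts), which is handy when you want to check what the picture is actually showing.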

Normal body temperature is usually taken to be about 98.6 degrees Fahrenheit. Looking at where 98.6 degrees falls on the x-axis, it is right about the average, though this group runs a little hot, you might say. The distribution shown is a nice normal distribution.

The way to spot a normal distribution is to look for that signature bell-shaped curve: if we smoothed the histogram, we could draw a bell-shaped curve over it. The distribution is also symmetrical around the mean (the average), so the number of cases above the mean is about equal to the number of cases below it. Symmetry together with the bell shape is the visual signature of a normal distribution.
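Both visual checks can be sketched in R. This again assumes simulated data (temp_f is hypothetical), and overlaying dnorm() with the sample mean and standard deviation is one common way to draw the smooth curve, not necessarily how the lecture's figure was produced.

```r
# Simulated body temperatures in degrees Fahrenheit (hypothetical data)
set.seed(1)
temp_f <- rnorm(200, mean = 98.8, sd = 0.7)
m <- mean(temp_f)
s <- sd(temp_f)

# Symmetry check: roughly equal numbers of cases below and above the mean
below <- sum(temp_f < m)
above <- sum(temp_f > m)
c(below = below, above = above)

# Plot densities instead of counts so a normal curve fits on the same scale
hist(temp_f, freq = FALSE,
     main = "Body temperature",
     xlab = "Temperature (degrees Fahrenheit)")

# Overlay a normal curve with the sample mean and standard deviation
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2)
```

Setting freq = FALSE matters here: with the default frequency scale the bars would be counts, and the density curve would sit almost flat along the bottom of the plot.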

Here is the same exact data plotted in degrees Celsius, for our international audience. On the Celsius scale, 98.6 degrees Fahrenheit corresponds to 37 degrees. Only the x-axis changes: it now shows temperature measured in degrees Celsius. The distribution is again normal, and this group still runs a little hot compared with the value most people go with.
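The conversion is a linear rescaling, C = (F - 32) * 5 / 9, which is why the shape of the distribution does not change. A small sketch, assuming simulated data (temp_f and the helper f_to_c are made up for this example):

```r
# Convert degrees Fahrenheit to degrees Celsius (hypothetical helper)
f_to_c <- function(f) (f - 32) * 5 / 9

f_to_c(98.6)  # 37

# Simulated body temperatures in degrees Fahrenheit (hypothetical data)
set.seed(1)
temp_f <- rnorm(200, mean = 98.8, sd = 0.7)
temp_c <- f_to_c(temp_f)

hist(temp_c,
     main = "Body temperature",
     xlab = "Temperature (degrees Celsius)",
     ylab = "Frequency")

# A linear change of units leaves the standardized values (z-scores) unchanged
all.equal(as.numeric(scale(temp_f)), as.numeric(scale(temp_c)))
```

Because the z-scores are identical, any difference between the Fahrenheit and Celsius histograms comes from where the bin boundaries fall, not from the data.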

We made these graphics in R, which you have seen a little already. R bins the two versions differently, so the widths of the bins differ between the Fahrenheit and Celsius histograms, even though both are based on the same data points. You can set the binning yourself with the R function hist and its breaks argument.
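A short sketch of how the breaks argument of hist() controls binning, using simulated data (temp_f is hypothetical, and the bin width of 0.25 is an arbitrary choice for illustration). A single number is treated by R only as a suggested number of bins, while a vector of values gives the exact bin boundaries.

```r
# Simulated body temperatures in degrees Fahrenheit (hypothetical data)
set.seed(1)
temp_f <- rnorm(200, mean = 98.8, sd = 0.7)

# Default binning chosen by R
h_default <- hist(temp_f, main = "Default bins",
                  xlab = "Temperature (degrees Fahrenheit)")

# A single number is a suggestion; R may round it to "pretty" boundaries
h_many <- hist(temp_f, breaks = 30, main = "breaks = 30 (a suggestion)",
               xlab = "Temperature (degrees Fahrenheit)")

# A vector gives exact boundaries; it must cover the full range of the data
edges <- seq(floor(min(temp_f)), ceiling(max(temp_f)), by = 0.25)
h_exact <- hist(temp_f, breaks = edges, main = "0.25-degree bins",
                xlab = "Temperature (degrees Fahrenheit)")

# Every bin in the last histogram is exactly 0.25 degrees wide
diff(h_exact$breaks)
```

Passing explicit boundaries is the more predictable option when two histograms, such as the Fahrenheit and Celsius versions, need to be comparable.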

In conclusion, a histogram is an effective tool in statistics that helps us understand the distribution of our data without falling into the trap of relying solely on summary statistics. By visualizing these distributions, we can better grasp how they work and make informed decisions based on actual data rather than generalizations about groups.

Transcript (captions, English)

So, a histogram is one of the simplest graphs that we use in statistics, but histograms are very useful and very informative. It's just a type of graph used to display a distribution. You might think: why start with histograms? There are lots of fancier graphics that we could start with. But a histogram is nice because it helps us to overcome our sort of natural tendency to rely on summary statistics.

And this is just a natural thing. As an analogy, think of stereotyping. It's very easy to stereotype individuals who are part of a group, because you can rely on your sort of summary statistics of that group to make an inference about an individual. But as we know, we shouldn't do that if we want to get to know individuals within a group. We want to look at the entire distribution, we want to look at everybody within the group, before we calculate those summary statistics or jump to those conclusions, the summary conclusions about a group.

So here's a quick example of a nice normal distribution plotted in a histogram. We're going to look at a lot of histograms in this segment and throughout the course, and they'll always take on the same form. On the x-axis I'm plotting whatever variable it is that I'm looking at; in this case the example is body temperature measured in degrees Fahrenheit. On the y-axis all I have is frequency: it's just the number of people in this distribution, in this sample, that have this particular body temperature.

In a second I'll do this in Celsius, because I know we have an international audience, but in Fahrenheit normal body temperature is about 98.6 degrees; that's what most people go with. So if you look where 98.6 degrees is, right about here, yeah, it's right about the average. This group runs a little hot, you might say.

So that's a nice normal distribution. A characteristic of a normal distribution is that if I wanted to sort of smooth this, I could draw a curve over it; it has this nice bell-shaped curve to it. The way to spot a normal distribution is to look for that signature bell-shaped curve. And it's symmetrical around the mean, or around the average, so you can see the number of cases beyond the mean is about equal to the number of cases below the mean. So it's symmetrical and has this nice, normal, bell-shaped curve.

Here's the same exact data, just plotted in terms of Celsius. Now on the x-axis you see we're just plotting temperature measured in degrees Celsius. Again, it's a normal distribution. We did these graphics in R, and you've seen R a little bit, and R just bins these differently, so you can see the widths of these bins are different. And again, you can set that, if you remember, using the R function hist and the argument breaks; you can change that if you like. But this is just the same exact data plotted in Celsius instead of in Fahrenheit.