Spreadsheets Tutorial - How far from average
# Understanding Data Distribution: Variance, Standard Deviation, Percentiles, and Quartiles
## Introduction
In this guide, we will explore how to measure a data point's distance from the average, focusing on key statistical concepts such as variance, standard deviation, percentiles, and quartiles. These metrics are essential for understanding how data is spread and distributed.
---
## Understanding Variance
Variance measures how dispersed a dataset is from its mean. A smaller variance indicates that the data points are closely clustered around the mean, while a larger variance suggests greater dispersion among the data points.
### Example: Calculating Variance
Consider three columns of data:
- **Column A**: [10, 10, 10, 10] (variance = 0)
- **Column B**: [10, 14, 10, 10] (variance = 3)
- **Column C**: [10, 14, 10, 100] (variance is the highest among the three)
To calculate variance:
1. Calculate the mean:
\[
\text{Mean} = \frac{10 + 14 + 10 + 10}{4} = 11
\]
2. Subtract the mean from each value:
- For the first, third, and fourth values: \(10 - 11 = -1\)
- For the second value: \(14 - 11 = 3\)
3. Square all these differences:
- \((-1)^2 = 1\)
- \(3^2 = 9\)
4. Take another average of the squared differences:
\[
\text{Variance} = \frac{1 + 9 + 1 + 1}{4} = \frac{12}{4} = 3
\]
---
## Using Formulas for Variance and Standard Deviation
Thankfully, there are formulas to simplify these calculations:
- Use `VAR.P` to calculate variance. For example:
\[
\text{Variance} = \text{VAR.P}(10, 14, 10, 10) = 3
\]
- To make the results more intuitive (since variance is in squared units), take the square root of the variance to get the standard deviation. Use `STDEV.P` for this purpose:
\[
\text{Standard Deviation} = \sqrt{\text{Variance}} \quad \text{or} \quad \text{STDEV.P}(10, 14, 10, 10) = 1.73
\]
---
## Standard Scores and Their Interpretation
A standard score (z-score) shows how many standard deviations a data point is away from the mean.
- **Mean**: 11
- **Standard Deviation**: 1.73
For example, consider a new data point: 12.73. To find its standard score:
\[
\text{Standard Score} = \frac{(12.73 - 11)}{1.73} = 1
\]
This means the new data point is exactly one standard deviation away from the mean.
---
## Exploring Percentiles and Quartiles
Percentiles help you understand where a particular value stands in a dataset by showing the percentage of values below that point.
### Example: A Histogram with 1,000,000 Values
- **Blue Line (Average = 0)**: This is the 50th percentile because half the points are less than zero and half are greater.
- **Red Line (-0.67)**: This represents the first quartile (25th percentile), meaning 25% of the data is less than or equal to -0.67.
- **Green Line (0.67)**: This represents the third quartile (75th percentile), meaning 75% of the data is less than or equal to 0.67.
### Calculating Quartiles in Sheets
Use the `QUARTILE` function:
\[
\text{First Quartile} = \text{QUARTILE}(array, 1) \\
\text{Second Quartile (Median)} = \text{QUARTILE}(array, 2) \\
\text{Third Quartile} = \text{QUARTILE}(array, 3) \\
\text{Fourth Quartile} = \text{QUARTILE}(array, 4)
\]
For example:
- **First Quartile**: 234
- **Second Quartile (Median)**: 456
- **Third Quartile**: 567
- **Fourth Quartile**: 789
---
## Conclusion
Understanding variance, standard deviation, percentiles, and quartiles is crucial for analyzing data distributions. These metrics help you make sense of how data points are spread and positioned relative to each other.
By practicing these concepts with exercises based on U.S. train ridership data, you can gain hands-on experience in exploring how ridership varies over time.