Intro to statistics with R - Wrapping up Measures of Variability

The Importance of Variability in Sports Statistics: A Case Study of Jeremy Lin's Points Per Game

In sports, statistics are used to measure an athlete's performance and determine their quality. One common statistic is points per game, which is the average number of points scored by a player over a certain period of time. However, it is also important to consider variability in this statistic, as it can give insight into how consistent an athlete is from one game to another.

To understand variability, we must first calculate the mean. The mean is calculated by summing up all the values and dividing by the number of values. In the case of Jeremy Lin's points per game, the sum is 227, and since there are 10 games, the mean is 22.7 points per game.
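As a minimal sketch in R, the mean can be reproduced as shown here. The text names only a few of the ten scores (2, 25, 28, 23 and 38), so the `lin_points` vector is an assumed reconstruction, chosen to match the reported sum of 227; the variable names are illustrative.

```r
# Assumed reconstruction of the 10-game stretch: only some of these values
# are named in the lecture, the rest are chosen to match the reported totals.
lin_points <- c(28, 26, 10, 27, 20, 38, 23, 28, 25, 2)

n <- length(lin_points)    # number of games: 10
total <- sum(lin_points)   # sum of the points column: 227
m <- total / n             # mean points per game: 22.7

# The built-in mean() should agree with the hand calculation.
all.equal(m, mean(lin_points))
```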

The mean is a measure of central tendency; on its own it says nothing about how consistent he was across those 10 games. To quantify variability, we first find the deviation score for each game. The deviation score is calculated by subtracting the mean from the individual value. For example, in a game where Jeremy Lin scored 28 points, his deviation score is 28 - 22.7 = 5.3, meaning he was 5.3 points above his average that night.
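A short R sketch of the deviation scores, under the assumption that the ten scores are the ones in `lin_points` (the text names only some of them; the vector is a reconstruction consistent with the reported sum of 227). It also shows why the deviation scores cannot simply be averaged: they sum to zero.

```r
# Assumed reconstruction of the 10 scores; not all values appear in the text.
lin_points <- c(28, 26, 10, 27, 20, 38, 23, 28, 25, 2)

m <- mean(lin_points)           # 22.7
deviations <- lin_points - m    # one deviation score per game (vectorised)

deviations[1]                   # 28 - 22.7 = 5.3
sum(deviations)                 # zero, up to floating-point rounding
```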

Deviation scores always sum to zero, so their plain average is uninformative. Instead, we square each deviation score, which removes the sign, and sum the squares. This quantity is the sum of squared deviations, or sum of squares for short. In Jeremy Lin's case, the sum of squares is 922.1. The variance is the sum of squares divided by the number of values: 922.1 / 10 = 92.21.
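The two quantities can be kept apart in a small R sketch. The `lin_points` vector is an assumed reconstruction (only some scores are named in the text), chosen so that it reproduces the reported sum of 227 and sum of squares of 922.1.

```r
# Assumed reconstruction of the 10 scores; not all values appear in the text.
lin_points <- c(28, 26, 10, 27, 20, 38, 23, 28, 25, 2)

m <- mean(lin_points)
squared_deviations <- (lin_points - m)^2

ss <- sum(squared_deviations)           # sum of squares: 922.1
variance_n <- ss / length(lin_points)   # variance, dividing by n: 92.21
```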

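The standard deviation is a measure of the spread or dispersion of the data around the mean. It is calculated by taking the square root of the variance (the sum of squared deviations divided by the number of games, 922.1 / 10 = 92.21). In this case, the standard deviation is √92.21 ≈ 9.6 points per game. Taking the square root brings us back to the original units, points per game. This means that, on average, Jeremy Lin's scoring fluctuated by about 10 points around his mean from game to game.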
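A hedged R sketch of this step, again assuming the reconstructed `lin_points` vector (the text does not list all ten scores). One thing to watch in R: the built-in `var()` and `sd()` divide by n - 1 rather than n, so they will not reproduce the 9.6 from the lecture. The helper `sd_n` is a hypothetical name introduced here for the divide-by-n version.

```r
# Assumed reconstruction of the 10 scores; not all values appear in the text.
lin_points <- c(28, 26, 10, 27, 20, 38, 23, 28, 25, 2)

# Hypothetical helper: standard deviation with n in the denominator,
# matching the formula used in the lecture.
sd_n <- function(x) sqrt(mean((x - mean(x))^2))

sd_lecture <- sd_n(lin_points)   # sqrt(92.21), about 9.6
sd_builtin <- sd(lin_points)     # divides by n - 1, about 10.1
```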

However, that figure is inflated by a single game in which he scored only two points, a game he did not start. If we exclude this game from the calculation, his standard deviation goes down, indicating that he was more consistent once he had the opportunity to start.
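The effect of dropping that game can be checked with a small R sketch. The `lin_points` vector is an assumed reconstruction of the ten scores (the text names only some of them), so the second result holds only for these assumed values. `sd_n` is a hypothetical helper that divides by n, as the lecture does.

```r
# Assumed reconstruction of the 10 scores; not all values appear in the text.
lin_points <- c(28, 26, 10, 27, 20, 38, 23, 28, 25, 2)

# Hypothetical helper: standard deviation with n in the denominator.
sd_n <- function(x) sqrt(mean((x - mean(x))^2))

# Keep only the games he started by dropping the 2-point game.
starts <- lin_points[lin_points != 2]

sd_all <- sd_n(lin_points)   # about 9.6 with all 10 games
sd_starts <- sd_n(starts)    # about 7.0 with these assumed values
```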

Using proper notation, we can represent these statistics as follows:

m: mean

sd: standard deviation

sd²: variance (also known as mean squares)

The formula for variance is:

sd² = (Σ(xi - m)²) / n

where xi represents each individual value, and n represents the number of values. In this case, the sum of squared deviations is 922.1 and n = 10, so:

sd² = 922.1 / 10 = 92.21

Taking the square root of both sides gives us the standard deviation:

sd = √92.21 ≈ 9.6 points per game

On to an example. This is one of my favorite examples; I developed it last year because this happened in New York City, which is where I live, and it happened in professional basketball in the U.S., in the NBA. This is a guy called Jeremy Lin, who was not that big of a star. Not a star at all, actually; he's not really a top basketball player in the U.S. But all of a sudden he got the chance to play for the New York Knicks. So there's Jeremy Lin in Madison Square Garden, which is right there in the middle of Manhattan, just a couple of blocks from where I live, so I got caught up in what the media called Linsanity. Jeremy Lin suddenly took the NBA by storm because he started scoring a lot of points. So what we're going to do is look at Jeremy Lin's points per game over that time, which was during the winter of 2012.

So what I can do is look at Jeremy Lin's points per game across, let's just pick out, a 10-game stretch of that season in 2012. If you start at the bottom, you'll see there's just two points in that one game, and that was typical of what Jeremy Lin was doing before one of the star players on the Knicks was hurt and Jeremy Lin had the opportunity to play. Carmelo Anthony, who's one of the stars of the New York Knicks, was hurt, and Jeremy Lin got the opportunity to start. In his first opportunity to start he scored 25 points, the next time he started he scored 28 points, and so on. He scored 23, and I think it was around here, when he scored 38, that Linsanity totally set in in New York City. We got all excited; we thought the Knicks were going to win the NBA championship. Never happened. Not happening again today: as I film this, game seven of the NBA championship is happening tonight here in the U.S., and the Knicks aren't in it.

But back to the topic of this lecture. We have Jeremy Lin's points per game across 10 games, and we can get summary statistics from this. We can get his average across this segment of the season, so what was his average across these 10 games? That's a common statistic in a lot of sports: points per game, or say goals per season if you're looking at football internationally, or hockey, which is also big here in the U.S. right now. So we could get just average points per game, but I also might want to know how variable these players are across a certain stretch of games, and that's an important statistic to look at if you're looking at the quality of players. Was Jeremy Lin consistent across these 10 games, or was he all over the place, having a good night one night and a bad night the next?

So first let's calculate the mean. That's just the sum of all the scores divided by N, and I did that down at the bottom of the screen here. To get the mean, I just summed that column of points per game. The sum of that column is 227, and we're averaging over 10 games, so it's 227 over 10, or 22.7. So during that 10-game stretch Jeremy Lin is averaging 22.7 points per game. That's the mean; that's a measure of central tendency.

But we want to know about his variability over those 10 games, and the way to calculate variability is variance and standard deviation. The first thing we want to do is, for each game, calculate what's known as a deviation score, and that's this column: these are deviation scores. To get a deviation score, I just take how many points Jeremy Lin scored in one game and compare it to his average. So in this first row I can take 28, subtract his average, 22.7, and I get 5.3. He was 5.3 above average that night. And so on; I can calculate each deviation score.

The whole point of variability is getting sort of an average deviation. Now, I can't take the average of this column, because it sums to zero: if I sum that column I would get zero, and dividing by ten I'd have zero. So instead we'll square each deviation score. This last column is the deviation scores squared, which gets rid of sign. Then, when I sum the squared deviation scores, I can divide by N, and that gives me variance. So in this last column I've just taken each deviation score, squared it, and then summed all the squared deviation scores to get 922.1. If I divide by 10, that gives me 92.2.

So to summarize those results: his average points per game, our mean, was 22.7. The variance, standard deviation squared, was 92.21. It was the sum of squared deviations divided by N; for short, we're going to call that sum of squares divided by N. If we take the square root of that, then we get the standard deviation, 9.6. Taking the square root brings us back to the units that we started in. Remember, we had to square everything to get rid of sign when we went from deviation scores to squared deviation scores, so in this last step we take the square root, and that brings us back to the units we were in, which is points per game.

What this is saying is that on average he was fluctuating about 10 points per game. Is that really accurate? Not really. He wasn't really fluctuating about 10 points per game once he had the opportunity to start. The standard deviation is 9.6 because I included that one game where he only scored two points; he didn't start that game. If we don't include that, then his standard deviation will go down.

Before I do that, I just want to point out some notation again. I'm going to use M for mean, SD for standard deviation, and SD squared for variance. That's also known as mean squares, for the mean of the squared deviations. So mean squares, sums of squares: these are important definitions to know going forward.