R Tutorial - Counts vs. proportions

The Power of Conditional Proportions: Unlocking Insights from Raw Data

In data analysis, raw counts of cases can be useful, but often the proportions are more interesting. We could try to compute those proportions in our heads, or we can compute them explicitly with the `prop.table()` function. To illustrate, let's return to our table of counts of cases by identity and alignment. To get a sense of the proportion of all cases that fall into each category, we take the original table of counts, saved as `tab_cnt`, and provide it as input to `prop.table()`.

The Single Largest Category: Characters That Are Bad and Secret

Applying `prop.table()` to our table of counts, we see that the single largest category is characters that are bad and secret, at about 29% of all characters. Because these are all proportions out of the whole data set, the sum of all the cells equals 1.
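
As a minimal sketch of this step, the block below builds a small two-by-two table and converts it to joint proportions. The counts are invented for illustration and are not the course data; only `prop.table()` and the name `tab_cnt` come from the lesson.

```r
# Hypothetical counts, invented for illustration (not the course data).
tab_cnt <- as.table(matrix(
  c(40, 20,
    25, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(id = c("Secret", "Public"), align = c("Bad", "Good"))
))

# Joint proportions: every cell is divided by the grand total.
joint <- prop.table(tab_cnt)
print(joint)

# All cells together account for the whole data set.
total <- sum(joint)
print(total)
```

With no second argument, every cell shares the same denominator (the grand total), which is why the whole table sums to 1.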

Conditional Proportions: A New Perspective

When we are curious about systematic associations between variables, conditional proportions are the tool to reach for. Conditioning on rows or columns shows how one variable is distributed within each level of the other. One example of a conditional proportion is the proportion of public-identity characters that are good. To build a table of these conditional proportions, pass 1 as the second argument to `prop.table()` to specify that we'd like to condition on the rows.

The Impact of Conditioning on Rows

Conditioning on rows, we see that around 57% of all secret characters are bad. Because we are conditioning on identity, it is now each row, rather than the whole table, that sums to 1.
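
A short sketch of row conditioning, again using invented counts rather than the course data. It assumes identity is on the rows of the table, as the text describes.

```r
# Hypothetical counts, invented for illustration (not the course data).
tab_cnt <- as.table(matrix(
  c(40, 20,
    25, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(id = c("Secret", "Public"), align = c("Bad", "Good"))
))

# Second argument 1: divide each cell by its row total,
# i.e. condition on identity.
row_prop <- prop.table(tab_cnt, 1)
print(row_prop)

# Each row is now a distribution of alignment within one identity level.
row_totals <- rowSums(row_prop)
print(row_totals)
```

Reading across the `Secret` row answers the question "of the secret characters, what share is bad?".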

Conditional Proportions: Conditioning on Columns

To condition on the columns instead, change the second argument to 2. Now each column sums to 1, and we learn, for example, that around 63% of bad characters are secret. This is a different quantity from the 57% of secret characters that are bad: the two tables answer different questions, so it pays to be clear about which variable you are conditioning on.
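
The column version differs only in the second argument. As before, the counts are invented for illustration, and the layout (alignment on the columns) is assumed from the text.

```r
# Hypothetical counts, invented for illustration (not the course data).
tab_cnt <- as.table(matrix(
  c(40, 20,
    25, 15),
  nrow = 2, byrow = TRUE,
  dimnames = list(id = c("Secret", "Public"), align = c("Bad", "Good"))
))

# Second argument 2: divide each cell by its column total,
# i.e. condition on alignment.
col_prop <- prop.table(tab_cnt, 2)
print(col_prop)

# Each column is now a distribution of identity within one alignment level.
col_totals <- colSums(col_prop)
print(col_totals)
```

Reading down the `Bad` column answers "of the bad characters, what share is secret?", which is not the same number as the share of secret characters that are bad.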

Making Sense of Data Using Graphics

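As the number of cells in these tables grows, it becomes much easier to make sense of the data using graphics. A bar chart is still a good choice, but it needs a few options. We want to condition on whatever is on the x-axis and stretch each bar to a total proportion of 1, so we add the `position = "fill"` option to `geom_bar()`, plus one more layer that relabels the y-axis to show that we are looking at proportions. The resulting plot reflects the table of proportions conditioned on identity: the proportion of secret characters that are bad is still large, but it is actually smaller than the proportion of bad characters among those listed as unknown.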
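
A sketch of what such a plot might look like with ggplot2. The data frame name `comics`, the column names `id` and `align`, and the data themselves are assumptions made for illustration; only `geom_bar()` with `position = "fill"` and the relabelled y-axis come from the lesson.

```r
library(ggplot2)

# Hypothetical data, invented for illustration. The names `comics`, `id`,
# and `align` are assumed, not confirmed by the text.
comics <- data.frame(
  id    = rep(c("Secret", "Public"), times = c(60, 40)),
  align = c(rep(c("Bad", "Good"), times = c(40, 20)),
            rep(c("Bad", "Good"), times = c(25, 15)))
)

# The variable on the x-axis is the one being conditioned on;
# position = "fill" stretches every bar to a total height of 1.
p <- ggplot(comics, aes(x = id, fill = align)) +
  geom_bar(position = "fill") +
  ylab("proportion")
```

Calling `print(p)` draws the chart. Without `position = "fill"` the same code would stack raw counts, so bars for larger groups would simply be taller.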

Conditioning on Alignment: A Different Perspective

Conditioning on alignment instead gives a very different picture. The only change needed in the code is to swap the positions of the two variable names. In the resulting plot, we learn that among characters that are bad, the greatest proportion are indeed secret. This might seem paradoxical, since secret characters are not the identity group with the largest share of bad characters, but it is just a result of having different numbers of cases in each level.
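
To see how unequal group sizes can produce this apparent paradox, here is a small sketch with invented counts (not the course data) in which one identity level is much smaller than the other.

```r
# Hypothetical counts, invented for illustration (not the course data).
# The "Unknown" level is deliberately tiny compared with "Secret".
tab_cnt <- as.table(matrix(
  c(60, 40,
     8,  2),
  nrow = 2, byrow = TRUE,
  dimnames = list(id = c("Secret", "Unknown"), align = c("Bad", "Good"))
))

by_id    <- prop.table(tab_cnt, 1)  # condition on identity (rows)
by_align <- prop.table(tab_cnt, 2)  # condition on alignment (columns)

# Number of cases in each identity level.
level_sizes <- margin.table(tab_cnt, 1)

print(by_id)        # share of bad is higher within Unknown than within Secret
print(by_align)     # yet most bad characters are Secret
print(level_sizes)  # because Secret has far more cases overall
```

In this toy table a larger share of the unknown characters is bad, yet most bad characters are secret, simply because the secret group contains ten times as many cases.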

Experimenting with Conditional Proportions

Now it's your turn to experiment with conditional proportions. Try conditioning on each variable in turn and compare the resulting tables and plots, keeping in mind which margin sums to 1 in each case.

You may have noticed in the last exercises that sometimes raw counts of cases can be useful, but often it's the proportions that are more interesting. We can do our best to compute those proportions in our head, or we could do it explicitly. Let's return to our table of counts of cases by identity and alignment. If we wanted to instead get a sense of the proportion of all cases that fell into each category, we can take the original table of counts, saved as `tab_cnt`, and provide it as input to the `prop.table` function. We see here that the single largest category is characters that are bad and secret, at about 29 percent of characters. Also note that because these are all proportions out of the whole data set, the sum of all these proportions is 1.

If we're curious about the systematic associations between variables, we should look to conditional proportions. An example of a conditional proportion is the proportion of public identity characters that are good. To build a table of these conditional proportions, add a 1 as the second argument, specifying that you'd like to condition on the rows. We see here that around 57% of all secret characters are bad. Because we're conditioning on identity, it's every row that now sums to 1. To condition on the columns instead, you can change that argument to 2. Now it's the columns that sum to 1, and we learn, for example, that the proportion of bad characters that are secret is around 63%.

As the number of cells in these tables gets large, it becomes much easier to make sense of your data using graphics. The bar chart is still a good choice, but we're going to need to add some options. Here's the code for the bar chart based on counts. We want to condition on whatever's on the x-axis and then stretch those bars to each add up to a total proportion of 1, so we add the `position = "fill"` option to the `geom_bar` function. Let's add one additional layer: a change to our y-axis to indicate that we're looking at proportions. When we run this code at the console, we get a plot that reflects our table of proportions after we had conditioned on ID. While the proportion of secret characters that are bad is still large, it's actually less than the proportion of bad characters in those that are listed as unknown.

We get a very different picture if we condition instead on alignment. The only change needed in the code is to swap the positions of the names of the variables. This results in a plot where we've conditioned on alignment, and we learn that within characters that are bad, the greatest proportion of those are indeed secret. This might seem paradoxical, but it's just a result of having different numbers of cases in each level. Okay, now you try experimenting with conditional proportions.