Python Tutorial - Small multiples

# Enhancing Data Visualization: Using Small Multiples in Matplotlib

## The Problem of Overcrowded Plots

When working with data visualization, it's easy to fall into the trap of overloading a single plot with too much information. Adding more data to a plot can make patterns harder to discern rather than clearer. For instance, consider plotting the average precipitation in Seattle throughout the year. If we also include the 25th and 75th percentiles of precipitation as dashed lines above and below the average, the plot is still readable and more informative. That changes once we try to compare Seattle with another city, such as Austin.

By adding Austin's data to the same plot, the visualization quickly becomes cluttered. The result is a messy display that obscures the patterns we're trying to observe. This highlights the need for a more organized approach to presenting multiple datasets side by side.

## Introducing Small Multiples

An effective solution to this problem is the use of "small multiples," which are multiple small plots that display similar data across different conditions or categories. For example, instead of combining Seattle's and Austin's precipitation data into one plot, we can create a separate subplot for each city. This approach makes it easier to compare data while keeping the visualization clean and organized.

In Matplotlib, these small multiples are referred to as "subplots." The function that creates these subplots is also named `subplots`, reflecting its purpose. Previously, this function was used without any inputs, generating a single subplot. However, by providing inputs, we can create multiple subplots arranged in a grid format with rows and columns.

## Creating Subplots in Matplotlib

To create small multiples using Matplotlib, call `plt.subplots` with the number of rows and columns you want. The call creates a figure object together with the subplots it contains, arranged as a grid. For example, if we want three rows and two columns of subplots, we pass `3` and `2` as the first two arguments.

Before any data is added, the figure shows a grid of empty subplots. The variable `ax` is no longer a single Axes object; it is an array of Axes objects whose shape matches the number of rows and columns specified (e.g., 3x2 for three rows and two columns). To add data to a subplot, you index into this array and call the `plot()` method on the selected element.
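As a minimal sketch of the indexing described above, the snippet below creates the 3x2 grid and draws on two of its subplots. The plotted numbers are placeholders invented for illustration, not real weather data.

```python
import matplotlib.pyplot as plt

# Three rows and two columns of empty subplots
fig, ax = plt.subplots(3, 2)

# ax is a NumPy array of Axes objects
print(ax.shape)  # (3, 2)

# Index with [row, column]; both indices are zero-based
ax[0, 0].plot([1, 2, 3], [2, 4, 8])  # top-left subplot (placeholder values)
ax[2, 1].plot([1, 2, 3], [8, 4, 2])  # bottom-right subplot (placeholder values)
```

The subplots that were never indexed simply stay empty, which makes it easy to fill a grid one panel at a time.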

There's also a special case when dealing with only one row or column of plots. In such scenarios, the resulting array will be one-dimensional, meaning you only need to provide one index to access the elements within the array.
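The one-dimensional case can be checked directly by inspecting the shape of the returned array. The sketch below also uses the `squeeze=False` keyword argument, which is not covered in the text above; it keeps the array two-dimensional when you would rather index every grid the same way.

```python
import matplotlib.pyplot as plt

# One column of two subplots: the array has a single dimension
fig_col, ax_col = plt.subplots(2, 1)
print(ax_col.shape)  # (2,)

# One row of three subplots: also one-dimensional
fig_row, ax_row = plt.subplots(1, 3)
print(ax_row.shape)  # (3,)

# squeeze=False keeps the full two-dimensional shape
fig_2d, ax_2d = plt.subplots(2, 1, squeeze=False)
print(ax_2d.shape)  # (2, 1)
```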

## Adding Data and Labels to Subplots

Let's take an example where we want to compare rainfall data between two cities: Seattle and Austin. We'll create a figure with two rows and one column of subplots. The first subplot will display Seattle's data, while the second will show Austin's.

1. **Creating the Figure and Axes Array**:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(2, 1)
```

Here, `ax` is a one-dimensional array containing two elements, corresponding to the two subplots.

2. **Adding Data to Subplots**:

- For the first subplot (Seattle):

```python
ax[0].plot(seattle_data)
ax[0].set_title('Seattle Precipitation')
```

- For the second subplot (Austin):

```python
ax[1].plot(austin_data)
ax[1].set_title('Austin Precipitation')
```

3. **Adding Labels**:

Since the subplots are stacked vertically, you should add y-axis labels to both plots for clarity. However, x-axis labels should only be added to the bottom subplot to avoid repetition. This ensures that the visualization remains uncluttered.
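Putting the three steps together might look like the sketch below. The monthly values and the label text are invented purely for illustration; substitute your own Seattle and Austin data.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical monthly averages, not real measurements
seattle_data = [5.6, 3.9, 4.1, 2.8, 2.0, 1.6, 0.7, 0.9, 1.6, 3.5, 6.1, 5.8]
austin_data = [2.2, 2.0, 2.7, 2.1, 4.4, 4.3, 1.9, 2.4, 3.0, 3.9, 2.9, 2.7]

# Two rows, one column: ax is a one-dimensional array
fig, ax = plt.subplots(2, 1)

# Top subplot: Seattle
ax[0].plot(months, seattle_data)
ax[0].set_title("Seattle Precipitation")

# Bottom subplot: Austin
ax[1].plot(months, austin_data)
ax[1].set_title("Austin Precipitation")

# y-axis label on both subplots, x-axis label only on the bottom one
ax[0].set_ylabel("Precipitation (inches)")
ax[1].set_ylabel("Precipitation (inches)")
ax[1].set_xlabel("Time (months)")

# Keep titles and labels from overlapping between the stacked panels
fig.tight_layout()
```

Calling `plt.show()` afterwards displays the figure in an interactive session.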

## Ensuring Consistent Y-Axis Ranges

One potential issue when creating multiple subplots is that their y-axis ranges might differ, because each subplot is scaled to the highest and lowest values of its own dataset. To address this, you can initialize the figure and its subplots with the `sharey` keyword argument set to `True`. All subplots then share one y-axis range based on the data from every dataset, making comparisons between datasets more straightforward.

```python
fig, ax = plt.subplots(2, 1, sharey=True)
```
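To see the effect of `sharey`, one option is to compare the y-axis limits with and without it. The two short series below are placeholders chosen to have very different ranges.

```python
import matplotlib.pyplot as plt

# Placeholder series with deliberately different ranges
low_values = [1, 2, 3]
high_values = [10, 20, 30]

# Without sharey, each subplot scales to its own data
fig_a, ax_a = plt.subplots(2, 1)
ax_a[0].plot(low_values)
ax_a[1].plot(high_values)
independent = ax_a[0].get_ylim() == ax_a[1].get_ylim()

# With sharey=True, both subplots use one common range
fig_b, ax_b = plt.subplots(2, 1, sharey=True)
ax_b[0].plot(low_values)
ax_b[1].plot(high_values)
shared = ax_b[0].get_ylim() == ax_b[1].get_ylim()

print(independent, shared)
```

With shared limits, the smaller series occupies only the lower part of its panel, which is the intended trade-off: heights can be compared directly across subplots.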

By implementing these steps, you can create clean and organized visualizations that facilitate direct comparisons between different datasets.

## Conclusion

Adding too much data to a single plot can obscure patterns rather than reveal them. Using small multiples in the form of subplots is an effective way to present multiple datasets while maintaining clarity. By arranging plots in a grid format and ensuring consistent axis ranges, you can create visualizations that are both informative and easy to interpret.

Next, practice creating visualizations with small multiples to enhance your data storytelling capabilities.
