Matplotlib Plot Tutorial - Histograms, Scatter Plots & Legend

**Visualizing Data with Python**

Python is a powerful language that can be used to create a wide range of visualizations, from simple plots to complex charts. In this article, we will explore how to use Python's matplotlib library to create histograms and line plots.

**Histograms**

One of the most useful visualizations in data analysis is the histogram, which shows how the values in a dataset are distributed. The range of the data is divided into equal-width intervals called bins, and the height of each bar is the number of values that fall into that bin. To create a histogram in Python, we can use the hist function from matplotlib's pyplot module. The first two arguments to this function are the values we want to build a histogram for (x) and the number of bins we want to divide the data into (bins). If bins is not specified, matplotlib uses 10 bins by default.

For example, let's say we want to create a histogram of the 12 values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12. We can use the hist function like this:

```
import matplotlib.pyplot as plt

# The 12 values to build a histogram for
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# Number of equal-width bins
bins = 3

plt.hist(X, bins=bins)
plt.show()
```

This will create a histogram with the specified number of bins. We can adjust the number of bins to see how it affects the shape of the histogram.
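To make the binning step concrete, here is a minimal pure-Python sketch of the counting a histogram involves. The function name bin_counts and the 12 sample values are assumptions made for illustration. This is not how matplotlib implements hist internally; it only mirrors the idea of equal-width bins.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width bins.

    Hypothetical helper for illustration; assumes the values are not all equal.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        # The maximum value sits on the right edge, so it goes into the last bin.
        idx = min(idx, bins - 1)
        counts[idx] += 1
    # bins + 1 edges mark the boundaries of the bins.
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


# Hypothetical sample: 12 values between 0 and 6.
values = [0, 0.6, 1.4, 1.6, 2.2, 2.5, 2.6, 3.2, 3.5, 3.9, 4.2, 6]
counts, edges = bin_counts(values, 3)
print(counts)  # [4, 6, 2]
print(edges)   # [0.0, 2.0, 4.0, 6.0]
```

With three bins of width two, most of these values land in the middle bin, which is what the bar heights of the corresponding histogram would show.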

**Population Pyramids**

Histograms are also useful for seeing the bigger picture in demographic data. Population pyramids are a classic example. A population pyramid shows the age distribution of a population at one point in time, often for males and females side by side. The histogram is flipped 90 degrees: the y-axis represents the age group, and the x-axis represents the number of people in each group. Comparing pyramids for different years, such as 2010 and 2050, shows how demographics change over time.

To build toward a population pyramid, we can use the hist function again, but this time with a twist: passing orientation='horizontal' draws the bins horizontally instead of vertically.

```
import matplotlib.pyplot as plt

# Example ages, one value per person
X = [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]

bins = 3

# orientation='horizontal' puts ages on the y-axis and counts on the x-axis
plt.hist(X, bins=bins, orientation='horizontal')
plt.show()
```

This will create a horizontal histogram with three age groups. A complete population pyramid combines two such charts, one per sex, mirrored around a shared axis.
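A pyramid that shows two groups back to back, such as males and females, can be sketched with horizontal bars. The example below is only one possible approach: it assumes the counts per age group are already known and uses barh rather than hist. The age groups, the counts, and the function name draw_pyramid are hypothetical and do not come from real census data.

```python
import matplotlib.pyplot as plt

# Hypothetical counts (in millions) per five-year age group; not real data.
age_groups = ["30-34", "35-39", "40-44", "45-49", "50-54"]
males = [16, 18, 20, 19, 17]
females = [16, 18, 20, 19, 18]


def draw_pyramid(age_groups, males, females):
    """Draw male bars to the left of zero and female bars to the right."""
    fig, ax = plt.subplots()
    positions = list(range(len(age_groups)))
    # Negating the male counts mirrors their bars around x = 0.
    ax.barh(positions, [-m for m in males], label="Male")
    ax.barh(positions, females, label="Female")
    ax.set_yticks(positions)
    ax.set_yticklabels(age_groups)
    ax.set_xlabel("Population (millions)")
    ax.set_ylabel("Age group")
    ax.legend()
    return fig, ax


fig, ax = draw_pyramid(age_groups, males, females)
# Call plt.show() to display the figure.
```

Because the male counts are negated, the x-axis tick labels on the left side are negative; relabeling those ticks is left out to keep the sketch short.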

**Line Plots**

Line plots are another useful visualization, typically used to show trends over time. Where a histogram shows the distribution of values, a line plot shows how a value changes from one point to the next, with consecutive data points connected by a line.

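To create a line plot in Python, we can use the plot function from the matplotlib library. We will need two lists: one for the x-values (the years) and one for the y-values (the population). The first argument corresponds to the horizontal axis and the second to the vertical axis. Nothing is displayed until the show function is called, which leaves room to add titles and labels first.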

For example, let's say we want to create a line plot of the world population, in billions, from 1950 to 2070, including projections. We can use the plot function like this:

```
import matplotlib.pyplot as plt

# Years on the x-axis
X = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020, 2030, 2040, 2050, 2060, 2070]

# World population in billions on the y-axis
Y = [2.5, 3.5, 4.1, 4.9, 5.7, 6.1, 6.8, 7.3, 7.8, 8.3, 8.8, 9.2, 9.5]

plt.plot(X, Y)
plt.show()
```

This will create a line plot with the specified years on the x-axis and population values on the y-axis.

**Customizing Plots**

One of the most powerful things about matplotlib is its ability to customize plots. We can change the color, shape, labels, and more to suit our needs.

For example, let's say we want to add a title to our line plot. We can use the title function from matplotlib to do this.

```
import matplotlib.pyplot as plt

X = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020, 2030, 2040, 2050, 2060, 2070]
Y = [2.5, 3.5, 4.1, 4.9, 5.7, 6.1, 6.8, 7.3, 7.8, 8.3, 8.8, 9.2, 9.5]

plt.plot(X, Y)

# Add the title before calling show
plt.title('World Population Projections')
plt.show()
```

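This will add a title to our plot. Customizations such as the title must be added before calling show; otherwise they will not be displayed.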
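Axis labels and tick labels can be added in the same way. The sketch below uses the xlabel, ylabel, and yticks functions from pyplot. The label strings and the tick names such as 2B (B for billions) are illustrative choices, and the data is the same illustrative population list used for the line plot.

```python
import matplotlib.pyplot as plt

# Same illustrative data as the line plot (population in billions).
X = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020, 2030, 2040, 2050, 2060, 2070]
Y = [2.5, 3.5, 4.1, 4.9, 5.7, 6.1, 6.8, 7.3, 7.8, 8.3, 8.8, 9.2, 9.5]

plt.plot(X, Y)

# Tell the reader what each axis shows.
plt.xlabel("Year")
plt.ylabel("Population")
plt.title("World Population Projections")

# Start the y-axis at 0 and give each tick a display name.
# Both lists must have the same length.
plt.yticks([0, 2, 4, 6, 8, 10], ["0", "2B", "4B", "6B", "8B", "10B"])

# All customizations are in place; call plt.show() here to display the plot.
```

Starting the y-axis at 0 puts the growth in perspective: it becomes clear that in 1950 there were already about 2.5 billion people.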

We can also customize the look of a plot with an optional format string, passed as the third argument to the plot function. For example, 'o' draws a circle marker at each data point without connecting the points, which gives a scatter-style plot; '-' draws a solid line; and 'o-' combines both.

```
import matplotlib.pyplot as plt

X = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020, 2030, 2040, 2050, 2060, 2070]
Y = [2.5, 3.5, 4.1, 4.9, 5.7, 6.1, 6.8, 7.3, 7.8, 8.3, 8.8, 9.2, 9.5]

# 'o-' means circle markers connected by a solid line
plt.plot(X, Y, 'o-')
plt.show()
```

This will create a line plot with circle markers and solid lines.
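For a plot that shows only the individual data points, pyplot also provides a scatter function that takes the same two lists as plot. The sketch below puts both side by side for comparison. The two-panel layout and the variable names ax_line and ax_scatter are choices made for this illustration, and the data is the same illustrative population list as before.

```python
import matplotlib.pyplot as plt

# Same illustrative data as before (population in billions).
X = [1950, 1960, 1970, 1980, 1990, 2000, 2010, 2020, 2030, 2040, 2050, 2060, 2070]
Y = [2.5, 3.5, 4.1, 4.9, 5.7, 6.1, 6.8, 7.3, 7.8, 8.3, 8.8, 9.2, 9.5]

# Two panels next to each other that share the y-axis.
fig, (ax_line, ax_scatter) = plt.subplots(1, 2, sharey=True)

# Left: data points connected by a line.
ax_line.plot(X, Y)
ax_line.set_title("plot")

# Right: only the individual data points, no connecting line.
ax_scatter.scatter(X, Y)
ax_scatter.set_title("scatter")

# Call plt.show() to display the figure.
```

The scatter version makes it easy to see how many data points the plot is actually based on.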

**Conclusion**

In this article, we explored how to use Python's matplotlib library to create histograms and line plots. We learned how to customize these plots using various options from matplotlib, including colors, shapes, labels, and more. We also saw examples of population pyramids, which are a classic application of these visualizations. With practice, you can become proficient in creating professional-looking visualizations with Python and matplotlib.

"Hi, my name is Philip and I'm a data scientist at DataCamp. This Intermediate Python course will further enhance your Python skills, specifically for data science. You will learn how to visualize data and to store data in new data structures. Along the way, you will master control structures, which you will need to customize the flow of your scripts and algorithms. We'll finish this chapter with a case study where you'll blend together everything you've learned to solve a cool problem.

This first chapter is about data visualization, which is a very important part of data analysis. First of all, you will use it continuously to explore your datasets. The better you understand your data, the better you'll be able to extract insights. And once you've found those insights, again, you'll need visualization to be able to share your precious insights with other people.

As an example, have a look at this beautiful plot. It's made by the Swedish professor Hans Rosling. His talks about global development have been viewed millions of times. What makes them so intriguing is that by making beautiful plots, he allows the data to tell their own story. Here we see a bubble chart, where each bubble represents a country. The bigger the bubble, the bigger the country's population, so the two biggest bubbles here are China and India. There are two axes: the horizontal axis shows the GDP per capita in US dollars, and the vertical axis shows life expectancy. We clearly see that people live longer in countries with a higher GDP per capita. Still, there's a huge difference in life expectancy between countries on the same income level. Now, why did I tell you all of this? Well, because by the end of this chapter, you'll be able to build this beautiful plot yourself.

There are many visualization packages in Python, but the mother of them all is matplotlib. You will need its subpackage pyplot. By convention, this subpackage is imported as plt, like this. For our first example, let's try to gain some insights into the evolution of the world population. I have a list with years here, year, and a list with the corresponding populations, expressed in billions, pop. In the year 1970, for example, 3.7 billion people lived on planet Earth. To plot this data as a line chart, we call plt.plot and use our two lists as arguments. The first argument corresponds to the horizontal axis, and the second one to the vertical axis. You might think that a plot will pop up right now, but Python is pretty lazy. It will wait for the show function to actually display the plot. This is because you might want to add some extra ingredients to your plot before actually displaying it, such as titles and label customizations. I'll talk about that some more later on. Just remember this: the plot function tells Python what to plot and how to plot it; show actually displays the plot.

When we look at our plot, we see that the years are indeed shown on the horizontal axis and the population on the vertical axis. There are four data points, and Python draws a line between them. In 1950, the world population was around 2.5 billion. In 2010, it was 7 billion. So the world population has almost tripled in 60 years. That's pretty scary. What if the population keeps on growing like that? Will the world become overpopulated? You'll find out in the exercises.

Let me first introduce you to another type of plot: the scatter plot. To create it, we can start from the code from before. This time, though, you change the plot function to scatter. The resulting scatter plot simply plots all the individual data points; Python doesn't connect the dots with a line. For many applications, the scatter plot is often a better choice than the line plot, so remember the scatter function well. You could also say that this is a more honest way of plotting your data, because you can clearly see that the plot is based on just four data points.

The histogram is a type of visualization that's very useful to explore your data. It can help you to get an idea about the distribution of your variables. To see how it works, imagine 12 values between 0 and 6. I've put them along a number line here. To build a histogram for these values, we can divide the line into equal chunks, called bins. Suppose you go for three bins that each have a width of two. Next, you count how many data points sit inside each bin. There are four data points in the first bin, six in the second bin, and two in the third bin. Finally, you draw a bar for each bin. The height of the bar corresponds to the number of data points that fall in this bin. The result is a histogram, which gives us a nice overview of how the 12 values are distributed. Most values are in the middle, but there are more values below two than there are above four.

Of course, matplotlib is also able to build histograms. As before, you should start by importing the pyplot package that's inside matplotlib. Next, you can use the hist function. Let's open up its documentation. There's a bunch of arguments you can specify, but the first two here are the most important ones. x should be a list of values you want to build a histogram for. You can use the second argument, bins, to tell Python into how many bins the data should be divided. Based on this number, hist will automatically find appropriate boundaries for all bins and calculate how many values are in each one. If you don't specify the bins argument, it will be 10 by default. So, to generate the histogram that you've seen before, let's start by building a list with the 12 values. Next, you simply call hist and pass this list as an input, so it's matched to the argument x. I also specify the bins argument to be 3, so that the values are divided into three bins. If you finally call the show function, a nice histogram results.

Histograms are really useful to give a bigger picture. As an example, have a look at this so-called population pyramid. The age distribution is shown for both males and females in the European Union. Notice that the histograms are flipped 90 degrees; the bins are horizontal now. The bins are largest for the ages 40 to 44, where there are 20 million males and 20 million females. They are the so-called baby boomers. These are figures for the year 2010. What do you think will have changed in 2050? Let's have a look. The distribution is flatter, and the baby boom generation has gotten older. In the blink of an eye, you can easily see how demographics will be changing over time, and that's the true power of histograms at work here.

Creating a plot is one thing. Making the correct plot, one that makes the message very clear, that's the real challenge. For each visualization, you have many options. First of all, there are the different plot types, and for each plot, you can do an infinite number of customizations. You can change colors, shapes, labels, axes, and so on. The choice depends on, one, the data and, two, the story you want to tell with this data. Since there are so many possible customizations, the best way to learn this is by example.

Let's start with the code in this script to build a simple line plot. It's similar to the line plot we created in the first video, but this time the year and pop lists contain more data, including projections until the year 2100, forecasted by the United Nations. If we run the script, we already get a pretty nice plot. It shows that the population explosion that's going on will have slowed down by the end of the century. But some things can be improved. First, it should be clear which data we are displaying, especially to people who are seeing the graph for the first time. And second, the plot really needs to draw attention to the population explosion.

The first thing you always need to do is to label your axes. Let's do this by adding the xlabel and ylabel functions. As inputs, we pass strings that should be placed alongside the axes. Make sure to call these functions before calling the show function; otherwise, your customizations will not be displayed. If we run the script again, this time the axes are annotated. Great. We're also going to add a title to our plot, with the title function. We pass the actual title, World Population Projections, as an argument, and there's a title. So, using xlabel, ylabel, and title, we can give the reader more information about the data on the plot. Now they can at least tell what the plot is about.

To put the population growth in perspective, I want to have the y-axis start from 0. You can do this with the yticks function. The first input is a list, in this example with the numbers 0 up to 10, with intervals of 2. If we run this, the plot will change: the curve shifts up. Now it's clear that in 1950 there were already about 2.5 billion people on this planet. Next, to make it clear we're talking about billions, we can add a second argument to the yticks function, which is a list with the display names of the ticks. This list should have the same length as the first list. The tick 0 gets the name 0, the tick 2 gets the name 2B, the tick 4 gets the name 4B, and so on. By the way, B stands for billions here. If you run this version of the script, the labels will change accordingly. Awesome.

Finally, let's add some more historical data to accentuate the population explosion in the last 60 years. On Wikipedia, I found the world population data for the years 1800, 1850, and 1900. I can write them in list form and append them to the pop and year lists with the plus sign. If I now run the script once more, three data points are added to the graph, giving a more complete picture. Now that's how we turn an average line
plot into a visual that has a clear story to tell. Over to you now."