Python Tutorial - Arithmetic with Series & DataFrames

Exploring Arithmetic and Mathematical Operations with Pandas Series and DataFrames

We load daily weather measurements for Pittsburgh from 2013, making the date column the index and passing parse_dates=True so that the index holds datetime objects (a DatetimeIndex). With a datetime index, we can slice using convenient date strings, for example selecting the first week of July from the PrecipitationIn column. The precipitation data are in inches. Let's convert them to centimeters by using the asterisk operator to multiply the Series element-wise by 2.54.
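The loading, slicing, and unit-conversion steps can be sketched as follows. This is a minimal illustration, not the original notebook: the CSV text is a made-up stand-in for the real weather file, and the column names Date and PrecipitationIn are assumptions based on the narration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Pittsburgh weather file; the values are invented.
csv_text = """Date,PrecipitationIn
2013-06-30,0.10
2013-07-01,0.18
2013-07-02,0.14
2013-07-07,0.00
2013-07-08,0.25
"""

# index_col makes the date column the index; parse_dates=True turns it into a DatetimeIndex.
weather = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# String-based slicing on a DatetimeIndex includes both endpoints.
week1_precip_in = weather.loc["2013-07-01":"2013-07-07", "PrecipitationIn"]

# Multiplying by a scalar is broadcast to every element: inches -> centimeters.
week1_precip_cm = week1_precip_in * 2.54
print(week1_precip_cm)
```

Reading from an in-memory string keeps the sketch self-contained; with the real file, the first argument to read_csv would simply be its path.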

Standard scalar mathematical operations are broadcast across pandas Series and DataFrames. Broadcasting means that the multiplication is applied to every element. Next, let's find the percentage variation in temperature in the first week of July, that is, the daily minimum and daily maximum temperatures expressed as a percentage of the daily mean temperature. We can compute this by dividing both the Min TemperatureF and the Max TemperatureF columns by the Mean TemperatureF column and multiplying both by 100.

To begin, slice the Min TemperatureF and Max TemperatureF columns as a DataFrame, week1_range, then slice the Mean TemperatureF column as a Series, week1_mean. Dividing the DataFrame week1_range by the Series week1_mean with the slash operator doesn't quite work: by default pandas matches the Series index (the dates) against the DataFrame's column labels, the labels don't match, and the result is all null values. Instead, we use the DataFrame divide method with the option axis='rows'. The divide method provides more fine-grained control than the slash operator: here it aligns week1_mean with the row index, so that each row is divided by that day's mean to produce the desired ratios.
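A small sketch of the difference between the slash operator and the divide method is shown below. The temperatures are invented for illustration, and the column names are assumed from the narration; only the names week1_range and week1_mean come from the walkthrough.

```python
import pandas as pd

# Invented temperatures for three days; not the real Pittsburgh data.
dates = pd.date_range("2013-07-01", periods=3, freq="D")
weather = pd.DataFrame(
    {
        "Min TemperatureF": [60.0, 63.0, 66.0],
        "Mean TemperatureF": [75.0, 70.0, 80.0],
        "Max TemperatureF": [90.0, 77.0, 88.0],
    },
    index=dates,
)

week1_range = weather[["Min TemperatureF", "Max TemperatureF"]]  # DataFrame
week1_mean = weather["Mean TemperatureF"]                        # Series

# The slash operator aligns the Series index (dates) with the DataFrame
# columns (strings). Nothing matches, so every value is NaN.
naive = week1_range / week1_mean

# axis="rows" aligns the Series with the row index instead.
ratios = week1_range.divide(week1_mean, axis="rows") * 100
print(ratios)
```

Depending on the pandas version, the naive division may also emit a warning about sorting incomparable labels; the all-NaN result is the point of the comparison.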

We can see that the temperature range varies by at most about 10% from the mean in that week. A related computation is the percentage change along a time series. We compute it by subtracting the previous day's value from the current day's value and dividing by the previous day's value. The pct_change method does precisely this computation for us. Here, we also multiply the resulting Series by 100 to yield a percentage value. Notice that the value in the first row is NaN because there is no previous entry.
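The percentage-change computation can be checked by hand against pct_change. The sketch below uses invented temperatures, and the name mean_temp is hypothetical.

```python
import pandas as pd

# Invented daily mean temperatures.
mean_temp = pd.Series(
    [72.0, 74.0, 78.0, 77.0],
    index=pd.date_range("2013-07-01", periods=4, freq="D"),
    name="Mean TemperatureF",
)

# Manual version: (today - yesterday) / yesterday, expressed as a percentage.
previous = mean_temp.shift(1)
manual = (mean_temp - previous) / previous * 100

# pct_change performs the same computation; the first row has no predecessor.
pct = mean_temp.pct_change() * 100
print(pct)
```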

Finally, let's examine how arithmetic operations work between distinct Series or DataFrames with non-aligned indexes, which happens often in practice. We'll use Olympic medal data from 1896 to 2008 and list the top five bronze medal winning countries, the top five silver medal winning countries, and the top five gold medal winning countries for that period. All three Series have the same index labels for the first three rows: United States, Soviet Union, and United Kingdom. By contrast, the next two rows are drawn from France, Germany, and Italy.

Let's compute the total medals awarded to each country, starting by adding bronze and silver. Adding two Series of five rows each gives back a Series with six rows: the index of the sum is the union of the row indices of the two original Series. Arithmetic operations between pandas Series are carried out only for rows with common index values. Since Germany does not appear in silver and Italy does not appear in bronze, those rows have NaN in the sum.
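The alignment behavior can be reproduced with two small Series. In this sketch only the United States values (1052 and 1195) come from the walkthrough; every other count is an illustrative placeholder, not a real medal total.

```python
import pandas as pd

# Only the United States values are from the text; the rest are placeholders.
bronze = pd.Series(
    {"United States": 1052.0, "Soviet Union": 500.0, "United Kingdom": 400.0,
     "France": 300.0, "Germany": 200.0}
)
silver = pd.Series(
    {"United States": 1195.0, "Soviet Union": 550.0, "United Kingdom": 450.0,
     "France": 350.0, "Italy": 250.0}
)

# The result index is the union of both indexes (six labels).
# Labels missing from either operand (Germany, Italy) come out as NaN.
total = bronze + silver
print(total)
```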

On examination, we see that the value 2247 for the United States row is the sum of 1052 and 1195 from the corresponding rows of the bronze and silver Series, respectively. We can get the same sum as bronze + silver with a method invocation, bronze.add(silver). The null values occur in the same places: by default, the result is NaN wherever rows fail to align. We can modify this behavior with the fill_value option of the add method, specifying fill_value=0.
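The effect of fill_value can be sketched with the same kind of data. As before, only the United States values come from the walkthrough and the remaining counts are illustrative placeholders.

```python
import pandas as pd

# Only the United States values are from the text; the rest are placeholders.
bronze = pd.Series(
    {"United States": 1052.0, "Soviet Union": 500.0, "United Kingdom": 400.0,
     "France": 300.0, "Germany": 200.0}
)
silver = pd.Series(
    {"United States": 1195.0, "Soviet Union": 550.0, "United Kingdom": 450.0,
     "France": 350.0, "Italy": 250.0}
)

# Without fill_value, the method behaves exactly like the plus operator.
same_as_plus = bronze.add(silver)

# With fill_value=0, a label missing from one Series is treated as 0 there,
# so Germany and Italy keep the value from the Series that does contain them.
filled = bronze.add(silver, fill_value=0)
print(filled)
```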

With fill_value=0, the values for Germany and Italy are no longer null. Just as the divide method is more flexible than the slash operator for division, the add method is more flexible than the plus operator for addition. Adding all three Series together with the plus operator yields six rows of output, but only three have non-null values. That is because France, Germany, and Italy are not index labels in all three Series, so each of those rows is NaN in the sum.

We can also chain calls to the add method with fill_value=0 to get rid of those null values in the triple sum. Now you can get some experience with standard arithmetic operations and methods for Series and DataFrames in the exercises.
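The triple sum and its chained alternative might look like the sketch below. Only the United States bronze and silver values come from the walkthrough; the gold Series and all other counts are illustrative placeholders chosen to reproduce the described index layout.

```python
import pandas as pd

# Placeholders except for the United States bronze and silver values.
bronze = pd.Series(
    {"United States": 1052.0, "Soviet Union": 500.0, "United Kingdom": 400.0,
     "France": 300.0, "Germany": 200.0}
)
silver = pd.Series(
    {"United States": 1195.0, "Soviet Union": 550.0, "United Kingdom": 450.0,
     "France": 350.0, "Italy": 250.0}
)
gold = pd.Series(
    {"United States": 2000.0, "Soviet Union": 800.0, "United Kingdom": 480.0,
     "Italy": 440.0, "Germany": 400.0}
)

# Plus operator: a label must be present in all three Series to get a value.
plain_sum = bronze + silver + gold

# Chained add calls: missing labels are treated as 0 at each step.
chained_sum = bronze.add(silver, fill_value=0).add(gold, fill_value=0)
print(chained_sum)
```

Note that fill_value=0 treats an absent country as having zero medals of that type. That suits top-five lists only if a missing entry should count as zero, which is an assumption about the data rather than a property of pandas.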
