Python Tutorial - Arithmetic with Series & DataFrames

Exploring Arithmetic and Mathematical Operations with Pandas Series and DataFrames

We can load daily weather measurements for Pittsburgh from 2013, making the date column the index and using parse_dates=True so that the dates are read as datetime objects and the index is a DatetimeIndex. We can then slice with convenient date strings, for example to select the first week of July from the PrecipitationIn column. The precipitation data are in inches. Let's convert them to centimeters by multiplying the Series element-wise by 2.54.
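As a rough sketch of these steps, the snippet below reads a tiny in-memory CSV instead of the real weather file, which is not named in the text. The values are made up, and the column names Date and PrecipitationIn are assumptions.

```python
import io

import pandas as pd

# Synthetic stand-in for the weather file; values and column names are
# illustrative assumptions, not the real Pittsburgh data.
csv_text = """Date,PrecipitationIn
2013-06-30,0.10
2013-07-01,0.00
2013-07-02,0.25
2013-07-07,0.50
2013-07-08,0.05
"""

# index_col plus parse_dates=True yields a DatetimeIndex.
weather = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# String slicing on a DatetimeIndex includes both endpoints.
week1_precip_in = weather.loc["2013-07-01":"2013-07-07", "PrecipitationIn"]

# Multiplying by a scalar is applied to every element of the Series.
week1_precip_cm = week1_precip_in * 2.54
print(week1_precip_cm)
```

Only the three rows dated July 1 through July 7 survive the slice, and each is scaled by 2.54.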

We can perform standard scalar mathematical operations on pandas Series and DataFrames. Broadcasting means that the multiplication is applied to every element of the Series or DataFrame. Next, we can find the percentage variation in temperature in the first week of July, that is, the daily minimum and daily maximum temperatures expressed as a percentage of the daily mean temperature. We can compute this by dividing both the min temperature F and the max temperature F columns by the mean temperature F column and multiplying the result by 100.

Let's start by slicing the min temperature F and max temperature F columns into a DataFrame, week1_range, then slice the mean temperature F column into a Series, week1_mean. However, when we try to divide the DataFrame week1_range by the Series week1_mean with the slash operator, it doesn't quite work: the Series index is matched against the DataFrame's column labels, the labels don't match, and the result is all null values. To fix this, we can use the DataFrame divide method with the option axis='rows'. The divide method provides more fine-grained control than the slash operator: here it aligns week1_mean with the row index and broadcasts it across the columns to produce the desired ratios.
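The difference between the two forms of division can be sketched with a small synthetic frame. The temperatures below are made up, and the column names are assumptions about how the weather data is labeled.

```python
import pandas as pd

# Synthetic values; the column names are assumed, not confirmed by the text.
dates = pd.date_range("2013-07-01", periods=3, freq="D")
week1_range = pd.DataFrame(
    {
        "Min TemperatureF": [63.0, 66.0, 60.0],
        "Max TemperatureF": [77.0, 78.0, 72.0],
    },
    index=dates,
)
week1_mean = pd.Series([70.0, 72.0, 66.0], index=dates, name="Mean TemperatureF")

# The slash operator matches the Series index (dates) against the DataFrame's
# column labels. Nothing matches, so every entry is NaN.
naive = week1_range / week1_mean
print(naive.isna().all().all())  # True

# divide(..., axis="rows") matches the Series against the row index instead.
week1_pct = week1_range.divide(week1_mean, axis="rows") * 100
print(week1_pct.round(1))
```

With these illustrative numbers the first day comes out at roughly 90% and 110% of the mean.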

We can see that the temperature range varies by about 10% from the mean in that week. A related computation is the percentage change along a time series. We do this by subtracting the previous day's value from the current day's value and dividing by the previous day's value. The pct_change method does precisely this computation for us. Here, we also multiply the resulting Series by 100 to yield a percentage value. Notice that the value in the first row is NaN because there is no previous entry.
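To make the equivalence concrete, here is a minimal sketch that compares the manual formula with pct_change. The Series name and its values are made up for illustration.

```python
import pandas as pd

# Synthetic daily mean temperatures (illustrative values only).
temps = pd.Series(
    [70.0, 77.0, 69.3],
    index=pd.date_range("2013-07-01", periods=3, freq="D"),
)

# Manual version: (today - yesterday) / yesterday, scaled to a percentage.
previous = temps.shift(1)
manual_pct = (temps - previous) / previous * 100

# pct_change performs the same computation; multiply by 100 for a percentage.
builtin_pct = temps.pct_change() * 100

print(builtin_pct.round(2))
```

The first entry is NaN in both versions because the first day has no predecessor.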

Now, let's examine how arithmetic operations work between distinct Series or DataFrames with non-aligned indexes, which happens often in practice. We'll use Olympic medal data from 1896 to 2008 and list the top five bronze medal winning countries, the top five silver medal winning countries, and the top five gold medal winning countries for that period. All three Series have the same index labels in the first three rows, but the remaining two rows in each are drawn from France, Germany, and Italy.

We can compute the combined medals awarded to each country by adding bronze and silver. Adding two Series of five rows gives us back a Series with six rows, because the index of the sum is the union of the row indices from the original two Series. The arithmetic is carried out only for the four rows with common index values. Since Germany does not appear in silver and Italy does not appear in bronze, those rows have NaN in the sum.
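The alignment behavior can be sketched as follows. Only the United States counts (1052 and 1195) come from the text. Every other number is a placeholder chosen for illustration, not real medal data.

```python
import pandas as pd

# Only the United States values are taken from the text; the rest are
# placeholder counts, not actual Olympic results.
bronze = pd.Series({
    "United States": 1052.0, "Soviet Union": 500.0, "United Kingdom": 400.0,
    "France": 300.0, "Germany": 200.0,
})
silver = pd.Series({
    "United States": 1195.0, "Soviet Union": 600.0, "United Kingdom": 500.0,
    "France": 400.0, "Italy": 300.0,
})

# The result's index is the union of both indexes (six labels). Labels that
# are missing from either operand produce NaN.
bronze_plus_silver = bronze + silver
print(bronze_plus_silver)
```

Germany and Italy each appear in only one operand, so they are the two NaN rows.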

We can see that the value 2247 for the United States is the sum of 1052 and 1195 from the corresponding rows of the bronze and silver Series, respectively. We can get the same sum by method invocation with bronze.add(silver). The null values occur in the same places, wherever rows fail to align. We can modify this behavior using the fill_value option of the add method, specifying fill_value=0.
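A short sketch of the add method with and without fill_value follows. As before, only the United States counts are from the text, and the other values are placeholders.

```python
import pandas as pd

# Only the United States values are taken from the text; the rest are
# placeholder counts, not actual Olympic results.
bronze = pd.Series({
    "United States": 1052.0, "Soviet Union": 500.0, "United Kingdom": 400.0,
    "France": 300.0, "Germany": 200.0,
})
silver = pd.Series({
    "United States": 1195.0, "Soviet Union": 600.0, "United Kingdom": 500.0,
    "France": 400.0, "Italy": 300.0,
})

# Without fill_value, add behaves like the plus operator, NaN rows included.
same_as_plus = bronze.add(silver)

# With fill_value=0, a label missing from one Series is treated as 0 there.
filled = bronze.add(silver, fill_value=0)
print(filled)
```

With fill_value=0, Germany and Italy keep the single count each has instead of becoming NaN.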

The values for Germany and Italy are no longer null. Just as the divide method is more flexible than the slash operator for division, the add method is more flexible than the plus operator for addition. Adding all three Series together with the plus operator yields six rows of output, but only three have non-null values. France, Germany, and Italy are not index labels in all three Series, so each of those rows is NaN in the sum.
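The triple sum with the plus operator can be sketched in the same way. The gold Series and all counts other than the two United States values from the text are placeholders.

```python
import pandas as pd

# Placeholder counts, except the United States bronze and silver values,
# which come from the text.
bronze = pd.Series({
    "United States": 1052.0, "Soviet Union": 500.0, "United Kingdom": 400.0,
    "France": 300.0, "Germany": 200.0,
})
silver = pd.Series({
    "United States": 1195.0, "Soviet Union": 600.0, "United Kingdom": 500.0,
    "France": 400.0, "Italy": 300.0,
})
gold = pd.Series({
    "United States": 2000.0, "Soviet Union": 800.0, "United Kingdom": 450.0,
    "Italy": 350.0, "Germany": 250.0,
})

# A label must be present in all three Series to get a non-null total.
triple_sum = bronze + silver + gold
print(triple_sum)
```

France, Germany, and Italy are each missing from one of the three Series, so their rows are NaN.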

We can chain calls to the add method with fill_value=0 to get rid of those null values in the triple sum. Now it's your turn to get some experience with standard arithmetic operations and methods for Series and DataFrames.
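As a starting point for experimenting, the chained call might look like the sketch below. All counts other than the United States bronze and silver values are placeholders.

```python
import pandas as pd

# Placeholder counts, except the United States bronze and silver values,
# which come from the text.
bronze = pd.Series({
    "United States": 1052.0, "Soviet Union": 500.0, "United Kingdom": 400.0,
    "France": 300.0, "Germany": 200.0,
})
silver = pd.Series({
    "United States": 1195.0, "Soviet Union": 600.0, "United Kingdom": 500.0,
    "France": 400.0, "Italy": 300.0,
})
gold = pd.Series({
    "United States": 2000.0, "Soviet Union": 800.0, "United Kingdom": 450.0,
    "Italy": 350.0, "Germany": 250.0,
})

# Each add call treats labels missing from one operand as 0, so no NaN
# values survive into the final total.
total = bronze.add(silver, fill_value=0).add(gold, fill_value=0)
print(total)
```

Every one of the six countries now has a numeric total, built from whichever Series list it.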