Python Tutorial - Transforming DataFrames

Using Vectorized Computations with Pandas DataFrames and Series

Once we have selected or filtered data in a pandas DataFrame, we often want to transform it. The best way to do this is with methods built into DataFrames; the next best is to apply NumPy universal functions (ufuncs) to entire columns of data element-wise. In this article, we will explore how to use vectorized computations with pandas DataFrames and Series.

Transforming Data with Built-in Methods

Suppose we want to convert sales numbers into units of whole dozens, rounded down, rather than individual item counts. The most efficient way to do this is to use a pandas built-in method such as `floordiv()`, called with 12 as the divisor. Notice that this arithmetic operation is applied to every entry in the DataFrame without writing any loops.
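A minimal sketch of the whole-dozens conversion, using floor division by 12 through the pandas `floordiv()` method. The DataFrame contents, column names, and month labels here are assumptions made up for illustration, not the tutorial's actual data.

```python
import pandas as pd

# Hypothetical sales data; the columns and values are illustrative only.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12, 50, 89], "spam": [17, 31, 72]},
    index=["Jan", "Feb", "Mar"],
)

# Floor-divide every entry by 12 in a single vectorized call (no loop).
dozens_df = df.floordiv(12)
print(dozens_df)
```

The call returns a new DataFrame; `df` itself is left unchanged.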

Another way to achieve this result is to use NumPy's `floor_divide` function. Both of these strategies use vectorized, or element-wise, computation to repeat the same computation over an entire data structure without writing any loops. If pandas' `floordiv()` and NumPy's `floor_divide` were not available, we could write a custom function to do this; here we call it `dozens`. The DataFrame `apply` method then executes that function across the DataFrame (pandas passes it one column at a time), again without writing any loops.
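The two alternatives described above might look like the following sketch. The sample data is hypothetical, and the body of `dozens` is an assumption about what such a helper would contain.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data; the columns and values are illustrative only.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12, 50, 89]},
    index=["Jan", "Feb", "Mar"],
)

# NumPy universal function applied to the whole DataFrame at once.
via_numpy = np.floor_divide(df, 12)

# Custom function (assumed implementation) for use with apply.
def dozens(n):
    """Return the number of whole dozens in n, rounded down."""
    return n // 12

# DataFrame.apply calls dozens once per column; the // operator
# inside it is itself vectorized over that column.
via_apply = df.apply(dozens)

print(via_numpy)
print(via_apply)
```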

Yet another way to achieve the same result is to use a lambda function with the `apply` method. The `lambda` keyword, followed by the input argument, a colon, and the output expression, provides a convenient one-line definition of a throwaway function. All of the preceding computations returned a new DataFrame without altering the original DataFrame `df`. To preserve a computed result, we can store it in a new column; for instance, a new dozens-of-eggs column can hold the result of applying `floordiv(12)` to the `eggs` column of `df`.
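As a rough illustration of the lambda form and of storing a result in a new column, consider the sketch below. The data and the column name `dozens_of_eggs` are hypothetical choices for this example.

```python
import pandas as pd

# Hypothetical sales data; the columns and values are illustrative only.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12, 50, 89]},
    index=["Jan", "Feb", "Mar"],
)

# A throwaway function defined inline: argument, colon, output expression.
via_lambda = df.apply(lambda n: n // 12)

# Nothing above modified df. To keep a result, assign it to a new column.
# The column name is an assumption made for this sketch.
df["dozens_of_eggs"] = df["eggs"].floordiv(12)

print(via_lambda)
print(df)
```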

Both `apply` and vectorized methods work on Series as well as on entire DataFrames. Moreover, filters and indices often provide sub-Series or sub-DataFrames for transformation. Having worked with vectorized computation on a numerical Series, let's look next at string operations.
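The same tools on a single Series, and on a filtered subset of one, could be sketched as follows. The data and the threshold of 100 are assumptions chosen only to make the filter visible.

```python
import pandas as pd

# Hypothetical sales data; the columns and values are illustrative only.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12, 50, 89]},
    index=["Jan", "Feb", "Mar"],
)

# On a Series, apply calls the function once per element.
eggs_dozens = df["eggs"].apply(lambda n: n // 12)

# A boolean filter yields a sub-Series, which can be transformed
# with the same vectorized method.
big_months = df.loc[df["eggs"] > 100, "eggs"].floordiv(12)

print(eggs_dozens)
print(big_months)
```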

String Operations

Notice that the DataFrame's `.index` attribute is itself a special kind of Series, in this instance one containing strings. Series and Index objects come with a handy `.str` attribute as a kind of accessor for vectorized string transformations. Here we assign the result of `df.index.str.upper()` back to `df.index` to make the index all uppercase. Notice that we did not loop explicitly over the index; instead, we applied a vectorized string method to transform the entire index element-wise.
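A short sketch of the `.str` accessor on an index of strings. The month labels are hypothetical sample data.

```python
import pandas as pd

# Hypothetical data with a string index; values are illustrative only.
df = pd.DataFrame(
    {"eggs": [47, 110, 221]},
    index=["Jan", "Feb", "Mar"],
)

# Vectorized string method: uppercases every label, no explicit loop.
df.index = df.index.str.upper()

print(df.index.tolist())
```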

The Index has no `apply` method; the equivalent method is, unfortunately, named differently: `map`. Thus we can apply, say, `str.lower` or a custom transformation to the index element-wise using the `map` method instead. Many arithmetic operators, for instance the plus sign, work with DataFrames and Series directly. Thus here we create a new column, "salty eggs", by adding the "salt" and "eggs" columns together. If we can express a calculation using pandas alone, that is generally preferable to writing loops.
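Both ideas in this paragraph, `map` on the index and column arithmetic with `+`, can be sketched as below. The data is hypothetical, and the column name `salty_eggs` is an assumed spelling of the "salty eggs" column.

```python
import pandas as pd

# Hypothetical sales data; the columns and values are illustrative only.
df = pd.DataFrame(
    {"eggs": [47, 110, 221], "salt": [12, 50, 89]},
    index=["JAN", "FEB", "MAR"],
)

# Index has map rather than apply; pass any one-argument function.
df.index = df.index.map(str.lower)

# The + operator adds two Series element-wise, aligned on the index.
# The new column name is an assumption made for this sketch.
df["salty_eggs"] = df["salt"] + df["eggs"]

print(df)
```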

Exercises Using Vectorized Computations

Take some time now to work through exercises using vectorized computations. This will help you understand how to use these powerful methods in your own data analysis projects. By mastering vectorized computations with pandas DataFrames and Series, you can write more efficient and productive code for a wide range of data manipulation tasks.