Python Tutorial - Working with data types

Understanding Pandas Basics: Types and Column Conversion

Now that we've reviewed some pandas basics, we need to start thinking about other steps we have to take in order to prepare data for modeling. One of these steps is to think about the types that are present in your data set because you'll likely have to transform some of these columns to other types later on. Let's take a deeper look at types as well as how to convert column types in your data set.

Pandas Data Types

Pandas data types are similar to native Python types, but there are a couple of things to be aware of. The most common types you'll be working with are the `object`, `int64`, and `float64` types. The `object` type is what pandas uses to refer to a column that consists of string values or is of mixed types. The `int64` type corresponds to the Python integer type; the 64 simply refers to the number of bits allotted for storing each value.

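The `float64` type, on the other hand, corresponds to the float type in Python, which allows for decimal numbers. Other types you might see as you work with data in pandas are the `datetime64` type, which stores dates and times, and the `timedelta64` type, which stores durations. Pandas can even use datetimes as a special kind of index. For this tutorial, though, `object`, `int64`, and `float64` are the types you need to be familiar with.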
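To make these types concrete, here is a minimal sketch that builds a small data frame and prints its dtypes. The data and column names (`label`, `quantity`, `price`, `ordered_at`, `elapsed`) are hypothetical and chosen only for illustration. The string column is created with an explicit `dtype=object`, because the dtype pandas picks for strings by default can vary between versions.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "label": pd.Series(["a", "b", "c"], dtype=object),  # strings stored as object
    "quantity": [1, 2, 3],                               # whole numbers
    "price": [1.5, 2.0, 3.25],                           # decimal numbers
    "ordered_at": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

# Subtracting a timestamp from a datetime column yields a timedelta column.
df["elapsed"] = df["ordered_at"] - df["ordered_at"].iloc[0]

# The dtypes attribute lists the type of every column.
print(df.dtypes)
```

The datetime and timedelta columns are reported with a resolution suffix (for example `[ns]`), which may differ depending on the pandas version installed.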

Types in a Data Set

Before any pre-processing can begin, you have to understand what types you're dealing with in your data set. Sometimes, you'll start working with a data set that has an incorrect column type, such as a numerical column written out into a CSV as a string, which will prevent numerical operations from working correctly.
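As a rough illustration of what "won't work correctly" can look like, the sketch below uses a hypothetical column `c` whose numbers are stored as strings. The data is made up, and `dtype=object` is set explicitly so the example behaves the same regardless of how a given pandas version stores strings by default.

```python
import pandas as pd

# Hypothetical column of numbers that were stored as strings.
df = pd.DataFrame({"c": pd.Series(["1.5", "2.5", "3.0"], dtype=object)})

# Multiplying repeats each string instead of doubling the number.
doubled = df["c"] * 2
print(doubled.tolist())

# Summing concatenates the strings instead of adding the numbers.
total = df["c"].sum()
print(total)
```

Neither operation raises an error here, which is part of what makes an incorrect column type easy to miss.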

Adjusting Column Types

Let's take a look at how to adjust the type of a column when the type pandas inferred upon reading in the file is incorrect. We have a simple data set with a couple of columns. If we run `df.dtypes`, we'll see that the type for column c is `object`, indicating that it contains string values or mixed types.

However, if we simply look at this data frame, we can see that these are float values with decimal points. We want to pre-process and model this data, so we need to adjust the column type. Changing the type of a column is very straightforward in pandas. You can change the type using the `astype` method and passing in the type you want to convert it to.

For example, to convert column c from the `object` type to `float64`, we would use the following code:

```python
# Convert column c to float64, assigning the result back to that column only
df['c'] = df['c'].astype(float)
```

After the conversion, column c holds float values, so numerical operations on it will work. Before converting, it's also good to be as sure as you can that the target type is representative of the whole column: `astype` raises an error if it encounters a value it cannot convert, such as a non-numeric string.
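One possible way to check a column before converting it is sketched below. It uses `pd.to_numeric` with `errors="coerce"`, which is a different tool from the `astype` call shown above and is brought in here only to find values that would not convert. The data frame, its columns `a` and `c`, and the `parsed` and `bad_rows` names are assumptions made for this example.

```python
import pandas as pd

# Hypothetical data: column "c" holds numbers stored as strings.
df = pd.DataFrame({
    "a": [1, 2, 3],
    "c": pd.Series(["1.5", "2.5", "3.0"], dtype=object),
})

# errors="coerce" turns values that cannot be parsed into NaN instead of raising.
parsed = pd.to_numeric(df["c"], errors="coerce")

# Rows that became NaN during parsing but were not missing originally
# are the ones that would make astype(float) fail.
bad_rows = df.loc[parsed.isna() & df["c"].notna(), "c"]

# Convert only if every value is representative of the target type.
if bad_rows.empty:
    df["c"] = df["c"].astype(float)

print(df.dtypes)
```

If `bad_rows` is not empty, printing it shows exactly which entries need cleaning before the conversion can go ahead.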

Remember that the `object` type can represent a column that includes both string and numeric types. If you're unsure what values are in your column, inspect the data frame and verify the contents before making any changes.
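A quick way to inspect what an `object` column actually contains is to count the Python type of each value. The sketch below is one option among several; the `mixed` series and its contents are hypothetical.

```python
import pandas as pd

# Hypothetical object column mixing strings and numbers.
mixed = pd.Series(["1.5", 2.5, "three", 4], dtype=object)

# Count how many values of each Python type the column holds.
type_counts = mixed.map(lambda value: type(value).__name__).value_counts()
print(type_counts)

# A direct conversion fails because "three" cannot be parsed as a float.
try:
    mixed.astype(float)
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(conversion_failed)
```

Seeing more than one type in the counts is a sign that the column needs cleaning before its type can be changed.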
