How to get started with Pandas for Data Science

**Creating a Pandas DataFrame from an Array**

In this tutorial, we will explore how to create a pandas DataFrame from a NumPy array. We start by assigning the array to a variable and then pass that variable to the `pd.DataFrame` constructor.

```python
import pandas as pd
import numpy as np

# Create an array
n1 = np.array([0, 1, 2])

# Assign the array to a variable and create a DataFrame
df1 = pd.DataFrame(n1)
print(df1)
```

**Displaying the Contents of the DataFrame**

Once we have created the DataFrame, we can display its contents. In a Jupyter or Colab notebook, typing the variable name `df1` on the last line of a cell is enough; in a script, use `print`.

```python
print(df1)
```

This will output:

```
   0
0  0
1  1
2  2
```
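
The printed table combines three parts: the row labels on the left, the column labels on top, and the data values in between. As a minimal sketch (the variable names `row_labels`, `column_labels`, and `values` are chosen here only for illustration), each part can be pulled out separately:

```python
import numpy as np
import pandas as pd

# Same one-dimensional array as in the example above
n1 = np.array([0, 1, 2])
df = pd.DataFrame(n1)

row_labels = list(df.index)       # default row labels: 0, 1, 2
column_labels = list(df.columns)  # default column label: 0
values = df.to_numpy()            # the data as a two-dimensional NumPy array

print(row_labels)     # [0, 1, 2]
print(column_labels)  # [0]
print(values.shape)   # (3, 1)
```

Even though the input array is one-dimensional, the DataFrame stores it as a single column, which is why the shape is three rows by one column.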

**Creating a DataFrame with Meaningful Column Names**

We can add meaningful column names to our DataFrame using the `columns` parameter. Let's say we want to name the single column 'a'. We can do this by passing `columns=['a']` when creating the DataFrame.

```python
# Create an array
n1 = np.array([0, 1, 2])

# Assign the array to a variable and create a DataFrame with meaningful column names
df2 = pd.DataFrame(n1, columns=['a'])
print(df2)
```

This will output:

```
   a
0  0
1  1
2  2
```

We can also drop the column name by leaving out the `columns` argument, in which case pandas falls back to the default integer label `0`.

```python
# Create an array
n1 = np.array([0, 1, 2])

# Assign the array to a variable and create a DataFrame
df3 = pd.DataFrame(n1)
print(df3)
```

This will output:

```
   0
0  0
1  1
2  2
```
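
Column labels become more useful once the array has more than one column. The sketch below assumes a hypothetical 3x3 array `n2` (its values are made up for illustration) and compares the default integer labels with the names 'a', 'b', and 'c':

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 array; the values are chosen only for illustration
n2 = np.array([[0, 1, 2],
               [3, 4, 5],
               [6, 7, 8]])

# Without column names, pandas labels the columns 0, 1, 2
df_default = pd.DataFrame(n2)

# With column names, each column gets a meaningful label
df_named = pd.DataFrame(n2, columns=['a', 'b', 'c'])

print(list(df_default.columns))  # [0, 1, 2]
print(list(df_named.columns))    # ['a', 'b', 'c']
print(df_named)
```

The list passed to `columns` needs one label per column of the array.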

**Creating a DataFrame with Meaningful Row Names**

We can also add meaningful row names to our DataFrame using the `index` parameter. Let's say we want to label the rows 'r1', 'r2', and 'r3'. We can do this by passing `index=['r1', 'r2', 'r3']` when creating the DataFrame.

```python
# Create an array
n1 = np.array([0, 1, 2])

# Assign the array to a variable and create a DataFrame with meaningful row names
df4 = pd.DataFrame(n1, index=['r1', 'r2', 'r3'])
print(df4)
```

This will output:

```
    0
r1  0
r2  1
r3  2
```

We can also add the row names retrospectively by assigning a list to the `index` attribute of an existing DataFrame.

```python
# Create an array
n1 = np.array([0, 1, 2])

# Assign the array to a variable and create a DataFrame
df5 = pd.DataFrame(n1)

# Add meaningful row names
df5.index = ['r1', 'r2', 'r3']
print(df5)
```

This will output:

```
    0
r1  0
r2  1
r3  2
```
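
Both routes should lead to the same result. The following sketch (the names `at_creation`, `afterwards`, and `labels` are illustrative) checks that with `DataFrame.equals`, and also shows that pandas rejects a label list whose length does not match the number of rows:

```python
import numpy as np
import pandas as pd

n1 = np.array([0, 1, 2])
labels = ['r1', 'r2', 'r3']

# Route 1: row labels supplied when the DataFrame is created
at_creation = pd.DataFrame(n1, index=labels)

# Route 2: row labels assigned after the DataFrame already exists
afterwards = pd.DataFrame(n1)
afterwards.index = labels

same = at_creation.equals(afterwards)
print(same)  # True

# The label list must have exactly one entry per row
try:
    afterwards.index = ['r1', 'r2']  # only two labels for three rows
except ValueError as err:
    print('Rejected:', err)
```

Which route to use is mostly a matter of when the labels become known.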

**Creating a DataFrame from a Dictionary**

We can also create a pandas DataFrame from a dictionary. The dictionary keys become the column names, and each value is a list holding that column's data; all lists must have the same length.

```python
# Create a dictionary
d = {'a': [0, 1, 2], 'b': [3, 4, 5]}

# Assign the dictionary to a variable and create a DataFrame
df6 = pd.DataFrame(d)
print(df6)
```

This will output:

```
   a  b
0  0  3
1  1  4
2  2  5
```
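
Building a DataFrame from a dictionary is essentially the same as building it from a two-dimensional array and naming the columns. Here is a small sketch of that equivalence, assuming a hypothetical array `n2` that holds the same values arranged as three rows and two columns:

```python
import numpy as np
import pandas as pd

# Dictionary route: keys are the column names
from_dict = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})

# Array route: three rows, two columns, with the names given separately
n2 = np.array([[0, 3],
               [1, 4],
               [2, 5]])
from_array = pd.DataFrame(n2, columns=['a', 'b'])

# Compare labels and values rather than dtypes, which can differ by platform
same_labels = list(from_dict.columns) == list(from_array.columns)
same_values = bool((from_dict.to_numpy() == from_array.to_numpy()).all())

print(same_labels, same_values)  # True True
```

Note the orientation: each dictionary value is one column, whereas each inner list of the array is one row.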

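We can also give the columns different names. Note that when the data is a dictionary, the `columns` parameter selects keys from the dictionary rather than renaming them, so passing `columns=['x', 'y']` here would produce an empty DataFrame with columns `x` and `y` and no rows. To rename the columns, use the `rename` method instead.

```python
# Create a dictionary
d = {'a': [0, 1, 2], 'b': [3, 4, 5]}

# Create a DataFrame, then rename columns 'a' and 'b' to 'x' and 'y'
df7 = pd.DataFrame(d).rename(columns={'a': 'x', 'b': 'y'})
print(df7)
```

This will output:

```
   x  y
0  0  3
1  1  4
2  2  5
```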

We can also add meaningful row names to a DataFrame created from a dictionary.

```python
# Create a dictionary
d = {'a': [0, 1, 2], 'b': [3, 4, 5]}

# Create a DataFrame from the dictionary with meaningful row names
df8 = pd.DataFrame(d, index=['r1', 'r2', 'r3'])
print(df8)
```

This will output:

```
    a  b
r1  0  3
r2  1  4
r3  2  5
```
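
The one-dimensional counterpart of a DataFrame is the pandas Series, and each column of a DataFrame is a Series. As a brief sketch (the variable names are illustrative), a Series can be created from a list with `pd.Series`, optionally with a `name`, and a column taken from a DataFrame carries its column label as its name:

```python
import pandas as pd

# A Series from a plain Python list, without and with a name
s_unnamed = pd.Series([0, 1, 2])
s_named = pd.Series([0, 1, 2], name='a')

print(s_unnamed.name)  # None
print(s_named.name)    # a

# Selecting one column of a DataFrame returns a Series named after that column
df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
col = df['a']
print(type(col).__name__, col.name)  # Series a
```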

**Video Transcript**

In this video, I'm going to show you how you could use the pandas library to perform data wrangling and data processing on tabular datasets in Python. Let me know in the comments section whether you find this pandas tutorial video helpful, so maybe I'll expand this into a multi-part series. And so, without further ado, we're starting right now.

All right, so let's load up this particular Jupyter notebook or Colab notebook, which I have already provided the links to in the video description. This is the data wrangling with pandas notebook, so you'll be learning how you could use the pandas library in Python in order to data wrangle, or perform data processing of, your datasets. So let's get started.

The first thing here is you want to import the pandas library. By default you could use `import pandas as pd`, which is the common way to import the pandas library. Some of the functionality that we'll be using in this particular video also makes use of the NumPy library, so we're going to `import numpy as np` as well.

Let's have a look at the pandas data structure before we go further. pandas allows us to work with tabular datasets. A tabular dataset is a data table, so you have columns and then you have rows. I have color-coded the various elements of the pandas data structure into the three colors that you can see here. At a high level, you'll see the pandas DataFrame explained here and then the pandas Series explained here at the bottom part. The pandas DataFrame is comprised of three components: the row names, the column names, and also the values. I have already dissected it here. The left part is the row names, so you take only the blue color out. The column names are the pink color here, which has been taken out. And then only the yellow portion is the data values. If you combine all of that, you get the row names and also the column names along with the data values, so collectively they are known as the pandas DataFrame. The row names and the column names are the Index, which is the third type of data structure of pandas.

Let's hop on to the pandas Series. Each row or each column will be a pandas Series. You can see here that a pandas Series is one-dimensional, it could be the row or it could be the column, and the pandas DataFrame is two-dimensional, right? It has the rows and it has the columns. If you take the rows from the pandas DataFrame, you get this pandas Series, and so the index will be the row names. And if you take the pandas Series from the column, then the index will be the column name. Okay, so I've provided more information on the descriptive properties of the pandas Series, pandas DataFrame, and the Index here.

All right, and so let's start by creating our first few pandas objects. So, pandas Series. I have already commented here the various types of pandas Series that you could create, so let's uncomment it so that we could see the resulting outputs. Creating a pandas Series is as simple as using `pd.Series`, then the opening and the closing parentheses, and then as input we'll use a list of values. Okay, so we have the import right here, so we'll run this too. If you're using it locally, you want to pip install pandas as well, but if you're using Colab it comes pre-installed. Let's run this again. All right, and therefore you have a pandas Series.

Let me type it from scratch: `pd.Series`, opening and closing parentheses, and then you want to put in your list of values, so a Python list like this, and then Shift+Enter or hit the play button, and then you get the list here, right? Your list could also be strings. It could be a list of strings like red, blue, green, so that could also be a list, and then you convert it into a pandas Series using the `pd.Series` function. Okay, so I'll delete this. Let's proceed further.

So now let's create a pandas DataFrame. In the above here you have already seen that the index values are integers. Right here, this is the index value: 0, 1, 2, 3, 4. These are the index values, so they are integers, but we'll provide you with some more information in a subsequent code cell on how you could also add labels to it, so stay tuned for that in just a moment.

All right, and so let's create a pandas DataFrame from a NumPy array. Here we're going to create a NumPy array, which is the first line here, and then we're going to assign it to the `n1` variable. Then we're going to create a DataFrame by using `pd.DataFrame`, and as the input value we're going to use `n1`. Finally, we're going to display it by typing out `df1` so that we see the contents of the DataFrame. There you go, so this is the DataFrame. You can see that it is a tabular table here. Let me type in `n1`, the array. So this is the array, and then we've converted it into a pandas DataFrame by using `pd.DataFrame`. As you can see here, creating a pandas DataFrame is as simple as using the `pd.DataFrame` function on the array that we have created.

Pandas Index. Okay, so I mentioned earlier on that we could add meaningful labels to the columns, so let's do that. So here, pandas Series. If we're going to add the name here, let's run it. You can see that aside from just saying `pd.Series` and then `l2`, which is list two, we also specify, using the name function, name equals to a. And so you can see here that the name became a. Let's run it without specifying the name. Let's delete that and see what happens. You see that it also runs and creates the pandas Series, but then there's no name here, right? Only the data type.

All right, and so let's create the pandas DataFrame with the names. Firstly, we're going to create an array using the `np.array` function, as we have already done previously, and then we're going to assign it to the `pd.DataFrame` function to create our DataFrame. In addition to that, we're going to name it. We're going to create the column names. Let me comment that out first. Let me show you. There you go, you have a, b, c. Without the column names it would be integers 0, 1, 2, but with the column names we could add meaningful labels to it. So let me comment this out, and then I'll run `df3` here.

And now you could also name the rows as well, right? Index here are the rows. Why don't we say r1 for row one, r2 for row two, r3 for row three. Okay, there you go.

All right, and so what if we wanted to add the names here? So, similar to r1, r2, r3, but it's just copying the contents from the prior DataFrame, and then we're using names of individuals and we're adding them to the index. This is doing it in retrospect. You can see that we're using `df4.index`, and then we're assigning a list of names to it, and then we have already added the list. So what we've done here is we've copied the contents from `df3`, and then we're adding at a later time the index names, or the row names, and we've done it using the index function.

Okay, and aside from using NumPy arrays as input for creating the DataFrame, we could also create DataFrames from a Python dictionary. Let's say that we have a dictionary, specified using the `d` variable here, and then we have a and a list of values, and then we have b and a list of values, and then c and the list of values. Then we're using this `d` variable as input for the DataFrame. Let's have a look, and now we have a DataFrame, right? So a here is column a, b here is column b, c here is column c. And so this here is essentially the same as using the index function, and also, oh, what I meant was, it is the same as using the columns function here. So specifying it like this as a dictionary is essentially the same as running this `df2` function, where you specify the name of the NumPy array and then you specify the names of the columns.

As already mentioned, you could conveniently create a pandas DataFrame using a dictionary as an input. The column names are specified here, and then the data values are here in the list. We could add the index to it also, or we could also add it in retrospect: `df1.index`, let's run it. You could either do it like this, or we could do it like this: `df1.index` equals, and then we add a list like that, right? So there's more than one way of doing it: using a dictionary, using a list, using a NumPy array. It's totally up to you, so choose a way that resonates with you, and do try it out and let me know how it goes.

Thank you so much for watching until the end of the video, and please drop a fire emoji in the comment section if you watched this far. Support the channel by liking the video, subscribing if you haven't already, and also make sure to hit on notifications for updates on the latest release of future videos. And as always, the best way to learn data science is to do data science, and please enjoy the journey.