Pandas Profiling for Data Science (Quick and Easy Exploratory Data Analysis)

Welcome to the Data Professor YouTube Channel

If you're new here, my name is Chanin Nantasenamat, and I'm an associate professor of bioinformatics. On this YouTube channel, we cover data science concepts and practical tutorials. So, if you're into this kind of content, please consider subscribing.

Using Pandas Profiling for Data Analysis

So, in this video, I'm going to give a short tutorial on how to use pandas profiling to do exploratory data analysis. Without further ado, let's get started. The first thing you want to do is head over to Google and search for pandas profiling. Click on the first link, which goes to the GitHub page at github.com/pandas-profiling/pandas-profiling. Scroll down and find the command that installs the software. I'm going to use pip install. I'm on Windows, so I'm going to head over to the command prompt, activate the environment, and then install it using pip install pandas-profiling[notebook,html].

Installing Pandas Profiling

This should take some time. Okay, so it's installed. Now, I'm going to open up Jupyter Notebook and create a new notebook. For this, I will just show you using the example code here. What this essentially does is import numpy, import pandas, and then import pandas profiling; the function that we're going to use is ProfileReport. Then, we're going to create a data frame that uses numpy to generate random numbers in 100 rows and five columns. The five columns will be named a, b, c, d, and e. Then we're going to create a variable and assign it the result of the ProfileReport function, whose input argument is the data frame. The title of the generated report will be Pandas Profiling Report.
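
The data-generation step described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code shown in the video: the variable name df and the fixed random seed are assumptions added here so that the numbers are reproducible.

```python
import numpy as np
import pandas as pd

# The seed is an assumption added for reproducibility; the narration does not mention one.
np.random.seed(0)

# 100 rows and 5 columns of uniform random numbers in [0, 1), labelled a through e.
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

print(df.shape)
print(df.head())
```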

Generating Random Data

Then, we're going to create an HTML report with full width set to true, which means that the HTML output will occupy the full width of the web page. Let's run that, and it does this automatically for you. Then we invoke the report by typing in profile, and here you have it: the pandas profiling report. This is done automatically, so it allows you to do exploratory data analysis with minimal effort.
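
The report step can be sketched like this. Treat it as a hedged sketch: build_report is a hypothetical helper name that is not from the video, and the import sits inside the function only so that the cell still runs before the package is installed. Newer releases of the project are published under the name ydata-profiling, so the sketch falls back to that import if the old one is missing.

```python
import numpy as np
import pandas as pd


def build_report(df, title="Pandas Profiling Report"):
    """Create a profiling report for df (build_report is a hypothetical helper name)."""
    # Imported lazily so that defining this function does not require the package.
    try:
        from pandas_profiling import ProfileReport
    except ImportError:
        # Newer releases of the same project are distributed as ydata-profiling.
        from ydata_profiling import ProfileReport
    # full_width=True asks the HTML output to span the full width of the page.
    return ProfileReport(df, title=title, html={"style": {"full_width": True}})


# Same kind of random data as in the example: 100 rows, columns a through e.
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])
```

In a notebook, you would then run profile = build_report(df) and put profile on the last line of a cell to display the report.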

Data Analysis Report

So, just have a look and scroll around here. It gives you the dataset statistics: there are five variables, a hundred rows, no missing data, and no duplicate data. Then it looks at each of the five variables, a, b, c, d, and e, and for each variable it gives you the descriptive statistics, including the mean, minimum, and maximum, and also the histogram.
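
If you want to cross-check the overview numbers by hand, plain pandas can compute the same kind of figures. This is only a rough equivalent under my own assumptions (the dictionary keys and the seed are made up for illustration), not how the report computes them internally.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is reproducible
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

# Dataset-level figures similar to the overview section of the report.
overview = {
    "variables": df.shape[1],
    "rows": df.shape[0],
    "missing_cells": int(df.isna().sum().sum()),
    "duplicate_rows": int(df.duplicated().sum()),
}
print(overview)

# Per-variable descriptive statistics: count, mean, std, min, quartiles, max.
stats = df.describe()
print(stats)
```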

Correlation Plot

Then, you can also look at the correlation plot between each of the five variables. Here we have the five variables a through e on both axes. The correlation between a and a is a perfect correlation because it is a self-correlation. Then you can look at the correlation between a and b, a and c, a and d, a and e, and so on; b and a, b and b, b and c, b and d, b and e. So you can look at all possible correlations.
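
To see why the diagonal is a perfect correlation, here is a small sketch using the pandas corr method, which computes Pearson's correlation by default. The seed and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is reproducible
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

# Pairwise Pearson correlation between all five variables (a 5 x 5 matrix).
corr = df.corr()
print(corr.round(2))

# Each variable correlates perfectly with itself, so the diagonal is all ones.
diagonal_is_one = bool(np.allclose(np.diag(corr.to_numpy()), 1.0))
print(diagonal_is_one)
```

The matrix is also symmetric, so the correlation between a and b is the same as between b and a.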

Heat Map of Pearson's Correlation Matrix

Okay, then this is the heat map of the Pearson's correlation matrix, and also the Spearman's, Kendall's, and Phik correlations. After the correlations, look at the missing values: you see that none of the variables contain missing values. And here you see the first ten rows, and you see the last ten rows.
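
A rough pandas-only version of these last sections might look like the sketch below. It covers Spearman's correlation, the missing-value counts, and the first and last ten rows. Kendall's and Phik are left out here because, as far as I know, they can need extra packages beyond pandas. The seed and variable names are assumptions.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed so the run is reproducible
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])

# Rank-based (Spearman's) correlation instead of the default Pearson's.
spearman = df.corr(method="spearman")
print(spearman.round(2))

# Number of missing values per variable.
missing_per_column = df.isna().sum()
print(missing_per_column)

# First ten and last ten rows, like the sample section of the report.
first_rows = df.head(10)
last_rows = df.tail(10)
print(first_rows)
print(last_rows)
```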

Efficiency in Data Analysis

So, it's very intuitive, and it allows you to get a quick exploratory data analysis of your data with minimal effort. You can see that this required only three lines of code: one to import the necessary libraries, one to generate the random data, and one to create the report. In your case, you would import the necessary libraries, create a data frame by reading in your CSV data, and then generate the report using this block of code here.

Afterward, you view your exploratory data analysis report by invoking the profile command. As always, the best way to learn data science is to do data science, so please enjoy the journey. Thank you for watching. Please like, subscribe, and share, and I'll see you in the next one. In the meantime, please check out these videos.
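
For the CSV workflow described above, a minimal sketch could look like this. The file name my_data.csv and the columns x, y, and z are hypothetical stand-ins; the sketch writes a small CSV first only so that it runs on its own, and the profiling lines are left as comments because they need the package installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for your own data: write a small CSV so this sketch is self-contained.
# The file name my_data.csv and the column names are hypothetical.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "my_data.csv")
pd.DataFrame(np.random.rand(20, 3), columns=["x", "y", "z"]).to_csv(csv_path, index=False)

# Step 1: read the CSV into a data frame.
df = pd.read_csv(csv_path)
print(df.shape)

# Step 2: create the report (requires the profiling package to be installed).
# from pandas_profiling import ProfileReport
# profile = ProfileReport(df, title="Pandas Profiling Report",
#                         html={"style": {"full_width": True}})

# Step 3: in a notebook, put `profile` on the last line of a cell to display it.
```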