How to automate data processing in Python with Mito

Automating Data Analysis with Mito: A Step-by-Step Guide

Writing the equivalent Python by hand for every step of our ramen data set analysis would have been time-consuming. Mito, a spreadsheet interface for Python that runs in JupyterLab through the mitosheet package, removes that work: every edit we make in the sheet generates the equivalent Python in the code cell below it.

After importing the ramen CSV, the first step was to remove the null values. The column that records whether a ramen ranks in the top 10 was mostly empty, so we added an "is not empty" filter to that column, which dropped every row with a missing value there. We then added a second filter, "does not contain" with the value Singapore, to remove all the ramens that come from Singapore. Each filter generated its equivalent code as soon as we applied it.
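As a rough illustration of what these two filters amount to in pandas, here is a minimal sketch. The sample rows and the column names "Brand", "Country", and "Top Ten" are assumptions made for the example; they are not the real data set, and the code Mito generates may look different.

```python
import pandas as pd

# Hypothetical sample standing in for the ramen CSV. The column names
# "Brand", "Country", and "Top Ten" are assumptions for illustration.
df = pd.DataFrame({
    "Brand": ["A", "B", "C", "D", "E"],
    "Country": ["Japan", "Singapore", "Malaysia", "Japan", "Indonesia"],
    "Top Ten": ["2016 #1", "2016 #5", None, "2015 #3", None],
})

# "Is not empty" filter: keep only the rows where "Top Ten" has a value.
df = df[df["Top Ten"].notnull()]

# "Does not contain" filter: drop rows whose country contains "Singapore".
df = df[~df["Country"].str.contains("Singapore", na=False, regex=False)]

print(df)
```

Only the two Japanese rows with a "Top Ten" value survive both filters in this sample.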

After removing the null values and applying the Singapore filter, we were left with 34 entries. To see how many entries came from each country, we built a pivot table with "country" as the row value and "country" again as the value, aggregated by count. The result showed one entry from China and six each from Indonesia, Japan, and Malaysia, among others. The code for the pivot table appeared in the code cell alongside the code for the filters.
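The count-per-country step can be sketched in pandas as follows. This version uses `groupby(...).size()` rather than whatever pivot code Mito emits, and the "Country" column name, the "Country count" label, and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical filtered data; only the "Country" column matters here.
df = pd.DataFrame({
    "Country": ["Japan", "Japan", "Malaysia", "China", "Indonesia", "Indonesia"],
})

# Count the number of entries per country, one row per country.
counts = df.groupby("Country").size().reset_index(name="Country count")

print(counts)
```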

To avoid repeating these steps on every new data set, Mito lets us save an analysis and replay it later. We saved the analysis under the name "ramen", opened another sheet, and imported a new ramen data set that still contained null values and Singapore entries. Replaying the saved analysis on it produced the cleaned data and the pivot table without redoing any of the steps, much like a macro, and generated the code again in the cell below.
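One way to think about a saved analysis is as a function that bundles the steps and can be called on any data set with the same columns. The sketch below is only an analogy in plain pandas, not how Mito stores or replays analyses; the function name `ramen_analysis`, the column names, and the sample data are all hypothetical.

```python
import pandas as pd


def ramen_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the saved steps: drop empty "Top Ten" rows, drop Singapore, count by country."""
    cleaned = df[df["Top Ten"].notnull()]
    cleaned = cleaned[~cleaned["Country"].str.contains("Singapore", na=False, regex=False)]
    return cleaned.groupby("Country").size().reset_index(name="Country count")


# Two hypothetical data sets with the same (assumed) column layout.
first = pd.DataFrame({
    "Country": ["Japan", "Singapore", "China", "Japan"],
    "Top Ten": ["#1", "#2", None, "#3"],
})
second = pd.DataFrame({
    "Country": ["Malaysia", "Singapore", "Indonesia", "Malaysia", "Indonesia"],
    "Top Ten": ["#4", None, "#7", "#9", None],
})

# "Replaying" the analysis is just calling the same function on new data.
first_result = ramen_analysis(first)
second_result = ramen_analysis(second)
print(first_result)
print(second_result)
```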

The benefit of saving and replaying is that a process we run repeatedly on new data sets only has to be built once in Mito. Because every step also generates its own code, the result is code we can reuse and carry forward into the rest of our analysis.

Beyond saving and replaying analyses, Mito covers the common editing tasks. We could add and delete columns, which is a quick way to condense a data set down to the data we care about. We could change the data type of any column, for example from string to integer, and then add a column with a formula such as the value of that column times 10; more complex formulas are possible as well. We could also undo and redo steps at any time, which made it easy to move back and forth between approaches.
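In pandas terms, the data type change and the times-10 formula look roughly like the sketch below. The column names "Stars" and "Stars x10" and the values are assumptions for illustration, and Mito's generated code may differ.

```python
import pandas as pd

# Hypothetical column that was read in as strings.
df = pd.DataFrame({"Stars": ["3", "5", "4"]})

# Change the data type from string to integer.
df["Stars"] = df["Stars"].astype(int)

# Add a new column with a formula: the value of "Stars" times 10.
df["Stars x10"] = df["Stars"] * 10

print(df)
```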

Mito also lets us clear the analysis back to its base state, which is the state in which the data was imported, and import further data sets. We could export the results as a CSV file and merge data sets together using several types of joins: lookup, left, right, inner, and outer.
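For readers less familiar with join types, the sketch below shows how left, right, inner, and outer joins differ using pandas `merge`, followed by a CSV export. The two tables are invented for the example, the lookup join is not covered, and the export writes to an in-memory buffer instead of a file.

```python
import io

import pandas as pd

# Two hypothetical tables that share a "Country" key.
ratings = pd.DataFrame({
    "Country": ["Japan", "Malaysia", "China"],
    "Stars": [5, 4, 3],
})
regions = pd.DataFrame({
    "Country": ["Japan", "Malaysia", "Peru"],
    "Region": ["East Asia", "Southeast Asia", "South America"],
})

# Run the same merge with each join type to compare the results.
joined = {
    how: ratings.merge(regions, on="Country", how=how)
    for how in ("left", "right", "inner", "outer")
}
for how, frame in joined.items():
    print(how, len(frame))

# Export the inner join as CSV text (a file path would also work here).
buffer = io.StringIO()
joined["inner"].to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

The inner join keeps only countries present in both tables, while the outer join keeps every country from either table and leaves the missing side empty.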

Finally, Mito can graph the data. Starting from the pivot table, we clicked the graph button, set the country as the x-axis and the country count as the y-axis, and got a chart of the frequency of entries for each country.

Taken together, these features make Mito a useful part of our data analysis workflow. We work in a spreadsheet interface, and Mito generates the equivalent Python for each step, so we get the analysis and the code at the same time. It can load any data that fits into a data frame, which helps if you are running into size limits in Excel or other tools.

To get started with Mito, run the three install commands from the Mito documentation website in your terminal and then open JupyterLab. From there, import mitosheet and call mitosheet.sheet to render a blank sheet. If you have any questions, don't hesitate to reach out.

Hey, this is Jake from Mito. I'm going to show how you can automate your data cleaning and data analysis using Mito. For those who are new to Mito, Mito is a spreadsheet interface for Python. You see we've called the mitosheet package into our Python environment; here we're using JupyterLab. Every edit we make in this front end, as we call in our data and edit it, is going to generate the equivalent Python in the code cell below. So it's a really fast way to get your analysis done and generate Python while you do it. All I need to do to render this is import mitosheet and then call mitosheet.sheet, which renders this blank interface.

I just want to show you how you install the Mito package. This is our documentation website. You just run these three commands right here; you can pause the screen and run these in your terminal, then open JupyterLab, and once you do that you will get the Mito sheet here.

So the first thing I'm going to do is import my data. This is connecting to my local files. I'm going to take in this ramen data, so I'm importing this CSV. We can get any data size that fits into a data frame into Mito, so if you're struggling with the data sizes you have in Excel or other places, you can really easily upload that right into Mito.

Here we see our data set. The first thing I want to do is get rid of these null values. We're looking at data about ramen noodles, by the way, so whether a certain order of ramen ranks in the top 10 or not is what we're looking at in this column, and we see we have all these null values. I'm going to click on this icon here and add a filter to this column, which is "is not empty", and there we go: we get rid of all the null values. When we do that, we generate the equivalent code right here for what we just did.

The next thing I want to do is remove all the ramens that come from Singapore. So I'm going to apply another filter here, "does not contain", and type in Singapore. We see we got rid of all of those, and we're left with 34 values.

Now I just want to know how many entries I have from each of these countries, so I'm going to do a pivot table. We're going to do country as the row value, and then I'm going to put country again here as the value and count those. So we see we have one from China, six from Indonesia, six from Japan, six from Malaysia, et cetera. Everything we've done is creating the equivalent code down here: here's the code for the pivot table, for example, and here are the filters we've applied. As you can see, we really quickly did this analysis and generated code while we did it.

This was a simple analysis I just wanted to run you through quickly, and now I can show you how you can automate this process going forward. If you're going to do a process over and over again on new data sets, you only ever have to do it once in Mito. What I'm going to do is save this analysis, which will save this code down here. Let's call this "ramen". We'll save that, and now we have the saved ramen analysis; we're saving the steps we've done here.

Now I'm going to call another sheet and call in a new ramen data set. We see the data is not clean: we still have the null values, and we still have some Singapore entries here. So all I'm going to do is replay my saved analysis. I apply this analysis, replay, and here we go: here's our data set, and here's the pivot table that we ended up with. We did not have to do the steps again; we just generated that from running the saved analysis. So this is sort of like a macro, essentially. And now that we've done that, we also generate all the code again below, so for every analysis you do, you generate the code.

There are other things we can do in Mito as well, besides saving and replaying analyses. If we go back to our pivot table here, we can graph this super easily with Mito. I'll hit the graph button, set the country as my x-axis and the country count as the y-axis, and we get a nice graph displaying the frequencies for each country.

If I go back to the base data set and close this graph, there are a few other things I can do as well. I can add columns and delete columns. This column here, let's say you want to delete that: gone. It's a really great way to condense your data sets down to the data you want to look at. Let's say I want to add a column here, but before that, let's say I want to change the data type. For any column we can change the data type, which is something people struggle with in Python sometimes, and in Mito it's super simple. This is a string; let's make this an integer. There we go, and again we generate the code for that step. Every step generates your equivalent code below, and this is code that you can use and carry forward with the rest of your analysis. Now that it's an integer, let's say I want to add a column next to it and put a formula in here: the value of this column times 10. There we go, and we can obviously do much more complex formulas as well.

Just to run through the things we can do: we can undo the steps we've done, so, like, undo, and we'll see we deleted that. We can also redo, so you can cycle through the steps, going back and forth. You can clear the analysis back to the base state, which is the state in which you imported the data. You can import data, you can export this as a CSV, and you can add columns, delete columns, and create pivot tables. We can merge data sets together using different types of joins: we have lookup, left, right, inner, and outer. And we can also graph, and then save and replay. So there's a lot you can do in Mito. I hope you get a chance to check it out. Just as a reminder, here are the install instructions, and please reach out with any questions. Thanks.