**Understanding Files and Directories in Google Colab and Google Drive**
As data scientists, we often work with files and directories, which can be tricky to manage across platforms such as Google Colab and Google Drive. In this article, we explore how to copy, move, and delete files and directories. The commands below are bash commands; to run one in a Colab notebook cell, prefix it with an exclamation mark (for example, `!ls`).
**Distinguishing Between Copying and Moving Files**
Before we dive into the actual commands, it's essential to understand the difference between copying and moving files. When you copy a file, it is present in two locations: the original location and the destination. When you move a file, it is present in only one location, the destination. As the name implies, the file is moved from one place to the other, so it no longer exists at the original location.
To illustrate, suppose we have a CSV file named "weather-weka.csv" in the Colab working directory and we want to move it into a folder called "data set" in Google Drive. We can use the `mv` command to do this. After running the command, the file is present in the destination folder, and listing the original directory shows that "weather-weka.csv" is no longer there.
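The difference is easy to see in a scratch directory. The following is a minimal sketch, not the tutorial's own notebook: it runs in a throwaway folder created by `mktemp -d`, and `dataset` is a hypothetical local stand-in for the "data set" folder on Drive. In Colab, paste it into a cell that starts with `%%bash`, so that `cd` and the variable persist across lines (separate `!` commands each run in their own shell).
```bash
set -eu
work=$(mktemp -d)                      # throwaway working folder
cd "$work"
mkdir dataset                          # hypothetical stand-in for the "data set" folder
echo "outlook,temperature" > copied.csv
echo "outlook,temperature" > weather-weka.csv
cp copied.csv dataset/                 # copy: the file now exists in two places
mv weather-weka.csv dataset/           # move: the file exists only in dataset/
ls . dataset                           # compare the two listings
```
The final listing should show `copied.csv` both in the working folder and in `dataset/`, but `weather-weka.csv` only inside `dataset/`.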
**Using the `mv` Command**
The `mv` command not only moves files from one location to another but also allows us to rename existing files. For instance, let's say we want to rename a file named "dhfr.csv" to become "dhfr_2003.csv". We can use the following command:
```bash
mv dhfr.csv dhfr_2003.csv
```
This renames the file in place: no second copy is created, and afterward only "dhfr_2003.csv" exists, so the original name is gone.
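To confirm that `mv` renames rather than copies, this sketch renames a file in a throwaway directory and lists what is left. The file contents are placeholders, not the real DHFR data; it assumes a POSIX shell, such as a `%%bash` cell in Colab.
```bash
set -eu
work=$(mktemp -d)                      # throwaway working folder
cd "$work"
echo "id,activity" > dhfr.csv          # placeholder content
mv dhfr.csv dhfr_2003.csv              # rename: no second copy is made
ls                                     # only dhfr_2003.csv remains
```
One caveat: if a file named `dhfr_2003.csv` already exists, `mv` overwrites it without asking; `mv -i` prompts before overwriting.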
**Using the `cp` Command**
If we want to make a copy of a file with a different name, we can use the `cp` command instead:
```bash
cp dhfr.csv dhfr2.csv
```
This creates a copy of the file named "dhfr2.csv", while the original "dhfr.csv" remains intact.
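Because `cp` leaves the original in place, both names should exist afterward with identical contents. This sketch checks that with `cmp`, which exits with status 0 when two files are byte-for-byte identical. The copy name `dhfr_copy.csv` and the CSV contents are hypothetical placeholders.
```bash
set -eu
work=$(mktemp -d)                          # throwaway working folder
cd "$work"
printf 'id,activity\n1,0.5\n' > dhfr.csv   # placeholder content
cp dhfr.csv dhfr_copy.csv                  # duplicate under a new name
ls                                         # both names are listed
cmp dhfr.csv dhfr_copy.csv && echo "identical"
```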
**Deleting Files**
Deleting files is a straightforward process in Google Colab. We can use the `rm` command followed by the name of the file we want to delete:
```bash
rm dhfr.csv
```
This will remove the file from our directory.
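A small sketch of deletion in a throwaway directory, using a placeholder file. It also shows `rm -f`: plain `rm` reports an error when the file does not exist, whereas `-f` treats a missing file as nothing to do, which is convenient in cells you may rerun.
```bash
set -eu
work=$(mktemp -d)                      # throwaway working folder
cd "$work"
echo "id,activity" > dhfr.csv          # placeholder file to delete
rm dhfr.csv                            # the file is removed
rm -f dhfr.csv                         # already gone: -f makes this a silent no-op
ls -A                                  # prints nothing: the folder is empty
```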
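Deleting a directory needs one extra option, because plain `rm` refuses to remove a directory and reports an error such as "cannot remove 'tmp_data': Is a directory". To try this out, we first create a scratch directory and then delete it with `rm -R`: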
```bash
mkdir tmp_data
rm -R tmp_data
```
The `-R` option tells the `rm` command to recursively remove the directory and all its contents.
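The sketch below reproduces both behaviors in a throwaway directory: plain `rm` fails on a directory, while `rm -R` removes it along with its contents and also works on an empty directory. The file and folder names are placeholders.
```bash
set -eu
work=$(mktemp -d)                      # throwaway working folder
cd "$work"
mkdir tmp_data
echo "outlook,temperature" > tmp_data/weather-weka.csv   # placeholder file
# Plain rm refuses to delete a directory and exits with a non-zero status
if ! rm tmp_data 2>/dev/null; then
  echo "plain rm failed: tmp_data is a directory"
fi
rm -R tmp_data                         # recursive: the directory and everything in it
mkdir empty_dir
rm -R empty_dir                        # -R also works when the directory is empty
ls -A                                  # prints nothing: both directories are gone
```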
**Applying These Concepts to Google Colab and Google Drive**
Now that we understand how to copy, move, and delete files, let's apply these concepts to a common scenario: downloading a dataset from GitHub and moving it into a folder on Google Drive.
We can use the `mv` command to move the file to the destination directory:
```bash
# Placeholder paths; quote any path containing spaces, such as a "data set" folder
mv path/to/dataset.csv "path/to/data set/"
```
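Paths on a mounted Drive often contain spaces (a folder named "data set", for instance), so quote them in shell commands. This sketch simulates the scenario locally: `drive/data set` is a hypothetical stand-in for a folder under the Drive mount point (in Colab, a path under `/content/gdrive`), and `dataset.csv` stands in for a file you have already downloaded, for example with `wget`.
```bash
set -eu
work=$(mktemp -d)                      # throwaway working folder
cd "$work"
dest="$work/drive/data set"            # hypothetical stand-in for a Drive folder
mkdir -p "$dest"                       # -p creates parent folders as needed
echo "outlook,temperature" > dataset.csv   # stand-in for the downloaded file
mv dataset.csv "$dest/"                # quotes keep "data set" as one argument
ls "$dest"
```
Without the quotes, the shell would split `data set` into two separate arguments and `mv` would not find the intended folder.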
The same pattern works for temporary directories: create one, do some work in it, and delete it with `rm -R` when you are done:
```bash
mkdir tmp_data
# ... do some work with tmp_data ...
rm -R tmp_data
```
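If the scratch directory is only needed for the duration of one script, a common alternative (not the approach used in this tutorial) is to let `mktemp -d` choose a unique name and register the cleanup with `trap`, so that `rm -R` runs when the shell exits, even if a later command fails. A minimal sketch with a placeholder file:
```bash
set -eu
tmp_data=$(mktemp -d)                  # unique, empty scratch directory
trap 'rm -R "$tmp_data"' EXIT          # remove it when the shell exits, success or failure
# ... do some work with tmp_data (placeholder step) ...
echo "outlook,temperature" > "$tmp_data/weather-weka.csv"
wc -l < "$tmp_data/weather-weka.csv"
```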
By following these steps and understanding how to manage files and directories in Google Colab and Google Drive, you can work with data efficiently and build your data science portfolio. Remember to keep experimenting with new datasets and commands to hone your skills!
**Video Transcript**
In the previous video, I showed you how to get started with Google Colab for your data science projects. In this video, I'm going to show you how to handle files in Google Colab. One benefit of Google Colab is that it can read files from your Google Drive, and it can also copy files from Colab onto your Google Drive.
Welcome back to the Data Professor YouTube channel. If you're new here, my name is Chanin Nantasenamat, and I'm an associate professor of bioinformatics. On this channel we cover data science concepts and practical tutorials, so if you're into this kind of content, please consider subscribing. Without further ado, let's get started.
The code we're using today is available on the Data Professor GitHub. You can download it directly: go to the code repository, open the Python folder, click on the Colab file handling notebook, right-click on Raw, choose "Save link as", and save it to your computer. Next, open Google Colab, click on the GitHub tab, search for dataprofessor, and open the Colab file handling notebook from there. If you have already downloaded the notebook to your computer, click on Upload, choose the file from the directory where you saved it, and upload it.
I have already downloaded the file and deleted all of its output. The version on GitHub contains the output of each input command, but for this tutorial I removed it so that we can see the results together in real time.
As mentioned, today we're covering file handling in Google Colab. The benefit of using Colab is that you can access Google Drive from within Colab, so you can read, write, copy, move, and download files.
The first step is to mount your Google Drive in Google Colab. In this block of code we import `drive` from `google.colab` and mount the directory `/content/gdrive` with the option `force_remount=True`, as in `drive.mount('/content/gdrive', force_remount=True)`. This option lets us remount the directory every time we run the code; otherwise, running it again gives an error. Go ahead and run the cell. Running a cell for the first time starts the cloud computing resource on Google, so it may take some time. Once the cell has run, it shows a message telling you to go to a URL. Click on it, choose your Gmail account, click Allow, copy the authorization code, paste it into the prompt, and press Enter. Once Drive has been mounted successfully, you get the message "Mounted at /content/gdrive".
The next step is to list the contents of the directory with the `ls` command. This comes from bash: to invoke a bash command in Colab, put an exclamation mark in front of it, as in `!ls`. You can do a lot of things from the command line. Here we list the contents of the current working directory and see two directories, `gdrive` and `sample_data`. The `sample_data` folder contains some sample input files for you to play with in your data science projects. Adding the `-l` option (`!ls -l`) gives more detail: the read, write, and execute permissions of each folder and file, as well as its size and the date it was last modified.
Let's say you want to create a directory in the Colab working directory. Use the `mkdir` command followed by the name of the directory you want to create, then use `ls` to check: the compiled data folder has been created as expected.
Now let's create some files. The first way is from the bash command line: use the `echo` command followed by the text in quotation marks, then the greater-than symbol (`>`) followed by the file name, `data.txt`. Running that command creates `data.txt`. The second way is to create the file directly from Python code. We create a variable `data2` with the `open` function, whose arguments are the name of the text file we want to create, `data2.txt`, and the `'w'` option, which tells Python that we want to write to the file. On the next line we call `data2.write()` with the text we want to write into the file, and finally we call `data2.close()`. The third way to create a file is to download an existing file from the internet. Head over to the Data Professor GitHub and find the `weather-weka.csv` file; we can conveniently download it from the command line with the `wget` command, and the output tells us it has been downloaded successfully. Listing the directory again shows the files we just created, `data.txt` and `data2.txt`, along with the downloaded `weather-weka.csv`.
Next, reading files. You can read files directly from the bash command line with the `cat` command: type the exclamation mark followed by `cat` and the name of the text file, and it displays the text in `data.txt`. We also look at `data2.txt`, which was created from Python, and both of them work. To read the files in Python, we first define a variable called `data` using the `open` function with the name of the text file we want to read and the `'r'` argument, which opens the file for reading. Run the cell, then create a variable called `data_content` whose value is `data.read()`; this reads the content of the file and assigns it to `data_content`. Running this shows the content of the file. Notice that there is a trailing `\n`, which we can remove with the `strip` method.
Now we're going to access Google Drive from Google Colab. Let's list the contents of the Colab Notebooks folder in our Google Drive; these are the files and folders inside it. Be aware that the contents of the Colab working directory are deleted every time the session ends, so it is crucial to save the files in your working directory to Google Drive so that you have them for future use.
In the next couple of cells I'll copy some files from the data set folder in Google Drive. If you want to reproduce this, go to your Google Drive and create a folder called data set. Next, go to the Data Professor GitHub, click on data, go to `dhfr.csv`, right-click on the Raw link, choose "Save link as", and download the file to your computer. Notice that the file might be saved as `dhfr.txt`, so you might have to rename it to `dhfr.csv`. Then copy it into the data set folder. Listing the contents of the data set folder now shows `dhfr.csv`, the file we just copied there.
Let's see whether we can change into the data set folder and then print the working directory, both with bash commands. Notice that the output is still `/content`, so we did not end up inside the data set folder. This is most likely because each `!` command runs in its own subshell, so a `!cd` does not carry over to the next command; IPython's `%cd` magic changes the notebook's working directory persistently. The purpose of changing into the data set folder was to copy `dhfr.csv` from there into our Colab working directory, so we'll do that another way in the next couple of cells. First, let's list the contents of the current working directory again: notice that we don't have the `dhfr.csv` file yet. Now we copy `dhfr.csv` from the data set folder of our Google Drive into the current working directory, which is represented by the dot (`.`). Run that line, then `ls` again, and `dhfr.csv` is there: we have successfully copied the file from Google Drive into the Colab working directory.
To check the Colab working directory, type `pwd`; it says `/content`. You can do the same in Python by importing the `os` package and calling `os.getcwd()`, which also returns `/content`. That's the current working directory.
Now we'll copy files from Google Colab to your Google Drive. As I explained earlier, whenever the session ends, the files in your Google Colab are removed automatically, so you want a way to back them up, particularly files that take a long time to produce during a Colab session. The way to do that is to back them up to your Google Drive. Let's list the contents of the data set folder on Google Drive again; the only file there is `dhfr.csv`. Now we copy `data.txt` and `data2.txt` into this folder with the `cp` command, and finally look at the contents of the data set folder again: before, there was only one file, and now `data.txt` and `data2.txt` have been copied there too.
Maybe you're wondering how to move files between Google Colab and Google Drive. You can do that with the `mv` command. Before we move any files, let's distinguish between copying and moving. When you copy a file, it is present in two locations: the original location and the destination. When you move a file, it is present in only one location, the destination; it is removed from the original location because, as the name implies, it is moved from one location to the other. Let's try it. Listing the current working directory shows `weather-weka.csv`, and we're going to move it into the data set folder in Google Drive. Looking in the destination directory, `weather-weka.csv` is now present. What about the original directory? Listing it with `ls` shows that `weather-weka.csv` is gone: the file was moved from the source directory to the destination, so it no longer exists in the source.
Besides moving files from one location to another, the `mv` command can also rename existing files: type `mv`, then the name of the original file, followed by the new name. For example, when you rename `dhfr.csv` to a new name, only the renamed file remains and the original name is gone, so it's like moving the file to a new name. The same concept applies to copying: you can create a copy of a file under a different name by replacing `mv` with `cp`. For example, `cp dhfr.csv dhfr2.csv` creates a copy of the `dhfr.csv` file named `dhfr2.csv`.
Now let's look at deleting files. Deleting files is simple: use the `rm` command followed by the name of the file you want to delete. The `weather-weka.csv` file isn't here because we just moved it, so let's change the file name in the command to `dhfr2.csv` and press Shift+Enter. Checking the contents again, we see that `dhfr2.csv` has been deleted.
Before we delete a directory, let's create one to play with, using `mkdir` followed by the folder name. Then we download a data set from the Data Professor GitHub and, after the download, move it into the directory we created. Looking at the contents of the directory, it now contains a file. Let's try deleting the directory with `rm`, as we did for the file, and see whether that works on a directory: it says that it cannot remove `tmp_data` because it is a directory.
To remove the directory, we use the recursive option `-R`, typing `rm -R` followed by the name of the directory. That works: listing the contents again, the `tmp_data` directory is gone. We still have the compiled data directory, which is empty, and we can use the same command on it, because `rm -R` deletes a directory whether it is empty or contains files. Listing the contents once more shows that the compiled data directory has been deleted too.
I hope this video helped you learn how to handle files and directories in Google Colab and how to make it work with your Google Drive. As always, the best way to learn data science is to do data science and build your data science portfolio. You can do this with new data: modify the notebook we used here, apply it to your favorite data set or a new data set that interests you, and play around. If there is a particular topic you would like me to cover, please let me know in the comment section below. Thank you for watching, please like, subscribe, and share, and I'll see you in the next one. In the meantime, please check out these videos.