Live Code Along - How to quickstart your data analysis

Getting an Error with Unzipping

It seems like I'm getting an error while trying to unzip something, but that's because I made a mistake. It's DVD Rentals Master.zip and not DVD Rental.zip. Then Chain asked: how do I know the specific value of the IP address of this database? Do I need to know it for my own code? Yes, indeed, this is something that you have to know yourself.

So, for example, if I work at a company that has PostgreSQL databases that I can analyze, then I should go to the system admin and ask him or her what the IP address is. Basically, what is the URL where this database is hosted? For this example, DVD Rentals was at the IP that is shown here, but other databases will be in different places, right?

This is the IP address that I read off from my Google Cloud Platform console. Google Cloud told me: this is where your database is running, and this is how you can connect to it. So this is the IP address you should use.
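
To make the answer above concrete, here is a minimal sketch of how a host address obtained from an admin or a cloud console might be combined with credentials into a PostgreSQL connection URL. The helper name, the user, the password, and the database name are all hypothetical, and the address is a reserved documentation IP, not the one used in the session. Only the default PostgreSQL port (5432) is a known fact.

```python
from urllib.parse import quote_plus


def build_postgres_url(host, database, user, password, port=5432):
    """Assemble a PostgreSQL connection URL from its parts (hypothetical helper)."""
    # Percent-encode the credentials so characters such as "@" or ":" in a
    # password cannot be mistaken for URL separators.
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values only: 203.0.113.10 is a reserved documentation address,
# and the user, password, and database name are made up for illustration.
url = build_postgres_url(
    host="203.0.113.10",
    database="dvdrentals",
    user="analyst",
    password="p@ss:word",
)
print(url)
```

The only part that changes from one database to the next is the host (and possibly the port), which is exactly the piece of information you have to get from whoever runs the database.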

UdP is asking a question: the integration feature is an excellent function, but is it secure enough to store credentials from another server, and is there any encryption involved? That's a great question, actually. Yes, it's very secure. We use an external service called HashiCorp Vault, which is known to be among the best in terms of security and encryption of secrets.

So, rather than storing the secrets that you put in here ourselves, we depend on a very professional company whose main business is storing secrets as safely and securely as possible. Every secret you store here is encrypted in transit and on disk, and only decrypted when you need it. So it's entirely secure.

There's a question from PK: how much data can the notebook handle? I'm not sure exactly what you mean, and there are different ways to answer this question. In terms of storage, you can put three gigabytes of data inside a workspace. You can go up to five gigabytes, but if you go over five gigabytes of data stored inside one DataCamp workspace, then you lose access to your workspace.

Of course, we can still get you the workspace data afterwards, but for now we have set a fair-use limit of three gigabytes of data. How much data the notebook can handle can also be understood in terms of processing power and working memory, so in terms of the RAM a workspace gets. You get four gigabytes of RAM inside a DataCamp workspace. Depending on how complicated your algorithms are, this becomes a problem sooner or later.

But for the basic analyses we did, for example accessing PostgreSQL data, this shouldn't be a problem. The RAM shouldn't be the problem. If you're going to do crazy ensemble methods, or hyperparameter tuning for different random forests, those are the moments when you could potentially hit the limits.
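
As a rough illustration of the memory point, here is a back-of-the-envelope sketch for estimating whether a purely numeric table fits in the four gigabytes of RAM mentioned above. The function name and the example table size are hypothetical. The estimate assumes 8 bytes per value (a 64-bit float), and real usage is usually higher because of text columns, indexes, and temporary copies made during an analysis.

```python
def estimated_size_gb(n_rows, n_numeric_columns, bytes_per_value=8):
    """Rough in-memory size of a numeric table, in gigabytes (GiB)."""
    return n_rows * n_numeric_columns * bytes_per_value / 1024**3


WORKSPACE_RAM_GB = 4  # RAM per workspace, as stated in the session

# Hypothetical example: 10 million rows with 20 numeric columns.
size_gb = estimated_size_gb(10_000_000, 20)
print(f"Estimated size: {size_gb:.2f} GB")
print("Fits in RAM with room to spare:", size_gb < WORKSPACE_RAM_GB / 2)
```

Comparing against half of the available RAM rather than all of it is a deliberately cautious choice here, since many operations create intermediate copies of the data.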

Definitely let us know through the feedback mode over here. Let us know: hey, I'm hitting the limit for my analyses. That's good information for us to, at some point, increase the resources we give every person on DataCamp Workspace. I hope that answers PK's question. Are there any other questions? All right, cool. Senna lets me know that there are no additional questions, so I think we can leave it at this, right on time: only 90 seconds left before it's 6:30 here in Belgium.

I want to thank you all for attending. There was a slight hiccup at the end with the DVD Rentals data; probably in an hour you'll be able to start the workspace from the GitHub repository again. Thanks for staying with us, thanks for tuning in, and I hope to see you soon in another code-along session, where I'll hopefully be able to showcase more nice DataCamp Workspace features that will help you do data science in the blink of an eye, without having to configure or install anything on your own system.

Thanks, goodbye.

Just some closing messages here. We'll be organizing more of these presentations in the future, so stay tuned, and please let us know if you have any additional feedback regarding these sessions. We will post the survey link in the chat, but also on Slack along with the recording. Your feedback will really make a difference in helping us improve the future events that we hold and improve Workspace as a whole.

Finally, if you haven't already, please feel free to join the DataCamp global Slack community, where we will keep you updated about new Workspace features and events that are happening. This rounds up the fourth Workspace Live Code Along session. I remind you that the session was recorded and the link to the recording will be posted later on Slack, email, and social media. With that being said, have a great rest of your day, and thank you for joining.

Introduction

Hello to everyone. First of all, thank you for attending the fourth Workspace Live Code Along. A huge thank you as well to the speaker, Filip Schouwenaars, who together with DataCamp and especially all of you made this event possible. Once again, I'll remind you that the session is recorded and the link to the video will be posted through Slack, email, and social media platforms.

Moving to the next slide, a little bit about DataCamp itself. Our mission statement is to democratize data science education and make data literacy accessible to millions of people and businesses around the world through our learning platforms, and now also through DataCamp Workspace, certification, and events such as this live code-along.

Moving to the next slide, talking about our speaker today. You may have seen Filip before in some interactive courses here at DataCamp, but now he is the product manager of DataCamp Workspace. Today he will be more than happy to explain how you can start your own data analysis with just a few clicks using Workspace.

Today's live code-along is divided into several parts. First, Filip will show you how you can start your workspace from a template and add it to your portfolio. Next, Filip will start a workspace from a GitHub repository and explain how you can add this to your portfolio as well. Finally, he will connect the workspace to a relational database and build a data report. Every part will take about 30 minutes and will have its own Q&A section. If you have any questions, feel free to ask them via the questions button on the control panel of GoToWebinar. Filip will then try his best to answer them during the corresponding Q&A session.

Finally, we want to remind everyone that this session is being recorded and the link will be shared with you later on Slack, email, and social media platforms. Thank you for your attention, and now the live code-along itself will begin. Filip, over to you.

Hello, can you hear me? Just checking.

Yes, loud and clear.

Okay, perfect. All right, hi all. It's really cool to see that so many of you are attending this live code-along, the fourth one on DataCamp Workspace. It's going to be a little bit different from the code-alongs we did before. As I'm doing different data analyses that you'll be able to follow and code along with yourself, I will also be showing some of the latest features we've been adding to DataCamp Workspace.

In case you've never tried your hand at DataCamp Workspace yet, here is basically what it is. With courses on DataCamp we made learning very easy: in a couple of seconds and a couple of clicks you can get started learning data science. Workspace is the same, but for your own data analysis projects. You can get started with your own data science project very easily, with a couple of clicks, no installation and no configuration required.

If you haven't worked with Workspace yet, this is what you'll see if you click on the Workspace tab when you're logged in. It's important to note that people who are in a B2B group or B2B account don't necessarily have access to DataCamp Workspace. For these people, I suggest that they quickly create a personal account on their personal email so that they can still get access to DataCamp Workspace.

Once you click on Get Started for Free, you land in your workspace dashboard. This is my workspace dashboard. Of course there are tons and tons of workspaces on here, because I'm working day in, day out on making DataCamp Workspace the place where you can come to do your data science projects, but in your case there will probably be fewer workspaces here. Basically, every tile here is a DataCamp workspace.

Starting a Workspace from a Dataset Template

In the first part of this code-along, what I want to do is walk you through the steps necessary to do your own data science project with an interesting data set that we have prepared for you. Suppose you already took a couple of courses on DataCamp, you took a couple of DataCamp projects, and you're ready to take the next step and say: I want to do a free-form project where nobody is telling me what to do. I just want an interesting data set, I want to analyze it, and I want to share my work with the world.

For that we have built DataCamp Workspace templates, which you can see here on the left-hand side. You see data sets, recipes, and playbooks. Let me look at all templates first. We see a bunch of templates, all kinds of different ones. There are data set templates to really get you started with a data science project, starting with a data set. There are recipes that give you tips on how to solve typical, common things you have to do in a data science project. And then there are also playbooks, which are longer-form templates that really solve a specific business problem.

Let me quickly check if there are already any questions. Okay. For our use case, we will start a workspace from a dataset template. There's a bunch of stuff I can choose from here: I can analyze [unclear] data, I can analyze loan data, but in my case let's analyze some NBA shooting data. I click on this and I can preview the template to see if it is one I'm interested in exploring. It's apparently a data set based on statistics to be found on the NBA website, about basketball. I see that the data frame contains a couple of columns, like shooter, x, y, range, defender, and score. Let's say I find this interesting, and let's use this template.

Also, I'll be using the Python programming language throughout this template. You're free to use R, but then I can't guarantee that you'll be able to follow along as easily, because you'll constantly have to translate from Python to R. So I strongly recommend that you also use Python here.

I just clicked on Use Template, and what this will do is launch a DataCamp workspace for me without me having to configure anything. This launches the JupyterLab IDE, which is a very commonly used integrated development environment for data science, and all the files necessary for me to get started with my analysis are available here. I see the same things that I saw in the template preview before: this is the data set based on statistics that can be found on the NBA website. There's already some useful code for me that imports the data from the CSV that's also contained inside this DataCamp workspace, and there's even a printout. So let me already run this. I use Shift+Enter to execute the code in these cells; you can also always use the play button here at the top.

Exploring the NBA Shooting Data

All right, so that's that. Then I import the CSV, and I see the first five rows. There's the shooter, which is the name of the basketball player that took some shots. X and y are the position from which the shooter was taking the shot at the hoop: x is the horizontal distance of the shot from the basket in feet, and y is the vertical distance. Okay, interesting. What else do I see? There's the defender, so who was trying to fend off the player's shot, and then there's a score: whether the shot was made or missed. That I also see in some sample code that was already included: indeed, the score is either made or missed.

All right, cool. I think I have a pretty good sense of this data set, so I'm going to cut a couple of cells to clean this up, like this. I can also use DD for that. Then I also get suggestions for some interesting questions to answer. So indeed, I get to explore this exciting data set.
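
The loading and inspection step described above can be sketched as follows, assuming pandas is available as it is in the workspace. The rows below are made up purely for illustration and are not taken from the real data set. The upper-case column names and the MADE/MISSED labels are assumptions, the range column is left out because its format is not described in the session, and a StringIO buffer stands in for the CSV file shipped with the template.

```python
import io

import pandas as pd

# Illustrative rows only: these values are invented to mirror the columns
# mentioned in the session and are NOT real shooting data.
csv_text = """SHOOTER,X,Y,DEFENDER,SCORE
Seth Curry,-3.8,5.5,Chris Paul,MADE
Seth Curry,0.7,8.2,Chris Paul,MISSED
Chris Paul,-0.1,5.1,Seth Curry,MADE
Russell Westbrook,12.4,20.3,Chris Paul,MISSED
Russell Westbrook,-6.5,2.9,Seth Curry,MADE
Russell Westbrook,3.2,24.0,Chris Paul,MISSED
"""

# In the workspace, read_csv would point at the CSV file from the template;
# here the in-memory buffer plays that role.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())             # first five rows
print(df["SCORE"].unique())  # which outcomes occur in the score column
print(df.shape)              # (number of rows, number of columns)
```

Looking at the first rows, the distinct values of the outcome column, and the shape is usually enough to get the kind of quick sense of a data set that the session describes.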
I can't wait, I'm super excited. Can't think of where to start? Try a couple of questions. I see a pretty cool plot here: plot shots taken and missed via scatter plots, and you can look at this example of shots taken by Russell Westbrook for some inspiration. So I basically get a suggestion: maybe you should try to recreate this plot over here. Let me do that. I'll get rid of this and write: let's plot shots taken and missed in a scatter plot, like this.

Loading the Court Image

All right, now I can start coding. The first thing I should do, I guess, is get the image of the basketball court into Python. If I look here on the side, I already see that there's an NBA court .jpg file in there. I see that matplotlib.pyplot is already loaded in as plt, so I can use the imread function of matplotlib.

Just to remind you, this is actually a code-along, so I highly recommend that you code along as I'm coding through this. Somewhat later we will also share a workspace publication with all of the code that we've written so far, so that if you missed a step here or there you can still fill it in and still create a publication to feature on your DataCamp profile page.

So I'll do it like this: I'm reading in this image over here. I'll also check real quick what the width of this image is, I'll do the same for the height, and let me quickly print out width and height. Okay, so I see that this is an image of 1365 pixels by 1455 pixels. This is something that will be interesting later on. I can also see what the image looks like if I show it with plt.show, and I see indeed that the pixels have been converted to the axis units here.

Okay, cool. The next thing I would like to do is draw all of these points, all of these different observations for a certain shooter, on this image. Let me start by creating a matplotlib figure. Let's say we want to make it eight in width, and then we derive the height from that using the width and height of the image, like this. That way we're sure we're respecting the aspect ratio that was defined by the image in the first place. Before I do plt.show like this, I'll first add some axes; that's basically drawing axes under the image. All right, so this is just our blank canvas.

The next thing we can do, on these axes that we will later draw the scatter plot on, is show the image, like this. All right, great. We can get rid of this now; we no longer need it. This is perfect. What you also see here is that whenever you create a plot, here with ax.imshow, it will print out the plotting object behind the scenes. If you want to avoid that, you can either use a semicolon here, and then the text disappears, or you can use plt.show, and then it also doesn't happen.

Rescaling the Coordinates

What we have to do now is get the x coordinates and the y coordinates and draw them on here. But we immediately see that the x coordinates are values like minus 3.8 and 5.5, while here we're dealing with a range from zero to roughly 1,300.

I'm getting a request to speak a bit slower, so I'll do that, and I also get a request to zoom in a bit, so I'll do that as well. I hope this is better for people to follow along.

So what I was saying is that the x-axis and the y-axis of the data are on a completely different scale than the ones in here. I'll probably have to do some sort of transformation to make sure that the dots showing where the player was when he threw the ball still work out. Let's see. Let's just very naively try to plot the x and y coordinates without making any changes. We'll do scatter, and then we select the x coordinates from the data frame that was loaded in, and we select the y coordinates from that same data frame. Let's give the points a size of three, and then we use the color... no, I'll wait with that. I'll do it like this. And what do we see? As expected, it's all in here at the top; the dots are all here. So there's clearly some scaling that needs to be done.

What we can do to scale is this. If you know that the width of an NBA court is 50 feet, and our image is roughly 1,450 pixels wide, we can create an x rescaled, like this. We will divide by... no, not by width, we will divide by 50, because that's the width of the field, and then we will multiply by width over here. We'll do the same thing with y rescaled and df y. In this case, what we're seeing is half the length of a basketball court, which is 47 feet, so we can divide by 47. That's just information I know, right: the width of a basketball court is 50 feet and half of its length is 47.
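
The feet-to-pixels scaling just described can be sketched in plain Python. This is a minimal sketch of this first scaling step only, not the final transformation used in the session. The function name and the example image size are hypothetical; the 50 and 47 are the court dimensions in feet stated in the session.

```python
COURT_WIDTH_FT = 50        # width of an NBA court, as stated in the session
HALF_COURT_LENGTH_FT = 47  # half the length of the court, as stated


def feet_to_pixels(x_ft, y_ft, width_px, height_px):
    """Scale a position in feet to image pixels (first scaling step only)."""
    x_px = x_ft / COURT_WIDTH_FT * width_px
    y_px = y_ft / HALF_COURT_LENGTH_FT * height_px
    return x_px, y_px


# Hypothetical image size chosen so the arithmetic is easy to follow.
print(feet_to_pixels(25, 47, width_px=1000, height_px=940))
```

With pandas the same expressions apply to whole columns at once, so no loop over the rows is needed. Because the x values in the data can be negative, a shift of the origin is still needed on top of this scaling.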
So let's now plot x rescaled and y rescaled instead. This is already somewhat better, but there's still some work to do, right? We can clearly distinguish the circles of our plot over here, but it seems like we have to flip the y-axis around, and the x-axis is still skewed. That makes sense, because the x values here are minus 3, minus 0.1, minus 0.5, so they're counted considering this to be the center, rather than the bottom left, or rather the top left, of the screen. What we can do here is add 0.5 to this; that way we normalize it. This is already much better: the x-axis seems to be correct now.

Then the y-axis: we have to flip it around. What we can do here is one minus, divided by 47. This is already somewhat better, but not yet perfect. That's because I made a mistake here. I should do it like this: it's one minus df y divided by 47, and then times the height. All right, this is already much better, but we do see that it's slightly skewed to the left-hand side. That's probably something that has to do with our image. It seems like we can add a bit more, maybe 0.55 or something. No, maybe a bit less, something like this. Yeah, this is much better. We see clearly now that there are a lot of shots taken from outside the three-point range. This is actually a pretty good depiction.

Of course, if we compare this to the image over here, we still have to get into filtering based on who the player is, and we still haven't given the points a different color depending on whether the shot was made or missed.

Cleaning Up the Plot

Before I do that, I also want to clean up a bit. Somebody already asked: can you talk a bit about add_axes? What you do in matplotlib is start from a canvas: I create a figure, and then you add the axes. What you're seeing here is that the origin should be zero, and here the origin should be zero as well, and the axes should have a step of one. But that's something we can talk about a little more later on, and something you can also check in the documentation.

What I'll do first is set the x ticks like this, so that we no longer see these annoying numbers on the side, because they don't add a lot of value, and I'll do the same for the y ticks as well. Now the numbers are gone here. By the way, every time you see me executing a cell, I'm using Shift and Enter as a shortcut. It's just the same as clicking on the play button here.

Something else I can do: we see a double rectangle here, so I'll use plt.axis and just pass off, like this. All right, this is now clean, no double lines or anything. This is something I can continue on.

Coloring and Filtering the Shots

The first thing we'll do is make sure there are some nice colors in there. I'll open up a new code cell. I want to give every point a color depending on whether the shot was hit or missed. That means we have to go from the score column to a new column, let's call it color, that I can immediately use inside my matplotlib function. What I'll do here is use df player score, then the map function, and I'll say that made should be converted to green, so that made shots will appear in green later on, and missed should be converted to red. All right, df player is not defined; it's df, of course. We see indeed that made shots have been converted to green and missed shots have been converted to red. This I can now add here: I create a new column called color, like this, that I add to my data frame. What I can do now is say color equals df color, and then to_list. If I do it like this, I see very nicely all the missed shots in red and all the made shots in green. Maybe the size can be somewhat larger here; the s stands for size, so let me change that. It's already better again.

But again, this is the entire data set, right? If we look at the dimensions of this data set with shape, we see 776 shots from four different players. What I'll do now as the last step is a filtering to make sure that I only select the shots by, for example, what was the suggestion, Russell Westbrook. I'll already copy that name so I don't have to type it in a second.

I first have to do the filtering. If you remember from pandas, filtering works by generating a series of booleans, in this case by taking the shooter and saying it has to be equal to Russell Westbrook. I'll also copy that to another cell so we can experiment with it. We get a series that is false or true depending on whether the shooter is Russell Westbrook. Now we can subset that same data frame by surrounding this with brackets and putting df in front. Now we only get shots by Russell Westbrook, and indeed we see Russell Westbrook is the only shooter in here. We can copy this over here, and I'll give it the name of a new data frame, df player, and then I'll change it like this, like this, like this.

To make sure that my code is still fully reproducible, let me start at the top. I import numpy, and I just use Shift+Enter to click through this entire thing. What do we see now? We see indeed only shots by Russell Westbrook, and if we compare it to the picture that was shown here, it's exactly the same. We see the red dots and the green dots. Maybe we can make them slightly less big by changing the size to eight here, and then we see a good-looking plot over here.

Publishing the Workspace

Let's say now that I'm happy with this plot that I put together. I'm also going to clean this up a bit. I'm just going to say: let's plot shots taken by Russell Westbrook and color them depending on whether or not the shot was made. All right, cool. This code I'll get rid of.

Suppose I'm now super happy with the way this code looks. The next step in DataCamp Workspace is super easy: I'm happy with the way this notebook looks, and I want to share it with the world. So let's do that. Click on Share, and then here you can click on Publish Workspace. You publish a notebook, so you have to select which IPython notebook you want to create a publication from. In this case there's only one IPython notebook inside the workspace, so there's just notebook.ipynb to choose from. I can just click on Publish here. It's going to load for a second; it's re-running the workspace from top to bottom again to make sure that every code cell that depends on another one still works and that you didn't make any mistakes in the meantime. And now I have a publication I can super easily share with friends and colleagues.

All right, so this is a publication. It's clear that it's mine; it has my name on it. You can even give it an upvote, and you can share it with friends and colleagues so they give you an upvote on it. You also see here, in the sharing status of the workspace publication, that it has public access. That means that everybody who has the link to this publication will be able to access it, and also that the publication is automatically featured on my profile page. What does that mean? If I go to my DataCamp profile and open it in a new tab, what do I see? I see my NBA shooting data appear here.

So, super easily, you're able to start from a data science project like the NBA data set, do some work on it, generate a publication from it, and then feature that publication on your profile page, where it's shown next to all the courses you completed, all the projects you completed, and all the tracks you potentially completed. In my case I didn't complete any tracks, but I'm sure some of you have already completed a couple of DataCamp tracks. Rather than having to wrangle GitHub to make sure that your notebooks show up there and are rendered nicely, we have tried to make that step very easy with DataCamp Workspace, by letting you create a publication very easily.

Questions

All right, I'm going to pause here for a second and have a look at the questions. There's a question by Heath, who asked me to pause for a minute. I think Senna, who is conducting this live code-along, will share a publication that I made in the past that shows all the code that I've written so far, or roughly, because I kind of winged it a bit. He will share the solution code with you so that you can still review it if you couldn't follow along entirely.

Then Aisha also asked to go through the second cell again. If she means that the second cell is this pandas read_csv call, here is what it's doing. Pandas has been imported as pd, so the read_csv function from the pandas package is called, and then we point to the CSV that we want to load in, and the CSV is here in this case. That's taking the data from the CSV file and loading it into Python, into a data frame, and then with the head function we look at the first five records. I hope that answers the question.

Then there's also a question by Aisha, who asks whether this workspace can now be posted on GitHub. Yes, this is possible. What you have here inside JupyterLab is a terminal. Inside the JupyterLab interface you can just go through the launcher and open up the terminal. On GitHub (this shouldn't show DataCamp, so on my personal profile) it's perfectly possible to create a new repository, and then in the terminal add the remote, authenticate yourself, and push these changes to GitHub. This is all possible from inside DataCamp Workspace, and it's also something I can show you later on.

All right, so maybe we have a couple of minutes left. Let me walk through what exactly is happening here; I think that's what Aisha is asking for. The first thing we do is create a figure with a certain size. The width in this case is four and the height in this case is four, and we use the width-by-height aspect ratio to make sure that it fits the image that we read in. Then we add axes with origin zero and a step size of one in both directions. The first thing we plot on that canvas is the image itself, the image that we read in from this JPEG. Then we set the x ticks and y ticks to empty arrays, so that we get rid of the tick labels on the left-hand side and the bottom side. Then we also disable the axis entirely, so that we don't show the axes.

Then we rescale x: we divide it by 50, because the width of a field is 50 feet, we add roughly 0.5 to it so that the coordinates are shifted properly, and then we multiply by the width of the image, which in this case was around 1,200 pixels. We do the same thing for y, only there the transformation is slightly different: in that case we have to flip it around, so we do one minus y divided by 47, and then we do that times the height, the number of pixels in the height. Then we create a new column color that we map from the column score: we map made to green and we map missed to red, and we store that in a new column color in the data frame.

Finally, we do a filtering on the data frame where we filter on the shooter Russell Westbrook, and we create a new data frame from that. Then we can use the axes that we created here to add a scatter plot, and we use x rescaled and y rescaled, because otherwise it will look very weird; we have to use the rescaled values here. We choose a scatter plot with dots of size 10, and we color it in using df player color converted to a list. If we plot it, we see this nice image.

All right, last question, and then I think we should continue, because there are still lots of things to cover. Daniel asks: could you elaborate on the subset function? I didn't understand how you made a filter for the specific player. Yeah, sure. This is df, the data frame for all players. The first thing I do is figure out which records, which observations in this data frame, are for the shooter Russell Westbrook. I look at the shooter, which is a series with only the shooter column in there, and then I see which ones of these are equal to Russell Westbrook. I'll do it like this, and then I see false, false, false. I can also do this with another player, with Chris Paul for example, or with Seth Curry, and in that case you'll see a different result. You'll see that for the first couple of observations it will be true, because if you compare that to df, you see indeed that the first one is Seth Curry, so for that one it's true.

If we now use this to subset df, that's saying: keep the observations in df for which this filtering evaluates to true. If the corresponding element in the series is true, then keep it; if the corresponding element in the series is false, then drop it. So in the case of Seth Curry, what will happen? Oh, there's an unmatched bracket; it should be like this. What do we see? We only have the shooter Seth Curry anymore. If we change this to Russell Westbrook, Russell with two l's, we see that we only keep the shots by Russell Westbrook. That's how that comes together.

The last thing you have to do is store that result in a new data frame, df player, which is then some sort of trimmed-down version, a selection of the data frame, and that's something you can then use in your plot. It would be perfectly possible to also change this to another player, Seth Curry for example, and then of course we get a different picture. That's the nice thing about programming: you only have to change one thing, you don't have to recalculate it all, and we just get another shooter profile, so to say, a shooter passport if you will. And again, you can share this, publish this, and feature this on your profile page.

All right, so that was the first bit of DataCamp Workspace magic I wanted to show you. Now, for the next part, I'm just going to drink a glass of water first.

Starting a Workspace from a GitHub Repository

All right. So here we created a workspace from the NBA shooting data, from a dataset template. But of course you're not limited to this vast library of templates in the DataCamp product. There's also tons of interesting stuff on GitHub for you to explore. Rather than having you go through GitHub, download all the files there, and import them into DataCamp Workspace, we actually try to make it super easy for you to start a new workspace from a GitHub repository.

To that end, I created a GitHub repository called stocks-and-covid that already contains the first steps of an analysis of how some stocks have been impacted by the COVID pandemic. Rather than me now having to, for example, download this as a zip file, what I can do is just copy the profile name and the name of the repository from GitHub. I'm going to close this off so I keep things clean, and from my workspace dashboard I can just click on the GitHub repository option and paste it in here. I'll leave this open for a while so that you can follow along. It's basically f-i-l-i-p-s-c-h slash stocks-and-covid. Maybe, Senna, you can drop that in the chat so that people can copy it over easily and don't have to type it all.

Then I can choose in which technology I want to open it. It's a bit similar to the technology you could choose when you were opening a workspace from a template. In this case I will again work in Python, which means that this GitHub repository will be opened up as a workspace inside the JupyterLab IDE for me to start exploring. So it's filipsch slash stocks-and-covid, using these dashes here. All right, click on Create now, and again, as before, a new workspace is created. This time the workspace is populated using files from GitHub. Again the JupyterLab loading icon, and what do we see? We see notebook.
example download this as a zip file um what i can do is basically just copy the basically the the profile name and the name of the repository from github i'm going to close this off so i keep things clean and i can just from my workspace dashboard click on the top repository i can do here i can paste it in um i'll leave this open for a while so that you can follow along so it's basically f-i-l-i-p-s-e-h slash stocks and covets maybe semi you can drop that in the uh in the chat so that people can copy this over easily you don't have to type this all then i can choose on which technology i want to open it it's a bit similar to the technology you could choose when you were opening it from a template in this case i will again work in python that means that this github repository will be opened up as a workspace inside the jupyter lab ide for me to start exploring so it's file id sch slash stocks and kovitz using uh these dashes here all right so click on create now and again as usual as before a new workspace is created this time the workspace is populated using files from github again the jupiter lab loading icon and what do we see we see notebook. 
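Going back to Daniel's question for a moment, the subsetting step can be sketched on a tiny stand-in data frame. The column names (shooter, outcome) and the rows below are assumptions made up for illustration; the real NBA dataset may name things differently.

```python
import pandas as pd

# Toy stand-in for the NBA shots data; column names and rows are assumptions.
df = pd.DataFrame({
    "shooter": ["Seth Curry", "Seth Curry", "Russell Westbrook",
                "Chris Paul", "Russell Westbrook"],
    "outcome": ["made", "missed", "made", "made", "missed"],
})

# Comparing a column to a value gives a boolean Series, one entry per row.
mask = df["shooter"] == "Russell Westbrook"
print(mask.tolist())  # [False, False, True, False, True]

# Indexing with the mask keeps only the rows where it is True.
df_player = df[mask].copy()

# Map each outcome to a plotting color, as described in the talk.
df_player["color"] = df_player["outcome"].map({"made": "green", "missed": "red"})
print(df_player)
```

Changing the string in the comparison to another player is the only edit needed to get that player's subset instead.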
And indeed, this is something I've already prepared. In this notebook we'll see how the stock of some large corporations performed when the pandemic swept the world. To accomplish this, we'll use Yahoo Finance data and some COVID cases data made available by the New York Times. So this is the start. This is already a typo I should fix. I already prepared some code that pip installs the yfinance package, the Yahoo Finance package that allows us to easily get finance data from the Yahoo API. As you notice, I also added %%capture here. That makes sure that any output the cell generates is not shown in the notebook; otherwise I get a long list of installation logs, which is not interesting for my analysis. That's why I use %%capture here. Then there are already some imports: yfinance as yf, pandas as pd, numpy as np. There's also some configuration of my images so that I get good-looking images, and then it's already downloading stock market data for four big companies, namely Microsoft, Netflix, Booking and American Airlines.

I chose Microsoft and Netflix because these are pandemic-proof companies in a way, or at least you would expect them to be: with the entire movement to working from home, this must have been good for Microsoft, and with everybody at home binging series and movies, it must have been good for Netflix's revenue. On the other hand we have Booking, a hospitality platform for arranging hotel bookings; with all the travel restrictions, we would expect it to perform worse. And American Airlines, I think that goes without saying: it's one of the biggest airlines in the world, and with a lot of planes being grounded and empty flights, they must have experienced quite a rough time in the past couple of years.

So what we do, and this is already in there, is the yfinance download. We combine the ticker symbols here. This is all code that is already there if you created the workspace from the GitHub repository. OK, first I have to run all of this, of course. While this is running, I quickly want to highlight that there's already a solution.ipynb notebook inside this workspace. If you are no longer following along with me, you can have a quick peek at the solution so that you can catch up again. So if you're wondering what a step was about, or you missed a part, you can just have a look at solution.ipynb.

All right, so I already executed these three steps, and what I'll do as well is create a chart here. OK, so what do we see? This is obvious, of course: the stock of Booking just trades at a very high price, the stock of Netflix at about a fourth of that price, Microsoft even less, and American Airlines way, way less. But the absolute stock price is not something that's super interesting to us; we are interested in what the trends were. So what we should do is normalize all of these stocks so that we can compare them: we normalize them so that the minimum of every stock is zero and the maximum of every stock is one. We have closes here. Maybe the first thing we can do is have a look at this closes data frame. I'll comment this out and now look at closes. What do we see? We see a pandas data frame with the date as the index, and then we have the four ticker symbols: AAL, BKNG, MSFT and NFLX.

Suppose we want to normalize the stock for AAL. We can select that like this, and what we should do now is take closes AAL minus the minimum, this is a standard normalization, and then divide that by the maximum minus the minimum, so the maximum of closes AAL minus the minimum. All right, so if we now plot closes, or closes AAL at least, what do we see? I can also plot this, I hope. So we see here that the stock has now been normalized between zero and one, and we already see that when the pandemic starts there's a crazy drop for American Airlines, from one all the way to 0.2. That must be devastating: that's like 80 percent lost on this normalized scale. I can do the same thing for all of these, Microsoft, Netflix, Booking, but that would not be DRY, as in: don't repeat yourself. So we'll just write a quick loop for that. I'll do for tick in tickers, so those are the ticker symbols I defined here, and I'll swap out AAL for tick, like this. If I now plot all of these closes, what do I see? Now it's way easier to compare the different plots. And now we also see trends: because the other stocks were very low in value, you didn't see the trends in them, but now Booking has been normalized too, and you see that basically all stocks, regardless of what sector they're in and how pandemic-proof they are, crash in March.

So what would be interesting now is to overlay this data with COVID case count data and see whether there is some sort of correlation with the case counts. We'll take the case counts in the US, as it's the biggest economy in the world. I'll do plt.show() to get rid of this AxesSubplot output, so I'm already thinking about the quality of my publication later on.

All right, so there's already a CSV in here with the COVID cases, so I can start from that. I have to change this to a code cell first. I'll do import pandas as pd, I can also do import numpy as np, and then I'll create covid with pd.read_csv, reading the COVID cases .csv file. I can look at this file, by the way. I want to set the index to the date, so that later on we can easily merge this with the other data frame, which already has the date as the index as well. So we say index equals date. And I'm only interested in the cases column, not in the deaths column, so with usecols we only import the date column and the cases column. Oh, something's wrong: it should not be index, it should be index_col. All right, like this. So what we did here: we read the CSV with pandas, we said to use the date and the cases columns, so to ignore the deaths column, and then to use the date column as the index of the pandas data frame.

Let me check the index of covid. What we see is that the type of the index is still object, but it's actually a date. So we have to fix this to make sure that the index is actually a datetime. We do pd.to_datetime on covid.index, like this, and if we now check covid.index and its type, we see that this time it's a datetime. Great. If we look at covid, we see that for every date there is now a case count.

If we try to plot this data, what do we see? These COVID case counts are actually cumulative counts. What the New York Times did in their data set is just a counter that went up and up and up, so the number here is basically all the cases before that date in October. The thing is, we are interested in daily case counts.
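The two data-preparation steps so far, min-max normalization per ticker and loading the cases file with a date index, can be sketched as follows. The prices and CSV rows are made-up stand-ins (the real ones come from yfinance and the New York Times file), and the in-memory CSV is an assumption used so the sketch runs without any files.

```python
import io
import pandas as pd

# Hypothetical closing prices; the real ones would come from yfinance.
closes = pd.DataFrame(
    {"AAL": [28.0, 10.0, 14.5], "MSFT": [160.0, 135.0, 210.0]},
    index=pd.to_datetime(["2020-02-03", "2020-03-23", "2020-08-03"]),
)
tickers = ["AAL", "MSFT"]

# Min-max normalization per ticker: (x - min) / (max - min) maps each
# column onto the range [0, 1], so trends become comparable.
for tick in tickers:
    col = closes[tick]
    closes[tick] = (col - col.min()) / (col.max() - col.min())

# Tiny in-memory CSV standing in for the cases file (toy rows).
csv_text = (
    "date,cases,deaths\n"
    "2020-01-21,1,0\n"
    "2020-01-22,1,0\n"
    "2020-01-23,1,0\n"
    "2020-01-24,2,0\n"
)

# usecols drops the deaths column; index_col makes the date the index.
covid = pd.read_csv(io.StringIO(csv_text), index_col="date", usecols=["date", "cases"])

# Right after reading, the index holds plain strings; convert it to datetimes.
covid.index = pd.to_datetime(covid.index)
print(closes)
print(covid)
```

With both frames indexed by date, they can later be drawn against the same x-axis without any extra merging work.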
In a way, we have to un-accumulate the cumulative sum that happened here, and this is a bit tricky. I'm going to copy over some code that I already created and walk you through it. OK, so I just copied it over here. The first thing I do is take the cases column and convert it into a NumPy array, so I can do easy mathematical operations with it. Then, to go back from the cumulative sum, I create a shifted cumulative sum: I shift the entire array by one element to the right, and then I do the cumsum minus the shifted cumsum. That way we only keep the deltas. If we look at this uncumsum, we see that it now contains the cases for every day. So you see here one, one, one, which gives one, zero, zero, and then from the 23rd to the 24th it becomes two cases, which means that day there was one extra case. The day after there were three cases, which means that day there was again one extra case. So what we did here with this nifty stuff is un-accumulating the sum, the de-cumulation of the sum, and then we add that to the data frame again: I overwrite the cases column with the uncumsum so that we now have the actual daily case counts. If I look at it again, we see that it's really the cases for every day.

And what do we see? Well, I don't get why there are minuses here. OK, yeah, that happens if you run this multiple times, because I'm overwriting cases here. It would be better if I called this cases_uncumsum or something; it's not actually good practice to overwrite a column of the data frame with another column. But what do we see here, and that's something you might remember from all of these corona stats, is that it's a very see-saw-like thing, because there are far fewer measurements happening on weekends and lots of measurements happening during weekdays. To smooth this out, we'll use a very nifty function that works very well out of the box because we have set the index to the dates correctly: rolling averages with a window of seven. We basically say: every day, slide the window along by one step and calculate the mean over the seven observations in it. So if we do that, and if we plot the rolling cases, what do we see? Oh yeah, again I have to re-run this from the top, something I should have done better in my code. So in this case it's way less see-saw-like, less all over the place. This is the seven-day rolling average that they're also constantly talking about in the news. I can look at the result now.

All right, so now I have the cases as rolling means, and I have these stocks, so the last step for me is to combine the two. The first thing I'll do is create a new axis object in matplotlib and plot all of the tickers. The y is tickers, so these are all of the different series that I want to show on my plot, and because I set the index to be the date, the plot function of pandas immediately understands that you want to plot this against the date on the x-axis. Then I'll use the twinx function, which allows you to show one type of data, here the stocks data, on the left axis, and to show the case counts, which are on a totally different scale, not between zero and one but rather between zero and I don't know how much, on the right. I lost the image, but you'll see it in a second. You basically create a shared x-axis, a shared date axis, but then a y-axis on the left-hand side and a y-axis on the right-hand side. So I'll do twinx, and again I'll store this in an axis object. Nothing is showing yet, but we already see these two y-axes. Then we use covid.plot, and we say cases_rolling on the y-axis, the color is black, and we pass the twin axis object so that we plot this on the same image. Now we'll call plt.show() so that we don't see the nasty print, and what do we see?

Indeed, on the left-hand side we see the normalized stock, and on the right-hand side we see the daily case counts. At the worst moment it was around 250,000 cases every day, and that was at the beginning of this year, which is pretty scary. If we then start interpreting this, what do we see? There's a huge drop, but that huge drop in stock value already happened before there were a lot of cases in the US. So it was really anticipation by investors that gave them a scare. What we also see, interestingly, is what happens when the pandemic is at its height and there are crazy amounts of cases. Of course, these first bumps are underrepresented because there wasn't enough testing equipment, et cetera, so it's possible that these bumps were actually way higher. But what you see is that even though the pandemic is in full effect in some places, like in the US, a lot of stocks, for example the orange one, which is the Netflix stock, had already rebounded like crazy. You also see that even the Booking stock, for a travel company, is higher than it was before the pandemic, even when corona is in full effect. And for Microsoft, they started low, they had a slight dip, but then it's all the way up from there. So they really proved themselves to be a pandemic-proof company with all the internet services that they provide.

So I can add a small analysis here. This is not code, so it's not super important. You can say: interestingly, the sharp decline in value already happened way before the number of cases in the United States started to skyrocket. With the pandemic sweeping across the country, some stocks like Microsoft have already rebounded to levels seen before COVID. So you can do a small analysis here. We can maybe clean this up a little bit, or is this fine like this? I actually think this looks pretty good. Maybe: let's plot all closing prices, or rather normalized closing prices over time. Let me just run this top to bottom again. I can do Run All Cells, or Run Selected Cell and All Below, so that's running it again from top to bottom, which is also something that the publication will do for me, but I just want to be sure that this all works out of the box and the publication won't cause any problems. We see indeed that this all looks good.

So again I can click on Share and select which notebook I want to publish. In this case I suggest you select notebook.ipynb, because that contains the work you did and not the work that I prepared for you before. Then I can click on Publish again and wait for a second, because it's running from top to bottom, and in a couple of seconds this workspace should be published. Again, public access will be automatically enabled for the publication, and it will now automatically be featured on my profile page. So I can check this out. What do I see? A good-looking publication about stocks and COVID. I can also see on my profile page that there's now yet another data science project of mine, namely stocks and COVID, featured with a nice picture. As such I'm basically building up a data science portfolio. This is something I could send to a potential future employer and say: okay, I'm actually pretty solid in data science.
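For reference, the smoothing and overlay steps from this example can be sketched as below. All numbers are hypothetical: the daily cases are an artificial weekday/weekend pattern and the stock series is a straight line, both chosen only to make the mechanics visible. The Agg backend is an assumption so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates = pd.date_range("2020-03-01", periods=28, freq="D")

# Hypothetical daily cases with a weekly see-saw: weekends report fewer cases.
daily = np.where(dates.dayofweek >= 5, 40, 100)
covid = pd.DataFrame({"cases": daily}, index=dates)

# 7-day rolling mean; the first 6 rows are NaN because the window is not full yet.
covid["cases_rolling"] = covid["cases"].rolling(7).mean()

# Hypothetical normalized stock series on the same dates.
closes = pd.DataFrame({"AAL": np.linspace(1.0, 0.2, len(dates))}, index=dates)

ax1 = closes.plot(y=["AAL"])   # left y-axis: normalized stock
ax2 = ax1.twinx()              # right y-axis, sharing the same date axis
covid.plot(y="cases_rolling", color="black", ax=ax2)
plt.show()                     # also suppresses the printed Axes object in a notebook
```

Because every full 7-day window contains five weekdays and two weekend days, the rolling mean is flat here, which is exactly the see-saw being averaged away.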
Check out all of these projects of mine. This is something you can perfectly do with DataCamp Workspace. All right, so this was the second example I wanted to walk you through, on analyzing stocks and COVID. I'm going to take a quick break to have a drink of water, and then I'll answer some questions before we move on to the third part of this code-along.

All right, so there's a first question by Cheng. They ask: could you repeat why %% and the exclamation mark were used? I'll start with the exclamation mark: that's basically running bash commands, so terminal commands, inside your Jupyter notebook. To show this with an example: in the terminal I can run, for example, ls -la, and that lists all files in my current working directory. The thing is, you can also do this from inside JupyterLab: I can do exclamation mark ls -la and I get the exact same output. Normally, if you're on a normal computer and you want to install some Python packages, you typically go to your terminal and say pip install yfinance. In this case it would say: it's all there, it was already installed, no worries. So rather than having to go through your terminal, you can do this from inside your Jupyter notebook, so that your entire data analysis is self-contained and you don't have to include instructions like: hey, go to your terminal and do this and this and this to install all of these packages.

The other thing that Cheng asked about were these %% signs. Let me just remove this and see what happens. I do Shift+Enter, and now all of the output that I also saw in the terminal is being included in here. But it's highly uninteresting; it's something I don't want to include in my report. So what I use is what they call a magic command: %%capture basically says, I'm going to swallow up any output that the commands in the cell generate. It's all still happening behind the scenes, you just don't see the output, and that way I can keep my report clean. I'm also doing this here, because if you don't, you'll see that there's a loader in here, and it doesn't serve any value to me. So I just say: capture, don't show this output, but of course the data object is still there. All right, I hope that answers Cheng's question.

Then there's a question by somebody else; I'm not sure what the name is, it seems to be initials only, PK. For such timeline plots, does the date always need to be the index? I'm not sure. pandas is very good at dealing with time series data. I do know that if you set the index of a pandas data frame to be the date against which you're plotting your other data, it just makes things super simple for you, and pandas basically just knows what you mean if you set the index properly. I'm sure there are also ways to do it without having the date be the index, but then everything will definitely be harder to do.

PK also asks: can the uncumsum series have negative values? Normally the uncumsum should not have any negative values, because that goes against the concept of the cumulative sum. By definition it's a sum, and in this case it's a COVID case count, so you can't have negative cases; you can't un-mark somebody. So this uncumsum array will contain no negative values if all is well. We can also check that, so let's have a look. I'll take this uncumsum, that's this NumPy array here, and I can do np.any: is any of the uncumsum values smaller than zero? Yes, it is. Let me see. There's one in here, that's indeed true. That's funky; this must be a mistake. This is interesting, I'm not sure what happened here. Maybe it was a correction of sorts, where the authorities counted some cases wrongly or something. This shouldn't happen, so this is a classic case of a data quality problem. It's a great question by PK. I don't know what's going on here: by definition the cumulative count should only go up, so the uncumsum should never be negative. But yeah, I don't know what happened here; this is something I didn't realize had happened.

Then there is a question by Yi, to talk for a second about NumPy operations that need to access the data in the previous record, like np.insert and np.delete. Yeah, I can show an example, but I won't spend too much time on this because it's a bit technical and I'm not sure it really serves the code-along here. I'll create a NumPy array that is 1, 1, 3, 5, 5, 8, 12, like this, and let's say that this is the cumulative sum. The next thing we do is create a shifted cumulative sum, like this. I'll print cumsum and then cumsum_shifted, and what do we see? We removed the last value and inserted a zero, so we shifted the entire thing by one to the right-hand side: all of this moved over, a zero was added on the left-hand side, and the twelve was dropped on the other side. If I now do a pairwise calculation, cumsum minus cumsum_shifted, what am I doing? I'm doing one minus zero, one minus one, three minus one, and as such my daily case counts appear again, right?
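The shift-and-subtract trick with the example array from this answer can be written out as a short sketch. The variable names follow the ones spoken in the talk; the second array, with a downward correction, is a made-up illustration of how a negative delta can arise.

```python
import numpy as np

cumsum = np.array([1, 1, 3, 5, 5, 8, 12])

# Shift right by one: drop the last element, then insert a 0 at the front.
cumsum_shifted = np.insert(np.delete(cumsum, -1), 0, 0)

# Element-wise difference recovers the per-day increments.
uncumsum = cumsum - cumsum_shifted

print(cumsum_shifted)  # [0 1 1 3 5 5 8]
print(uncumsum)        # [1 0 2 2 0 3 4]

# Hypothetical cumulative series with a downward correction on day 3:
# the correction shows up as a negative delta.
corrected = np.array([1, 3, 2, 4])
deltas = corrected - np.insert(np.delete(corrected, -1), 0, 0)
print(np.any(deltas < 0))  # True
```

As an alternative to the manual shift, np.diff(cumsum, prepend=0) computes the same increments in one call.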
Then I see indeed: one, then it stays at one, which means on day one there was one case but here there was a zero, no cases added. Then it goes from one to three, which means that here two cases were added. Then it goes to five, another two cases, and it stays at five, again no cases added. Then it goes to eight, three cases were added, and then it goes to twelve, four cases were added. So by shifting it and then subtracting, you're basically getting rid of the sums in here. I hope that answers your question, Yi.

All right, a quick recap of what we did so far. We created a workspace from a dataset template. We also created a workspace from a GitHub repository through this functionality. In both cases, from that workspace, we created a publication and featured it on our DataCamp profile page, which is something I hope all of you also did. The last thing I want to show is a feature that we added recently, which is called integrations. Integrations allow you to securely store secrets in your DataCamp workspace, so that you can access databases, for example a PostgreSQL database that is living somewhere else on the internet, not in your DataCamp workspace.

To show that off, I prepared yet another GitHub repository that we will start from, and it's called dvd-rentals. So here we have a public repository that all of you should be able to access: filipsch is my GitHub handle, and the repository name is dvd-rentals. Again there's a notebook.ipynb in there, and a solution.ipynb as well that you can check to understand what's going on if you missed a couple of steps. There's also a README, and you can ignore everything else. As usual, let's get started by going to my workspace dashboard, clicking GitHub repository, and then using filipsch/dvd-rentals. All right, it's saying that this is a valid GitHub repo, and again I'm going to open this up in Python. I think we're hitting the limits of what is possible with GitHub, unfortunately.
so this is a feature we added recently and we hit the github rate limits give me a second to check i'm going to try it once more yeah so the thing is github doesn't like because everybody's like eagerly following along we're basically hitting the limits of github because it's constantly like our data cam services asking github like what are the contents of this github repository and because of that um that github is saying like there's too many requests coming from data camp and they're basically disallowing it so i will do instead this is not too bad but we'll basically have to do it somewhat less nicely we can do here is download the zip it's just like that then i can create a new python workspace like this just at the top over there like so i can just drag the zip file in here i hope i can then expand this i'm not sure if i can easily unzip this let me open up the terminal unzip dvd rentals dots zip dvd rentals master.zip all right perfect so this is what you can do right i'm also going to share this in the q a dock so that sena can share this with the others so basically the steps are basically download the zip from github open up a new python workspace and then upload the zip file into the github into the workspace and then unzip dvd rentals using the terminal commands so uh now it's inside this folder here which is fine but i'm just going to move this to here so i'm just going to get rid of this notebook here because that's like the default notebook and then i'm just going to move this all here so now i have basically the same result but as you can as you saw this was taking away uh way longer good so now i can open up notebook i find right i'm gonna wait for a second so that senna can share these uh instructions and you can get things set up so again repeating you can go to to get the repository click on codes click on download zip and then you can head over to your data camp workspace like to workspaces here you can click on new python workspace that will 
open up a new python workspace and then you can so then you land in here i'll also call this dvd rental so that i can use it for later on dvd rentals and then you can upload the zip file and then you can use the unzip commands inside the terminal dvd rentals.zip right i'll wait a couple of seconds longer so that people can follow along here good i hope everybody managed to catch up so um the result should be that you can open up a notebook you know you can try to clean it up it's not important the most important thing is that you have a notebook that i buy in b with all of this um like basically with all of these uh all the sample code that i prepared from the github like in the github repository so um what it's saying it's about analyzing movie rentals right so in this short notebook we'll connect to a postgresql database with some sample data around the dvd rental business after which will visualize the data in python so in this notebook we will access data in a postgresql database that is on google cloud platform which is a bit like the aws of google and we will do this all from inside our workspace i also included a structure of the database here so we see here that there's rental inventory film film category category so there's all all kinds of databases inside this uh all kinds of tables inside this database as a first step let's create a sql alchemy engine the database strings construct environment variables that were workspace integration i'll talk more about that later so typically how you do this you say from sql alchemy which is a common python package to like connect to postgresql databases i'm going to import create engine i'm going to use pandas i'm also going to use import os i'm going to create a database string and there i'm going to say is filter sql database certain user a certain password certain host certain database name and i would have to write all of these things in clear text here i would say my user is such and such my password is such 
and such my host is this ip yada yada this is not the correct ip by the way but this is actually very insecure because if you then share this report with a friend or with a colleague you're basically exposing your username and your password to this like to your friend or to your colleague you're sharing it with so this is definitely bad practice what we did to fix that is basically the concept of a data camp integration and that's something you can access from the left hand side here i'm quickly going to check the qa doc to see if there's any questions left no okay i hope a lot of people manage to follow along um so instead of hard-coding all of these things inside my database string i will use the concept of an integration and i can basically create a new set of environment variables these environment variables once i've created them i can basically load them into my session and then use these environment variables to create my database stream right so uh cena will share um the names and values that you have to use to follow along here so the database host in this case is an id like this then there's the database name so that's the name of the database we want to connect to that's dvd rental then there's database username that is learner in this case and then there's database passwords and that's datacamp so that's basically those are the credentials like the details for you to access this postgresql database connection i'm going to give this a name i'll call it dvd rentals 2 because i already experimented with it before so i'm now going to create this great it's immediately giving me some code that i can use so i'll copy this to the clipboard then i'm going to go next and it's asking do you want to connect this integration to your workspace this will like we need to restart your workspace session for this yes i want to do this so i'll say connect right now my workspace is restarting and it's basically making sure that these environment variables are now available 
inside my Python session. I can check this by opening up the terminal, typing env, and seeing whether there are any database credentials in there. And indeed, my environment contains the DB password, DB username, DB host, and DB name, so the things that I just specified in the integration step.

So rather than hard-coding all those things in here, I'm going to use the code that I copied to the clipboard. Instead of the hard-coded host, I'll put the DB host variable in here. I can do it like this because I can use format strings: I started my string with an f. Then I replace the name of the database with this, the username with this, and the password with this, and again I have to wrap each one in curly brackets. If you can't nail down the syntax, don't worry: you can just open up the solution .ipynb and copy it from there if you're not entirely comfortable with all of this.

Let's first have a look at this database string. Oh, something's wrong, clearly. Ah, I still have to wrap this inside curly brackets so that my string formatting works out. So this is the connection string that SQLAlchemy expects to create a database engine. This is perfect; I'm going to uncomment this. Great.

Let me quickly try out whether I can execute a simple SQL query. We'll just count the number of films in the database of this DVD rental business. And what do we see? It counted them: there are exactly 1,000 movies inside the DVD rentals database. It's a dummy database, which is maybe why it's such a round number. So this works perfectly fine.

Just for your understanding, what I did right now: I was inside DataCamp Workspace, and from inside DataCamp Workspace I connected to a PostgreSQL database that is living
inside a Google Cloud Platform data center, and I'm accessing that from inside DataCamp Workspace without having to leave my notebook, my IPython notebook experience.

All right, so just counting the number of films is fairly interesting, but let's say we want to see how often each film category was rented out. This requires the rental table, because then we know when rentals happened, and then we have to link it up to category: through the inventory to the film, through the film category, into the category. We have to do a bunch of joins so that we can combine every rental with a certain film category.

All right, let's do that. I'm just going to do a select star here for now. I'm using a triple-quoted string so that I can chop up my Python string into multiple lines, which is way more readable if you're writing a lot of code. So I'll do select star from rental in the first place, and I'll inner join that with inventory using inventory id. This is SQL code I'm writing, right, inside a Python string. So now we're linked up to inventory. We'll also inner join with film using film id; that's the foreign key here. I don't use commas here. Then I inner join from film to film category. For a moment I thought I had the keys wrong here, apologies, but no: rental to inventory goes through inventory id, inventory to film through film id, and for film to film category I use film id again. Then inner join category using category id.

All right, and then what am I interested in? Inside the category table I'm interested in the category name, so the category name as category name. Maybe the film title can be useful later on too; I'll put that in front, because it's the first thing you want, I guess. So
it's film.title as film title, category.name as category name, and then let's also get the rental date, so we have some information. Great. So I did a bunch of joins, and now I see that, for example, the film titled Freaky Pocus was rented out in 2005 (it's a pretty old data set) and the category name was Music.

The nice thing is that, because I have this data as a pandas DataFrame in Python, I can very easily do something with it now: for example, calculate an aggregate or make a plot. So let's do that. I take df, I group by the category name, and I count with size. What we see is that there are 1,112 Action rentals, 1,466 Animation rentals, 945 Children rentals, and so on for the different categories. To clean this up, I'm going to reset the index, because size created a Series, but I want to turn the Series into a DataFrame again. If I do that, we see that the category name is no longer the index, and this is the index now.

Let's already try to create a plot of this. Rather than using Matplotlib, I suggest we use Plotly for once. That's also a package that is available by default inside DataCamp Workspace; you don't have to install it yourself. Let's call this data frame aggregates and create a bar plot of it: x is the category name, y is the count. All right, what does this look like? Wait for a second. Exciting. All right, great, we see a bar plot here.

But I don't really like the way this bar plot looks, because it's hard to compare, for example, Sports and Animation. Who is the winner here? They're way too far apart. What I can do is sort the values of my aggregates, by count in this case, and this is way nicer. So I do see
that Animation just does not win out over Sports, that Music is the least popular category that's been rented out, and that Foreign is somewhere in the middle.

So this was just to showcase how you can work in a workspace written completely in Python, I should stress that, but access PostgreSQL data from inside this workspace. Again, if I publish this, I'm not exposing any secret information, because that secret information has been injected, in a way, into my Python session; it's not hard-coded into my Python code. So this is way, way better than hard-coding usernames and passwords right into your notebook.

Again, as usual, the final step is to share this. I want to publish this to the world because I'm super proud of the work I did. I have to wait a second for it to run. It takes a while, but that's because it runs from top to bottom, so it really verifies that every code cell works, one after the other, in the order they appear inside the notebook. If I then open this up, I again see a good-looking publication about analyzing movie rentals, with the Python code. Again, no secret information is being leaked here. I'm connecting, I'm checking the data frame, I'm doing an aggregate, I'm doing a plot, and I have a good-looking plot to end with.

If I check my profile page, what do I see? I see the DVD rentals data. Yeah, it's showing the first picture of the publication by default, and I don't really like that. So let me get rid of the structure; this is also not very interesting for the reader. I hope this should go up, and then this I can move in here, and then I'll share and update again. Let's see what this gives. It's still running... yes, now it's finished. Good. And then if I check my profile, the DVD rentals
data... yeah, that's because Plotly is an HTML library, so the thumbnail cannot easily be shown here. That's actually something we should improve on, so that Plotly thumbnails can also show up on your DataCamp profile, because this looks a bit boring, to be honest.

All right, so that was it. Now we have five minutes left for some final Q&A, if there are people still with questions. And there are some questions.

PK says he or she is getting an error when trying to unzip. That's because I made a mistake: it's DVD Rentals Master.zip and not DVD Rental.zip.

Then Chain asked: how do I know the specific value of the IP address of this database? Do I need to know it in my specific coding? Yes, indeed, this is something that you have to know yourself. Say you work at a company that has PostgreSQL databases you can analyze; then you should go to the system admin and ask them what the IP address is, what the URL is where this database is hosted, so that you can access it. For this example, DVD rentals was at the IP that is shown here, but other databases will be in different places. This is the IP address that I read off from my Google Cloud Platform console: Google Cloud told me this is where your database is running and how you can connect to it, so this is the IP address you should use.

Then UdP is asking a question: the integration feature is an excellent function. Is it secure enough to store credentials from another server, and is there any encryption involved? That's a great question, actually. Yes, it's very secure. We use an external service called HashiCorp Vault, which is known as one of the best in terms of security and encryption of secrets. So rather than storing the secrets that you put in here ourselves, we depend on a very professional company whose main business is
storing secrets as safely and securely as possible. Every secret you store here is encrypted in transit and on disk, and only decrypted when you need it. So it's entirely secure.

Then there's a question from PK: how much data can the notebook handle? I'm not sure exactly what you mean, so there are different ways to answer this question. In terms of storage, you can put three gigabytes of data inside a workspace. You can go up to five gigabytes, but if you go over five gigabytes of data stored inside one DataCamp workspace, then you lose access to your workspace. Of course, we can still get you the workspace data afterwards, but for now we've set a fair-use limit of three gigabytes of data.

How much data the notebook can handle can also be a matter of processing power, of working memory. You get four gigabytes of RAM inside DataCamp Workspace. Depending on how complicated your algorithms are, this becomes a problem sooner or later, but for basic analyses like the ones we did, for example accessing PostgreSQL data, the RAM shouldn't be the problem. If you're going to do crazy ensemble methods or hyperparameter tuning for different random forests, those are the moments when you could potentially hit the limit. If you do hit the limit, definitely let us know through the feedback option over here: hey, I'm hitting the limit for my analyses. That's good information for us to, at some point, increase the resources we give every person on DataCamp Workspace. I hope that answers PK's question.

Are there any other questions? All right, cool. Senna lets me know that there are no additional questions, so I think we can leave it at this, right on time: only 90 seconds left before it's 6:30 here in Belgium. I want to thank you all for attending. Slight hiccup
at the end with the DVD rentals data; probably in an hour you'll be able to start the workspace from the GitHub repository again. But thanks for staying with us, thanks for tuning in, and I hope to see you soon in another code-along session, where I'll hopefully be able to showcase more nice DataCamp Workspace features that help you do data science in the blink of an eye, without having to configure or install anything on your own system. Thanks, goodbye.

Thank you, Filip. Just some closing messages here. We'll be organizing more of these presentations in the future, so stay tuned, and please let us know if you have any additional feedback regarding these sessions. We will post the survey link in the chat, and also on Slack along with the recording. Your feedback will really make a difference in helping improve the future events that we hold and improve Workspace as a whole. And finally, if you haven't already, please feel free to join the DataCamp global Slack community, where we will keep you updated about new Workspace features and events. So this rounds up the fourth Workspace live code-along session. I remind you that the session was recorded, and the link to the recording will be posted later on Slack, email, and social media. With that being said, have a great rest of your day, and thank you for joining.

Welcome and Introduction

Hello to everyone. First of all, thank you for attending the fourth Workspace live code-along. A huge thank you as well to the speaker, Filip Schouwenaars, who together with DataCamp, and especially all of you, made this event possible. Once again, I'll remind you that the session is recorded and the link to the video will be posted through Slack, email, and social media platforms.

Moving to the next slide, a little bit about DataCamp itself. Our mission statement is to democratize data science education and make data literacy accessible to millions of people and businesses around the world, through our learning platforms and now also
through our DataCamp Workspace, certification, and events such as this live code-along.

Moving to the next slide, talking about our speaker today. You may have seen Filip before in some interactive courses here at DataCamp, but now he is the product manager of DataCamp Workspace. Today he will be more than happy to explain how you can start your own data analysis with just a few clicks using Workspace.

Today's live code-along is divided into several parts. First, Filip will show you how you can start your workspace from a template and add it to your portfolio. Next, Filip will start a workspace from a GitHub repository and explain how you can add this to your portfolio as well. And finally, he will connect the workspace to a relational database and build a data report. Every part will take about 30 minutes and will have its own Q&A section. If you have any questions, feel free to ask them via the Questions button on the control panel of GoToWebinar. Filip will then try his best to answer them during the corresponding Q&A session. Finally, we want to remind everyone that this session is being recorded, and the link will be shared with you later on Slack, email, and social media platforms. Thank you for your attention, and now the live code-along itself will begin. Filip, over to you.

Hello, can you hear me? Just checking. Yes, loud and clear. Okay, perfect.

All right, hi all. It's really cool to see that so many of you are attending this live code-along, the fourth one on DataCamp Workspace. It's going to be a little bit different from the code-alongs we did before: as I'm doing different data analyses that you'll be able to follow and code along with yourself, I will also be showing some of the latest features we've been adding to DataCamp Workspace.

In case you've never tried your hand at DataCamp Workspace yet, basically what DataCamp Workspace is: it's as easy as taking a couple of courses on DataCamp. We
made learning very easy: in a couple of seconds, a couple of clicks, you can get started learning data science. Workspace is the same, but for your own data analysis projects. You can get started with your own data science project very easily, with a couple of clicks, no installation, no configuration required.

If you haven't worked with Workspace yet, this is what you'll see if you click on the Workspace tab once you're logged in. It's important to note that people who are in a B2B group or B2B account don't necessarily have access to DataCamp Workspace. For these people, I suggest that they quickly create a personal account on their personal email so that they can still get access.

Once you click on Get Started for Free, you land in your workspace dashboard. So this is my workspace dashboard. Of course there are tons and tons of workspaces on here, because I'm working day in, day out on making DataCamp Workspace the place where you can come to do your data science projects, but in your case there will probably be fewer. Every tile here is a DataCamp workspace.

In the first part of this code-along, I want to walk you through the steps necessary to do your own data science project with an interesting data set that we have prepared for you. Suppose you already took a couple of courses on DataCamp, you took a couple of DataCamp projects, and you're ready to take the next step and say: I want to do a free-form project where nobody is telling me what to do. I just want an interesting data set to analyze, and I want to share my work with the world. For that, we have built DataCamp Workspace templates, which you can see here on the left-hand side: data sets, recipes, and playbooks. Let me look at all templates first, and we see a bunch of templates, all
kinds of different templates. There are data set templates, to really get you started with a data science project starting from a data set. There are recipes, which give you tips on how to solve typical, common problems that you'd run into in a data science project. And then there are playbooks, which are longer-form templates that really solve a specific business problem.

Let me quickly check if there are already any questions. Okay. For our use case, we will start a workspace from a data set template. There's a bunch of stuff I can choose from here: I can analyze measles data, I can analyze loan data, but in my case let's analyze some NBA shooting data. So I click on this, and I can preview the template to see if it's one I'm interested in exploring. It's apparently a data set based on statistics found on the NBA website, about basketball. I see that the data frame contains a couple of columns, like shooter, x, y, range, defender, score. Let's say I find this interesting, and let's use this template.

Also, I'll be using the Python programming language throughout. You're free to use R, but then I can't guarantee that you'll be able to follow along as easily, because you'll constantly have to translate from Python to R. So I strongly recommend that you also use Python here.

I just clicked on Use Template, and this launches a DataCamp workspace for me without me having to configure anything. It launches the JupyterLab IDE, which is a very commonly used integrated development environment for data science, and all the files necessary for me to get started with my analysis are available here. I see the same things that I saw in the template preview before: this is the data set based on statistics that can be found on the NBA website. There's already some useful code for me that imports the
data from the CSV that's also contained inside this DataCamp workspace, and there's even a printout. So let me already run this. I use Shift+Enter to execute the code in these cells; you can also always use the play button here at the top.

All right. So I import the CSV, and I see the first five rows. There's the shooter, the name of the basketball player who took the shot. Then x and y are the position from where the shooter was taking the shot at the hoop: x is the horizontal distance of the shot from the basket in feet, and y is the vertical distance. Okay, interesting. What else do I see? The defender, so who was trying to fend off the player's shot, and then the score: whether the shot was made or missed. I also see that in some sample code that was already included: indeed, the score is either made or missed.

All right, cool, I think I have a pretty good sense of this data set. I'm going to cut a couple of cells to clean this up; I can also use DD for that. Then I also get suggestions for some interesting questions to answer. Indeed, I get to explore this exciting data set, I can't wait, I'm super excited. Can't think of where to start? Try a couple of questions. I see a pretty cool plot here: plot shots taken and missed via scatter plots, and you can look at this example of shots taken by Russell Westbrook for some inspiration. So I get a suggestion: maybe you should try to recreate this plot. Let me do that. I'll just get rid of this and say: let's plot shots taken and missed in a scatter plot.

All right, so now I can start coding. The first thing I should do, I guess, is get the image of the basketball court into Python. And if I look here on the side, I already see that
there's an NBA court .jpg file in there. So what can I do? I see that matplotlib.pyplot is already loaded in as plt, so I can use the imread function of Matplotlib.

Just to remind you, this is a code-along, so I highly recommend that you code along as I'm coding through this. Somewhat later we will also share a workspace publication with all of the code we've written so far, so that if you missed a step here or there, you can still fill it in and still create a publication featured on your DataCamp profile page.

So I'll do it like this: I'm reading in the image. I'll also quickly check what the width of this image is, do the same for the height, and print out width and height. Okay, so this is an image of 1365 by 1455 pixels. This will be interesting later on. I can also see what the image looks like if I show it, with plt.show, and I see indeed that the pixels have been converted to the axis units here. Okay, cool.

The next thing I'd like to do is draw all of these points, all of these different observations for a certain shooter, on this image. Let me start by creating a figure, a Matplotlib figure, and let's say we want to make it eight in width. The height we derive from the image's width and height; that way we're sure we're respecting the aspect ratio defined by the image in the first place. If I do plt.show like this... I'll first add some axes; that's drawing axes under the image. All right, so this is just our blank canvas. On these axes, which we will later draw the scatter plot on, we can now show the image, like this. All right, great. We can get rid of this now; we no
longer need this. All right, this is perfect. What you also see is that whenever you create a plot (I just used ax.imshow) it will print out the plotting object behind the scenes. If you want to avoid that, you can either put a semicolon at the end, and then the text disappears, or you can use plt.show, and then it also doesn't happen.

So what we have to do now is get the x coordinates and the y coordinates and draw them on here. But we immediately see that the x coordinates are values like -3.8 and 5.5, while in the image we're dealing with a range from 0 to roughly 1,300.

I'm getting a request to speak a bit slower, so I'll do that, and also a request to zoom in a bit, so I'll do that too. I hope this is better for people to follow along.

So, as I was saying, the x-axis and the y-axis of the data are on a completely different scale than the ones in the image. I'll probably have to do some sort of transformation to make sure that the dots, where the player was when he threw the ball, end up in the right place. Let's just very naively try to plot the x and y coordinates without making any changes. We'll do scatter, and we select from the data frame the x coordinates and the y coordinates, and let's give the points a size of three. The color I'll wait with. And what do we see? As expected, it's all in here at the top; the dots are all here. So there's clearly some scaling that needs to be done.

What we can do to scale: if you know that the width of an NBA court is 50 feet and our image is roughly 1,450 pixels wide, what we can do is
create an x rescaled, like this: we divide x by 50, because that's the width of the court, and then we multiply it by width. We'll do the same thing with y rescaled: df y, and what we're seeing in the image is half the length of a basketball court, which is 47 feet, so we divide by 47. That's just information I know: the width of a basketball court is 50 feet, and half of its length is 47 feet.

Let's now plot x rescaled and y rescaled. This is already somewhat better, but there's still some work to do. We can clearly distinguish the circles of our plot over here, but it seems like we have to flip the y-axis around, and the x-axis is still skewed. That makes sense, because the x values here are things like -3, -0.1, -0.5: they're counted from the center of the court rather than from the top-left corner of the image. So what we can do is add 0.5 to this; that way we shift it into the right range. This is already much better: the x-axis seems to be correct now. But the y-axis we have to flip around, so what we can do here is one minus, divided by 47.
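As a minimal sketch of the rescaling being built up here, the snippet below maps court coordinates in feet to image pixels. It assumes a court 50 feet wide, a half court 47 feet long, x measured from the center line, and an image whose origin is the top-left corner. The function name `court_to_pixels` and the sample coordinates are hypothetical and not taken from the session's notebook; the y flip is applied before multiplying by the image height.

```python
import numpy as np

COURT_WIDTH_FT = 50.0        # full width of the court, as stated in the session
HALF_COURT_LENGTH_FT = 47.0  # half the court length, as stated in the session


def court_to_pixels(x_ft, y_ft, img_width, img_height):
    """Map court coordinates (feet) to image pixel coordinates.

    Hypothetical helper: x is assumed to be measured from the court's
    center line, y from the baseline, and the image origin is top-left.
    """
    x_ft = np.asarray(x_ft, dtype=float)
    y_ft = np.asarray(y_ft, dtype=float)
    # Shift x from [-25, 25] feet to [0, 1], then scale to the image width.
    x_px = (x_ft / COURT_WIDTH_FT + 0.5) * img_width
    # Flip y (image rows grow downwards), then scale to the image height.
    y_px = (1.0 - y_ft / HALF_COURT_LENGTH_FT) * img_height
    return x_px, y_px


# Illustrative values only: three made-up shot positions on a 1455 x 1365 image.
x_px, y_px = court_to_pixels([-25.0, 0.0, 25.0], [0.0, 23.5, 47.0], 1455, 1365)
print(x_px, y_px)
```

The resulting arrays could then be passed to a scatter call on the axes that show the court image. Any extra horizontal offset needed to line the points up with a particular image would be an adjustment on top of this sketch.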
This is already somewhat better, but not yet perfect. That's because I made a mistake here. I should do it like this: it's one minus df y divided by 47, and then all of that times the height. All right, so this is already much better, but we do see that it's slightly skewed to the left-hand side. That probably has something to do with our image. So it seems like we can add a bit more, maybe 0.55 or something. Now we see that the shots are cleanly... nah, maybe a bit less, something like this. Yeah, this is much better. So we clearly see that a lot of shots are taken from outside the three-point range. This is actually a pretty good depiction. But of course, if we compare this to the example image, we still have to filter on who the player is, and we still haven't given the points a different color depending on whether the shot was made or missed.

Before I do that, I also want to clean up a bit. Somebody already asked: can you talk a bit about add axes? What you do in Matplotlib is basically start from a canvas: I create a figure, then you add the axes. What you're seeing here is that the origin should be zero here, and here the origin should be zero as well, and the axes should have ticks of step one. But that's something we can talk about a bit more later on, and something you can also check in the documentation.

What I'll do first is set the x ticks like this, so that we no longer see these annoying numbers on the side, because they don't add a lot of value, and I'll do the same for the y ticks. So now the numbers are gone. By the way, every time you see me executing a cell, I'm using Shift+Enter as a shortcut; it's the same as clicking on the play button here.

Something else I can do: we see a double rectangle here, so I'll use plt.axis and just use
off, like this. All right, so this is now clean: no double lines or anything. This is something I can continue on.

The first thing we'll do is make sure there are some nice colors in there. I'll open up a new code cell. I want to give every point a color depending on whether the shot was made or missed. That means we have to go from the score column to a new column, let's call it color, which I can then immediately use inside my Matplotlib function. So I'll use df player score, and then the map function, and I'll say made should be converted to green, so that made shots will appear in green later on, and missed I should convert into red. All right, df player is not defined; it's df, of course. And we see indeed that made shots have been converted to green and missed shots to red. This I can now add here: I create a new column called color that I add to my data frame. What I can do now is say color equals df color, and then to list. If I do it like this, I see very nicely all the missed shots in red and all the made shots in green. Maybe the size can be somewhat larger; the s stands for size, so let me increase it. That's better already.

But again, this is the entire data set. If we look at the dimensions of this data set with shape, we see 776 shots from four different players. So what I'll do now, as the last step, is filter to make sure that I only select the shots by, what was the suggestion, Russell Westbrook. I'll copy that name already so I don't have to type it in a second. So I first have to do the filtering. If you remember from pandas, filtering works like this: you generate a series of booleans by using the
shooter column in this case and saying it has to be equal, equal-equal, to Russell Westbrook. I'll also copy that to another cell so we can experiment with it. So we get a series that is False or True depending on whether the shooter is Russell Westbrook. Now we can subset that same data frame by surrounding this with brackets and putting df in front, and now we only get shots by Russell Westbrook. Indeed, we see Russell Westbrook is the only shooter in here. That we can copy over here, and I'll give this the name of a new data frame, df player, and then I'll replace df like this, like this, like this.

To make sure that my code is still fully reproducible, let me start at the top. I import NumPy... I just use Shift+Enter to click through this entire thing. And what do we see now? We see only shots by Russell Westbrook, and if we compare it to the picture that was given here, it's exactly the same: we see the red dots and the green dots. Maybe we can make the dots slightly smaller by changing the size to eight, and then we see a good-looking plot.

Let's say I'm happy with this plot that I put together. I'm also going to clean this up a bit. I'll just say: let's plot shots taken by Russell Westbrook and color them depending on whether or not the shot was made. All right, cool. This code I'll get rid of.

So suppose I'm now super happy with the way this notebook looks. The next step in DataCamp Workspace is super easy: I want to share this with the world. So let's do that. Click on Share, and then you can click on Publish Workspace. You publish a notebook, so you have to select which IPython notebook you want to create a publication from. In this case there's only one IPython
notebook inside the workspace, so there's just notebook.ipynb to choose from. I can click on Publish here. It's going to load for a second; it's basically re-running the workspace from top to bottom to make sure that every code cell that depends on another one still works and that you didn't make any mistakes in the meantime. And now I have a publication I can super easily share with friends and colleagues.

All right, so this is a publication. It's clear that it's mine, since it has my name on it. You can even give it an upvote, and you can share it with friends and colleagues so they can upvote it. But what you also see here in the sharing status is that it has public access. That means that everybody who has the link to this publication will be able to access it. The publication is also automatically featured on my profile page. What does that mean? If I go to my DataCamp profile and open it up in a new tab, what do I see? I see my NBA shooting data appear here. So you're able to start from a data science project, like the NBA data set, do some work on it, generate a publication from it, and then feature that publication on your profile page, where it's shown next to all the courses, projects, and tracks you completed. In my case I didn't complete any tracks, but I'm sure some of you have already completed a couple of DataCamp tracks. So rather than having to wrangle GitHub to make sure that your notebooks are showing there and rendered nicely, we have tried to make that step very easy with DataCamp Workspace.

Questions About the NBA Example

All right, so I'm going to pause here for a second and have a look at the questions. There's a question by Heath; she asked me to pause for a minute. So yeah, I think Sena, who is conducting
this live code-along, will share a publication that I made in the past that shows roughly all the code that I've written so far, roughly because I kind of winged it a bit. He will share the solution code with you, so that you can still review this if you couldn't follow along entirely.

There was also the question to go through the second cell again. If that means the cell with the pandas read_csv call, what this is doing is the following: pandas is imported as pd, so the read_csv function from the pandas package is called, and we point it to the CSV that we want to load. That takes the data from the CSV file and loads it into Python as a data frame. Then with the head function we look at the first five records. I hope that answers the question.

Then there's also a question by Asha, who asks whether this workspace can now be posted on GitHub. Yes, this is possible. What you have inside JupyterLab is a terminal: you can just go through the launcher and open it up. On GitHub, on my personal profile, it's perfectly possible to create a new repository, then add the remote here, authenticate yourself, and push these changes to GitHub. So this is all possible from inside DataCamp Workspace, and it's also something I can show you later on.

All right, so maybe we have a couple of minutes left. Let me walk through what exactly is happening here; I think that's what Aisha is asking for. The first thing we do is create a figure with a certain size. The width in this case is four, and the height in this case is four, and basically we use the
width by height aspect ratio to make sure that it fits the image that we read in. Then we add axes with origin zero and size one in both directions. The first thing we plot on that canvas is the image itself, the image that we read in from this JPEG. Then we set the x ticks and the y ticks to empty arrays, so that we get rid of the tick labels on the left-hand side and the bottom side. Then we also disable the axis so that we don't show the axes.

Next we rescale x: we divide it by 50, because the width of a court is 50 feet, and we add roughly 0.5 to it so that the coordinates are shifted properly. Then we multiply by the width of the image, which in this case was around 1,200 pixels. We do the same thing for y, only there the transformation is slightly different: we have to flip it around, so we do one minus, then divide by 47, and then multiply by the height, so the number of pixels in the height.

Then we create a new column color that we map from the column score: we map made to green and missed to red, and we store that in a new column color in the data frame. Finally, we do a filtering on the data frame where we just filter on the shooter Russell Westbrook, and we create a new data frame from that. Then we can use the axes that we created here to add a scatter plot. We use x rescaled and y rescaled, because otherwise it will look very weird. We choose a scatter plot with dots of size 10, and we color it in using the df_player color column, converted to a list. And then if we plot it, we see this nice image.

All right, so last question, and then I think we should continue because there are still lots of things to cover. It was Daniel with a
question: "Could you elaborate on the subset function? I didn't understand how you made a filter for the specific player." Yeah, sure. So this is df, the data frame for all players. The first thing I do is figure out which records, or observations, in this data frame are for the shooter Russell Westbrook. I look at the shooter, which is a series with only the shooter column in there, and then I see which of these are equal to Russell Westbrook. I do it like this, and then I see False, False, False. I can also do this with another player, with Chris Paul for example, or with Seth Curry. In that case you'll see a different result: the first couple of observations will be True, because if you compare that to df, you see indeed the first one is Seth Curry.

If we now use this to subset df, that's basically saying: keep the observations in df for which this filter evaluates to True. If the corresponding element in the series is True, then keep it; if the corresponding element is False, then drop it. So in the case of Seth Curry, what will happen? Oh, there's an unmatched bracket; it should be like this. What do we see? We only have the shooter Seth Curry anymore. If we change this to Russell Westbrook, Russell with two l's, we only keep the shots by Russell Westbrook. So that's how that comes together.

The last thing you have to do is store that result in a new data frame, df_player, which is some sort of trimmed-down version, a selection of the data frame, and that's something you can then use in your plot. But it would be perfectly possible to change this to, say, Seth Curry, and then of course we get a different picture. That's the nice thing about
programming: you only have to change one thing, and you don't have to recalculate it all. We just get another shooter profile, so to say, a shooter passport if you will. And again, you can share this, publish this, and feature this on your profile page.

All right, so that was the first bit of DataCamp Workspace magic I wanted to show you. Before the next part, I'm just going to drink a glass of water.

Creating a Workspace from a GitHub Repository

All right. So here we created a workspace from the NBA shooting data, from a dataset template. But of course you're not limited to the vast library of templates in the DataCamp product. There's also tons of interesting stuff on GitHub for you to explore. So rather than having you go through GitHub, download all the files there, and import them into DataCamp Workspace, we try to make it super easy for you to start a new workspace from a GitHub repository. To that end, I created a GitHub repository called stocks-and-covid that already contains the first steps of an analysis of how some stocks have been impacted by the COVID pandemic. So rather than having to download this as a zip file, I can just copy the profile name and the name of the repository from GitHub. I'm going to close this so I keep things clean, and from my workspace dashboard I can click on GitHub Repository. Here I can paste it in. I'll leave this open for a while so that you can follow along. So it's f-i-l-i-p-s-c-h, slash, stocks-and-covid. Maybe, Sena, you can drop that in the chat so that people can copy it over easily and don't have to type it all.

Then I can choose the technology I want to open it in. It's similar to the technology you could choose when you were opening a workspace from a template. In this case I will again work in Python. That means that this GitHub repository will be opened up as a
workspace inside the JupyterLab IDE for me to start exploring. So it's filipsch, slash, stocks-and-covid, using these dashes here. All right, so click on Create now, and as before a new workspace is created. This time the workspace is populated using files from GitHub. Again the JupyterLab loading icon, and what do we see? We see notebook.ipynb. This is something I've already prepared: "In this notebook we'll see how the stock of some large corporations performed when the pandemic swept the world. To accomplish this, we'll use the Yahoo Finance data and some COVID cases data made available by The New York Times." There's already a typo here I should fix.

I already prepared some code that pip installs the yfinance package. That's the Yahoo Finance package that allows us to easily get finance data from the Yahoo API. As you notice, I also added percentage signs and capture here. That makes sure that any output the cell generates is not shown in the notebook. Otherwise I get a long list of installation logs, which is not interesting for my analysis, so that's why I use the percent-percent capture line here. Then there are also already some imports happening: yfinance as yf, pandas as pd, numpy as np. There's also some configuring of my images so that I have good-looking images. And then it's already downloading stock market data for four big companies, namely Microsoft, Netflix, Booking, and American Airlines.

I chose Microsoft and Netflix because you would expect these companies to be pandemic-proof in a way. With the entire movement to working from home, this must have been good for Microsoft. And with everybody being at home binging series and movies, it must have been good for Netflix's revenue. On the other hand we have Booking, which is a hospitality platform for arranging hotel bookings.
With all the travel restrictions, we would probably expect it to perform worse. And American Airlines, I think that goes without saying: it's one of the biggest airlines in the world, and with a lot of planes being grounded and empty flights, they must have experienced quite a rough time in the past couple of years. So what we do, and this is already in there, is the yfinance download, where we combine the ticker symbols. This is all code that is already there if you created the workspace from the GitHub repository.

Okay, first I have to run all of this, of course. While this is running, I quickly want to highlight that there's already a solution.ipynb notebook inside this workspace. So if you can no longer follow along with me, you can have a quick peek at the solution to catch up again. If you're wondering "hey, what was that step about?" or "I missed that part", you can just have a look at solution.ipynb.

All right, so I already executed these three steps here, and what I'll do as well is create a chart. Okay, so what do we see? This is obvious, of course: the Booking stock just trades at a very high price, the Netflix stock at about a fourth of that price, Microsoft even less, and American Airlines way, way less. But actually the absolute stock price is not something that's super interesting to us. We are interested in the trends. So what we should do is normalize all of these stocks so that the minimum of every stock is zero and the maximum of every stock is one. That's something we can do. So we have closes here. Maybe the first thing we can do is have a look at this closes data frame. I'll select this.
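As a minimal sketch of the min-max normalization just described (not the notebook's actual code): the prices below are made up, plain Python lists stand in for the columns of the closes data frame, and the helper name min_max_normalize is hypothetical. The same arithmetic can be applied to a pandas column.

```python
# Hypothetical closing prices per ticker; the values are made up for illustration.
closes = {
    "AAL": [28.0, 10.0, 12.0, 19.0],
    "MSFT": [160.0, 135.0, 200.0, 215.0],
}


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1.

    Assumes the series is not constant; otherwise the denominator would be zero.
    """
    lowest, highest = min(values), max(values)
    return [(value - lowest) / (highest - lowest) for value in values]


# Loop over the tickers instead of repeating the formula for each one.
normalized = {tick: min_max_normalize(prices) for tick, prices in closes.items()}

for tick, prices in normalized.items():
    print(tick, [round(price, 2) for price in prices])
```

After this rescaling every series runs from zero to one, so the trends can be compared on one chart even though the absolute prices differ a lot.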
I'll disable this, and now look at closes. What do we see? We see a pandas data frame with the date as the index, and then we have the four ticker symbols: AAL, BKNG, MSFT, and NFLX. Suppose we want to normalize the stock for AAL. We can select it like this. What we should do now is take closes AAL minus the minimum, which is a standard normalization, and then divide that by the maximum minus the minimum, so the maximum of closes AAL minus the minimum of closes AAL. All right, so if we now plot closes AAL, what do we see? The stock has been normalized between zero and one, and we already see that when the pandemic starts, there's a crazy drop for American Airlines, from one all the way to 0.2. That must be devastating: that's 80 percent of the stock value lost.

I can do the same thing for all of these, Microsoft, Netflix, and Booking, but that would not be DRY, as in "don't repeat yourself". So we'll just write a quick loop for that. I'll do for tick in tickers, where tickers are the ticker symbols that I defined here, and I'll swap out AAL for tick, like this, and this, and this. Now if I plot all of these closes, what do I see? It's way easier to compare the different stocks. We also see trends now: because the other stocks were very low in value, you didn't see their trends, but now that everything has been normalized, you see that all stocks, regardless of what sector they're in and how pandemic-proof they are, crash in March.

What would be interesting now is to overlay this data with COVID case count data and see whether there is some sort of correlation with the case counts. We'll take the cases count
in the U.S., as it's the biggest economy in the world, and see whether there is some sort of correlation there. I'll do plt.show to get rid of this AxesSubplot printout above the plot; I'm already thinking about the quality of my publication later on.

Loading the COVID Cases Data

All right, so there's already a CSV in here with the COVID cases, so I can start from that. I have to change this to a code cell, like this. I'll do import pandas as pd, and I can also do import numpy as np. Then I will create covid with pd.read_csv, reading the COVID cases CSV. I can look at this, by the way. I want to set the index to the date, so that later on we can easily merge this with the other data frame, which already has the date as the index as well. So we say index equals date. And I'm only interested in the cases column, not in the deaths column, so with usecols we only import the date column and the cases column. All right, like this. Oh, something's wrong: it should not be index, it should be index_col. All right, like this. So what we did here: we read the COVID cases CSV with pandas, we said to use the date and the cases columns, so to ignore the deaths column, and then to use the date column as the index of the pandas data frame.

Let me check the index. What we see is that the type of the index is still object, but it's actually a date. So we have to fix this to make sure that the index is actually a datetime. We do to_datetime on covid.index, like this. If we now check covid.index and its type, we see that this time it's a datetime. Great. If we look at covid, we see that for every date there is now a case count. Great. So if we try to
plot this data, what do we see? These COVID case counts are cumulative counts. What The New York Times did in their data set is just keep a counter that went up and up and up. So the number here is basically all the cases before that date in October. The thing is, we are interested in daily case counts. So in a way we have to un-accumulate the cumulative sum that happened here. This is a bit tricky, so I'm going to copy over some code that I already created and walk you through it.

Okay, so I just copied it over here. The first thing I do is take the cases column and convert it into a NumPy array, so I can do easy mathematical operations with it. Then, to go back from the cumulative sum, I create a shifted cumulative sum: I shift the entire array by one element, and then I do the cumulative sum minus the shifted cumulative sum. That way we only keep the deltas. If we look at this uncumsum, we see that it now contains the cases for every day. You see here one, one, one, which gives one, zero, zero, and then from the 23rd to the 24th it becomes two cases, which means that day there was one extra case. The day after there were three cases, so that day there was one extra case again. So what we did here with this nifty stuff is un-cumsumming, the decumulation of the sum, and we then add that to the data frame again. I'll overwrite the cases column with the uncumsum, so that we now have the actual daily case counts. If I look at it again, what we see now is really the cases every day.

So let's just have a look at it now. What do we see? Well, I don't get why there are minuses here. Okay, yeah: that's what happens if you run this multiple times, because I'm overwriting cases here. It would be better if I called this cases_uncumsum or something. It's not actually good practice to overwrite a column of the data frame with another column.

But what do we see here? Something you might remember from all of these corona stats is that it's a very see-saw-like thing, because there are few measurements happening on weekends and lots of measurements happening during weekdays. To smooth this out, we'll use a very nifty function that works very well out of the box because we have set the index to the dates correctly: rolling averages with a window of seven. We basically say: every day, slide the window up by one step and calculate the mean over the seven observations in it. So if we do that, and we plot the rolling COVID cases, what do we see? Oh yeah, again I have to re-run this from the top, something I should have done better in my code. In this case it's way less see-saw-like, less all over the place. These are the seven-day rolling averages that they're also constantly talking about in the news. Let me look at the result now. All right.

Combining Stocks and Cases

So now I have the cases, as means on a rolling basis, and I have these stocks. The last step for me is to combine the two. The first thing I'll do is create a new axes object in matplotlib and plot all of the tickers. The y is tickers, so these are all of the different series that I want to plot. And because I set the index to be the date, the plot function of pandas immediately understands that you want to plot this against the date on the x-axis. Then I'll use the twinx function, which allows you to show one type of data, here the stocks data, on the left-hand side, on
the left axis, and to show the cases count, which is on a totally different scale: it's not between zero and one, but rather between zero and, I don't know how much. I lost the image, but you'll see it in a second. You basically create a shared x-axis, a shared date axis, with a y-axis on the left-hand side and a y-axis on the right-hand side. So I'll do twinx, and again I'll store this in an axes object. Nothing is showing yet, but we already see that there are these two y-axes. Then we use covid.plot, and we say cases rolling on the y-axis, the color c is black, and then we pass the axes object as ax, so that we plot this on the same image. We'll add plt.show so that we don't see the nasty printout, and what do we see? On the left-hand side we see the normalized stock; on the right-hand side we see the daily case counts. At the worst moment it was about 250,000 cases every day, and that was at the beginning of this year, which is pretty scary.

If we then start interpreting this, what do we see? There's a huge drop, but that huge drop in stock value already happened before there were a lot of cases in the U.S. So it was really anticipation by investors that gave them a scare. What we also see, interestingly, is what happens when the pandemic is at its height and there are crazy amounts of cases. Of course, these first bumps are underrepresented because there wasn't enough testing equipment etc., so it's possible that these bumps were actually way higher. But what you see is that even though the pandemic is in full effect in the U.S., a lot of stocks, for example the orange one, which is the Netflix stock, already rebounded like crazy. Even the Booking stock, for a travel company, is higher than it was before
the pandemic, even when corona is in full effect. And what you also see is that Microsoft, for example, started low and had a slight dip, but then it's all the way up from there. So they really proved themselves to be a pandemic-proof company, with all the internet services that they provide.

I can add a small analysis here. This is not code, so it's not super important. You can say: "Interestingly, the sharp decline in value already happened way before the number of cases in the United States started to skyrocket, with the pandemic sweeping across the country. Some stocks, like Microsoft, already rebounded to levels seen before COVID." So you can do a small analysis here. We can maybe clean this up a little bit, or is this fine like this? I actually think this looks pretty good. I can maybe add "Let's plot all closing prices" and "Normalized closing prices over time".

Let me just run this top to bottom again. I can do Run All Cells, or Run Selected Cell and All Below. That runs everything again from top to bottom, which is also something the publication will do for me, but I want to be sure that this all works out of the box and that the publication won't cause any problems. We see that this all looks good. So again I can click on Share and select which notebook I want to publish. In this case I suggest you select notebook.ipynb, because that contains the work you did and not the work that I prepared for you before. Then I can click on Publish again and wait for a second, because it's running from top to bottom. In a couple of seconds this workspace should be published. Again, public access will be automatically enabled for the publication, and it will be automatically featured on my profile page. So I can check this out. What do I see? I see a good-looking publication about stocks and COVID. I can also see on my profile page that
now there's yet another data science project of mine, namely stocks and COVID, with a nice picture, featured on my profile page. As such I'm building up a data science portfolio. This is something I could send to a potential future employer and say: "Okay, I'm actually pretty solid in data science; check out all of these projects of mine." This is something you can perfectly do with DataCamp Workspace.

All right, so this was the second example I wanted to walk you through, on analyzing stocks and COVID. I'm going to take a quick break to have a drink of water, and then I'll answer some questions before we move on to the third part of this code-along.

Questions About the Stocks Example

All right, so there's a first question by Chang. They ask: "Could you repeat why percentage percentage and the exclamation mark were used?" I'll start by explaining what the exclamation mark does: it runs bash commands, so terminal commands, from inside your Jupyter notebook. To show this with an example: in the terminal I can run ls minus la to list all files in my current working directory. The thing is, you can also do this from inside JupyterLab. I can do exclamation mark ls minus la, and I get this exact same output. Normally, if you're on a normal computer and you want to install some Python packages, you go to your terminal and you say: I want to pip install yfinance. In this case it would say it was already installed, no worries. So rather than having to go through your terminal, you can just do this from inside your Jupyter notebook, so that your entire data analysis is self-contained and you don't have to include instructions like "go to your terminal and do this and this and this" to install
all of these packages.

The other thing that Chang asked about were these percentage percentage signs. Let me just remove this and see what happens. I do Shift+Enter, and now all of the output that I also saw in the terminal is being included here. But it's highly uninteresting; it's something I don't want to include in my report. So what I use is what they call a magic command, a magic keyword. The percent-percent capture is basically saying: I'm just going to swallow up any output that the commands in the cell generate. In this case it's all happening behind the scenes; it's of course happening, but you just don't see the output, and that way I can keep my report clean. I'm also doing this here, because if you don't, you'll see that there's a loader in here, and it doesn't add any value for me. So I say capture, don't show this output, but of course the data object is still there. All right, so I hope that answers Chang's question.

Then there's a question by somebody else; I'm not sure what the name is, it seems to be initials only, PK: "For such timeline plots, does the date always need to be the index?" I'm not sure. Pandas is very good at dealing with time series data. I do know that if you set the index of a pandas data frame to be the date, it just makes things super simple for you when you're working with other data, and pandas basically just knows what you mean if you set the index properly. I'm sure there are also ways to do it without having the date be the index, but then everything will definitely be harder to do.

Then PK also asks: "Can the uncumsum series have negative values?" Normally the uncumsum should not have any negative values, because that goes against the concept of
the cumulative sum. By definition it's a sum, and in this case it's a COVID case count, so you can't have negative cases; you can't un-mark somebody. So this uncumsum array will contain only non-negative values if all is well. But let's have a look and check that. This uncumsum is this NumPy array here, and I can do np.any: is any element of the uncumsum smaller than zero? Yes, it is. Let me see: there's one in here for which that's indeed true. That's funky; this must be a mistake. This is interesting. I'm not sure what happened here. Maybe it was a correction of sorts, where the authorities counted some cases wrongly or something. This shouldn't happen, so this is a classic case of a data quality problem. It's a great question by PK. I don't know what's going on here. By definition the count should only go up, so the uncumsum should never be negative. This is something I didn't realize had happened.

Then there is a question by Yi, to talk for a second about the NumPy operations that need to access the data in the previous record, like np.insert and np.delete. I can show an example, but I won't spend too much time on this because it's a bit technical, and I'm not sure it really serves the code-along. I will create a NumPy array that is 1, 1, 3, 5, 5, 8, 12, like this, and let's say that this is the cumulative sum. The next thing we do is create the shifted cumulative sum, like this. I'll print the cumulative sum and then the shifted cumulative sum, and what do we see? We removed the last value and we inserted a zero, so we basically shifted the entire thing by one to the right-hand side. All of this moved over, a zero was added on the left-hand side, and the
12 was dropped on the other side. If I now do a pairwise calculation, the cumulative sum minus the shifted cumulative sum, what am I doing? I'm doing one minus zero, one minus one, three minus one, and as such my daily case counts appear again. I see indeed: one means that on day one there was one case. Here there's a zero, so no cases were added. Then two: going from one to three means that two cases were added. Going to five means another two cases. It stays at five, so again no cases were added. Then it goes to eight, so three cases were added, and then it goes to twelve, so four cases were added. So by shifting it and then subtracting, you're basically getting rid of the sums in here. I hope that answers your question, Heath.

Connecting to a Database with Integrations

All right, so a quick recap of what we did so far. We created a workspace from a dataset template, and we also created a workspace from a GitHub repository through this functionality. In both cases we created a publication from that workspace and featured it on our DataCamp profile page, which is something I hope all of you also did. The last thing I want to show is a feature that we added recently, called integrations. Integrations allow you to securely store secrets in your DataCamp workspace, so that you can access databases, for example a PostgreSQL database that is living somewhere else on the internet, not in your DataCamp workspace. To show that off, I prepared yet another GitHub repository that we will start from. It's called dvd-rentals. So here we have a public repository that all of you should be able to access. Again, filipsch is my GitHub handle, and the repository name is dvd-rentals. Again there's a notebook.ipynb in there, and a solution.ipynb as well that you can check to understand what's going on if you missed a couple of steps. There's a README, and
other files you can ignore. As usual, let's get started by going to my workspace dashboard, clicking GitHub repository, and then using filipsch/dvd-rentals. It's saying that this is a valid GitHub repo, and again I'm going to open this up in Python.

I think we're hitting the limits of what is possible with GitHub, unfortunately. This is a feature we added recently, and we hit the GitHub rate limits. Give me a second to check; I'm going to try it once more. Yeah, so the thing is, because everybody is eagerly following along, our DataCamp services are constantly asking GitHub what the contents of this repository are. Because of that, GitHub is saying there are too many requests coming from DataCamp, and they're disallowing it.

What I will do instead is not too bad, but we'll have to do it somewhat less nicely. What we can do here is download the ZIP, just like that. Then I can create a new Python workspace, just at the top over there, and I can drag the ZIP file in here. I'm not sure if I can easily unzip this, so let me open up the terminal: unzip dvd-rentals.zip... dvd-rentals-master.zip. All right, perfect. So this is what you can do. I'm also going to share this in the Q&A doc so that Sena can share it with the others. The steps are: download the ZIP from GitHub, open up a new Python workspace, upload the ZIP file into the workspace, and then unzip it using the terminal command. Now it's inside this folder here, which is fine, but I'm going to move it. I'm going to get rid of this notebook here, because that's the default notebook, and then I'm going to move all of this here. So now I have basically the same result, but as you saw, this was taking way longer. Good, so now I can open up
notebook.ipynb, right. I'm going to wait for a second so that Sena can share these instructions and you can get things set up. So, repeating: you can go to the GitHub repository, click on Code, click on Download ZIP, and then head over to your DataCamp workspaces. Here you can click on New Python workspace, which opens up a new Python workspace, and then you land in here. I'll also call this one DVD rentals so that I can use it later on. Then you can upload the ZIP file and use the unzip command inside the terminal: unzip dvd-rentals.zip. I'll wait a couple of seconds longer so that people can follow along.

Good, I hope everybody managed to catch up. The result should be that you can open up a notebook. You can try to clean it up, but that's not important. The most important thing is that you have a notebook.ipynb with all the sample code that I prepared in the GitHub repository.

So what is it about? It's about analyzing movie rentals. In this short notebook we'll connect to a PostgreSQL database with some sample data around a DVD rental business, after which we'll visualize the data in Python. We will access data in a PostgreSQL database that is on Google Cloud Platform, which is a bit like the AWS of Google, and we will do this all from inside our workspace. I also included the structure of the database here: there's rental, inventory, film, film_category, category, so there are all kinds of tables inside this database.

As a first step, let's create a SQLAlchemy engine. The database string is constructed from environment variables that come from a workspace integration; I'll talk more about that later. Typically, how you do this is you say from sqlalchemy, which is a common Python package to connect to PostgreSQL databases, I'm going to import
create_engine. I'm going to use pandas, and I'm also going to import os. I'm going to create a database string, and there I'm going to say it's a PostgreSQL database with a certain user, a certain password, a certain host, and a certain database name. I would have to write all of these things in clear text here: my user is such and such, my password is such and such, my host is this IP, yada yada. This is not the correct IP, by the way. But this is actually very insecure, because if you then share this report with a friend or a colleague, you're exposing your username and your password to whoever you're sharing it with. So this is definitely bad practice.

What we did to fix that is the concept of a DataCamp integration, and that's something you can access from the left-hand side here. I'm quickly going to check the Q&A doc to see if there are any questions left. No? Okay, I hope a lot of people managed to follow along.

So instead of hard-coding all of these things inside my database string, I will use an integration. I can create a new set of environment variables; once I've created them, I can load them into my session and then use them to create my database string. Sena will share the names and values that you have to use to follow along here. The database host in this case is an IP like this. Then there's the database name, the name of the database we want to connect to, which is dvdrental. Then there's the database username, which is learner in this case, and the database password, which is datacamp. Those are the credentials, the details you need to access this PostgreSQL database. I'm going to give this connection a name; I'll call it DVD rentals 2 because I already experimented with it before. I'm now going to create this. Great, it's immediately giving me some code that I can use, so I'll copy
this to the clipboard. Then I'm going to go next, and it's asking: do you want to connect this integration to your workspace? We need to restart your workspace session for this. Yes, I want to do this, so I'll say Connect. Right now my workspace is restarting, and it's making sure that these environment variables are available inside my Python session. I can check this by opening up the terminal, typing env, and seeing whether there are any database variables in there. And indeed, there are database credentials in there: my environment contains DB_PASSWORD, DB_USERNAME, DB_HOST, and DB_NAME, the things that I just specified in the integration step.

So rather than hard-coding all these things in here, I'm going to use the code that I copied to the clipboard. Instead of the host here, I'm going to put DB_HOST in here. I can do it like this because I can use format strings, since I started my string with an f. Then the name of the database I replace with this, the username is this, and the password is this, and again I have to wrap each one inside curly brackets. If you can't nail down the syntax, don't worry: you can open up solution.ipynb and try it out, or copy it from there if you're not entirely comfortable with all of this.

Let's first have a look at this database string. Oh, something's wrong, clearly. Ah, I still have to wrap this inside curly brackets so that my string formatting works out. So this is the connection string that SQLAlchemy expects in order to create a database engine. This is perfect; I'm going to uncomment this. Great. Let me quickly try out whether I can execute a random SQL query. It's a very simple one: we'll just count the number of films that are in the database of this DVD rental business. And what do we see? It counted, and there are exactly 1,000 movies
inside the DVD rentals database, which is a dummy database; that's maybe why it's such a round number. So this works perfectly fine. Just for your understanding, what I did right now: I was inside DataCamp Workspace, and from there I connected to a PostgreSQL database that is living inside a Google Cloud Platform data center. I'm accessing it from inside DataCamp Workspace without having to leave my notebook experience.

All right, this is fairly uninteresting, I'd say, just counting the number of films, but let's say we want to see how often each film category was rented out. This would require us to get the rental table, because then we know when rentals were happening, and then we would have to link it up to category: through inventory to film, and through film_category into category. We would have to do a bunch of joins so that we can combine every rental with a certain film category.

All right, so let's do that. I'm just going to do a SELECT * here for now. I'm using a triple-quoted string so that I can chop up my Python string into multiple lines, which is way more readable if you're writing a lot of code. So I'll do SELECT * FROM rentals in the first place... rental, and I'll inner join that with inventory. This is SQL code I'm writing, right, SQL code inside a Python string. Using inventory_id, so now we're linked up to inventory. We'll also inner join this with film using film_id; that's the foreign key here. INNER JOIN film USING film_id. I don't use commas here. Then I inner join from film to film_category using film_id. No, I'm wrong here: this should be through inventory_id, apologies. And inventory should be through rental_id... or wait, inventory_id, yes, and then film_id, and then I use film_id again. It's like this: INNER JOIN film_category USING
film_id, then category USING category_id. All right. And then what am I interested in? Inside the category table I'm interested in the category name, so category.name AS category_name. Maybe later on the film title can be useful too; I'll put that in front because it's the first thing you want, I guess. So it's film.title AS film_title, category.name AS category_name, and then let's also get the rental date so we have some more information. Great. So I did a bunch of joins, and now I see that, for example, the film titled Freaky Pocus was rented out in 2005, so it's a pretty old dataset, and the category name was Music.

The last thing I can do now, and it's the nice thing about having this data as a pandas DataFrame in Python, is that I can very easily do something with it: for example calculate an aggregate or make a plot. So let's do that. I take df, I group by the category name, and I call size. What we see is that there are 1,112 Action rentals, 1,466 Animation rentals, 945 Children rentals, and so on for the different categories. To clean this up, I'm going to reset the index here, because size created a Series but I want to turn the Series into a DataFrame again. If I do that, we see that the category name is no longer the index, and this is the index now.

Let's already try to create a plot of this. Rather than using Matplotlib, I suggest we use Plotly for once. That's also a package that is available by default inside DataCamp Workspace; you don't have to install it yourself. Let's call this DataFrame aggregates, like this, and create a bar plot of aggregates: x is the category name, y is the count. All right, what does this look like? Wait for a second. Exciting. All right, great. So we see a bar
plot here, but I don't really like the way this bar plot looks, because it's hard to compare, for example, Sports and Animation. Who is the winner here? They're way too far apart. What I can do here is sort the values of my aggregates, for example by count in this case, and this is way nicer. Now I do see that Animation just doesn't win out over Sports, that Music is the least popular category that's been rented out, that Foreign is somewhere in the middle, and so on.

So this was just to showcase how you can work in a workspace completely written in Python, I should stress that, but access PostgreSQL data from inside this workspace. Again, if I'm publishing this, I'm not exposing any secret information, because that secret information has been injected, in a way, into my Python session; it's not hard-coded into my Python code here. This is way, way better than hard-coding usernames and passwords right into your Python notebook.

As usual, the final step is to share this. I want to publish this to the world because I'm super proud of the work I did. I have to wait for a second for it to run. It takes a while, but that is because it runs from top to bottom, so it really verifies that every code cell works, one after the other. That also makes sure that all the code you wrote works in the order in which it appears inside the notebook. If I then open this up, I again see a good-looking publication about analyzing movie rentals, with the Python code. Again, no secret information is being leaked here. I am connecting here, I'm checking the DataFrame, I'm doing an aggregate, I'm doing a plot, and I have a good-looking plot to end with. If I check my profile page, what do I see? I see the DVD rentals data. Yeah, it's showing the first picture of the publication by default. I don't really like that, so
maybe let me just get rid of the structure. It's also not very interesting for the reader, so let me get rid of that. I hope this should go up, then this I can move in here, and then this. Then I'll do Share, Update again. Let's see what this gives. It's still running... yes, now it's finished. Good. And then if I check my profile, the DVD rentals data... Yeah, that's because Plotly is an HTML library, so the thumbnail cannot easily be shown here. That's actually something we should improve on, so that Plotly thumbnails can also show up on your DataCamp profile, because this looks a bit boring, to be honest.

All right, so that was it. Now we have five minutes left for some final Q&A, if there are people who still have questions. And there are some questions.

So I made a mistake. PK says he or she is getting an error when trying to unzip. That's because I made a mistake: it's dvd-rentals-master.zip and not dvd-rentals.zip.

Then Chain asked: how do I know the specific value of the IP address of this database? Do I need to know it in my specific coding? Yes, indeed, this is something that you have to know yourself. Say, for example, you work at a company that has PostgreSQL databases that you can analyze. Then you should go to the system admin and ask them what the IP address is, what the URL is where this database is hosted, so that you can access it. For this example, DVD rentals was at the IP that is shown here, but other databases will be in different places. This is the IP address that I read off from my Google Cloud Platform console. Google Cloud told me: this is where your database is running, this is how you can connect to it. So this is the IP address you should use.

Then UdP is asking a question: the integration feature is an excellent function; is it secure enough to store credentials from another server, and is there any encryption involved? So
that's a great question, actually. Yes, it's very secure. We use an external service called HashiCorp Vault, which is known to be the best in terms of security and encryption of secrets. So rather than storing the secrets that you put in here ourselves, we depend on a very professional company whose main business is storing secrets as safely and securely as possible. Every secret you store here is encrypted in transit and on disk, and only decrypted when you need it. So it's entirely secure.

And then there's a question from PK: how much data can the notebook handle? I'm not sure exactly what you mean, so there are different ways to answer this question. In terms of storage, you can put three gigabytes of data inside a workspace. You can go up to five gigabytes, but if you go over five gigabytes of data stored inside one DataCamp workspace, then you lose access to your workspace. Of course, we can still get you the workspace data afterwards, but for now we set a fair-use limit of three gigabytes of data.

How much data the notebook can handle can also be in terms of processing power and working memory, so in terms of the RAM a workspace gets. You get four gigabytes of RAM inside DataCamp Workspace. Depending on how complicated your algorithms are, this becomes a problem sooner or later, but for the basic analyses we did, for example accessing PostgreSQL data, the RAM shouldn't be a problem. If you're going to do crazy ensemble methods, or hyperparameter tuning for different random forests, those are the moments when you could potentially hit the limit. If you do hit the limits, definitely let us know through the feedback option over here: hey, I'm hitting the limit for my analyses. That's good information for us to, at some point, also increase the resources we
give every person on DataCamp Workspace. I hope that answers PK's question. Are there any other questions?

All right, cool. Sena lets me know that there are no additional questions, so I think we can leave it at this, right on time, with only 90 seconds left before it's 6:30 here in Belgium. I want to thank you all for attending. There was a slight hiccup at the end with the DVD rentals data; probably in an hour you'll be able to start the workspace from the GitHub repository again. Thanks for staying with us, thanks for tuning in, and I hope to see you soon in another code-along session, where I'll hopefully be able to showcase more nice DataCamp Workspace features that will help you do data science in the blink of an eye, without having to configure or install anything on your own system. Thanks, goodbye.

Thank you, Filip. Just some closing messages here. We'll be organizing more of these presentations in the future, so stay tuned, and please let us know if you have any additional feedback regarding these sessions. We will post the survey link in the chat, and also on Slack along with the recording. Your feedback will really make a difference in helping improve the future events that we hold and in improving Workspace as a whole. Finally, if you haven't already, please feel free to join the DataCamp global Slack community, where we will keep you updated about new Workspace features and events. This rounds up the fourth Workspace live code-along session. I remind you that the session was recorded, and the link to the recording will be posted later on Slack, email, and social media. With that being said, have a great rest of your day, and thank you for joining.
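For readers who want to retrace the integrations demo from this session, here is a minimal sketch of the pattern that was described: the credentials are read from environment variables and formatted into the connection string, so nothing secret is typed into the notebook. The variable names (DB_USERNAME, DB_PASSWORD, DB_HOST, DB_NAME) follow what was read out from the terminal, but the exact names, the helper build_connection_string, the placeholder values, and the query text are assumptions for illustration, not the code from the session's notebook.

```python
import os
from typing import Mapping
from urllib.parse import quote_plus

# Assumed variable names; the names your own integration exposes may differ.
REQUIRED_VARS = ("DB_USERNAME", "DB_PASSWORD", "DB_HOST", "DB_NAME")


def build_connection_string(env: Mapping[str, str] = os.environ) -> str:
    """Build a PostgreSQL connection URL from environment-style variables."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    # quote_plus keeps special characters in the credentials from breaking the URL.
    user = quote_plus(env["DB_USERNAME"])
    password = quote_plus(env["DB_PASSWORD"])
    return f"postgresql://{user}:{password}@{env['DB_HOST']}/{env['DB_NAME']}"


# A triple-quoted string lets the SQL span several lines. Table and column
# names follow the joins described in the session.
RENTALS_BY_CATEGORY_QUERY = """
SELECT film.title AS film_title,
       category.name AS category_name,
       rental.rental_date
FROM rental
INNER JOIN inventory USING (inventory_id)
INNER JOIN film USING (film_id)
INNER JOIN film_category USING (film_id)
INNER JOIN category USING (category_id)
"""

if __name__ == "__main__":
    # Placeholder values for illustration only; in a workspace these would be
    # injected by the integration rather than written in the notebook.
    demo_env = {
        "DB_USERNAME": "example_user",
        "DB_PASSWORD": "example_password",
        "DB_HOST": "203.0.113.10",
        "DB_NAME": "dvdrental",
    }
    print(build_connection_string(demo_env))
```

With a PostgreSQL driver installed, the resulting string could be passed to SQLAlchemy's create_engine and the query to pandas' read_sql_query to get the DataFrame used for the group-by and the bar plot. Passing the environment in as a mapping is a design choice made here so the sketch can be tried without a live database.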