Beginner Intro to Neural Networks 12 - Neural Network in Python from Scratch

"WEBVTTKind: captionsLanguage: enhey everybody in this video I want to code up a solution to the flower problem using python so navigate to where you created your environment using virtual environment or miniconda and I'm going to type source and then series bin activate and that will get me my environment okay now in this video we're going to do some graphs so let's install Matt plot lib it's this cool graphing tool we're going to like graph the cost over time and see it go down hopefully we're ready let's open up our notebook so we'll type Jupiter notebook and we have to copy this because of some error with opening URLs okay I'm going to make a new notebook now we're going to do some graphing so if we want the graphs to show up in these cells we have to do this crazy thing map plot lib inline and then we're going to import some stuff from map plot lib so from map Li import Pi plot as PLT and then we're going to import numpy as NP shorthand okay so now that we've imported all we need let's define our data and so we'll say data equals now the data is going to consist of points or measurements and each one is going to be a list itself so it's going to be a list of lists okay so there's the data from my own video so I'm going to go back and for the first one it looks like we have a three and a 1.5 and it's red so 3 1.5 and red I'm going to say as one so each point is length withth type which is a zero or one for blue or red and I'll go ahead and add the rest here now the mystery flower I'll actually give a name and that had measurements 4.5 and one 4.51 and we don't know what it is so I just won't even put what our Target is for it we don't have any Target value for it okay that looks pretty good so we have our data so let me show you how to look at one of those so we'll just say data zero and that's going to give us the first item here so these are zero based indexing in Python so zero is the first item and you can see it gives us another list let's look at the second item so 2 one and zero because this is a list itself we can get out the elements uh here as well using this indexing so we put square brackets and we can get the first item here with another zero so we get two so if we wanted to say get this value of 1.5 we'd say 01 2 to get this list and then 01 and we get out that value 1.5 so that's how we can access our data now we need our Network so let's remember our Network architecture it's very difficult and we have one output and this is the flower type and our inputs are this first feature and then the second feature so this is the length and width as inputs and our two weights are W1 and W2 and we also have a bias that we add on so let's get our weights first so we'll say W1 is equal to nump pi. random. 
**The Network**

Now we need our network, and the architecture couldn't be simpler: the inputs are the two features, length and width; there's one output, the flower type; the inputs connect to the output through two weights, w1 and w2; and there's a bias b that we add on. The connections in a neural network start off random, so initialize each of w1, w2, and b with `np.random.randn()`, which samples from the normal distribution with zero mean and unit variance; rerun the cell a few times and you'll see random numbers hovering around zero, some negative, some positive.

**The Sigmoid Activation**

Next we need an activation function, and because the flower type is only ever 0 or 1, sigmoid is a good option: it takes a number x and returns 1 / (1 + e^(-x)). To plot it, build a domain with `np.linspace(-5, 5, 10)`, a list of numbers stepping from -5 up to 5, and call it T (you could call it X or anything you want). Run T through the sigmoid function and everything in the result y comes out positive, because sigmoid squashes every number into the range between 0 and 1. `plt.plot(T, y)` shows the S-curve, though with only 10 subdivisions it looks a little edgy (not to be edgy); bump it to 100 subdivisions and it's smooth, and widen the domain to -20 to 20 to see more of the squashing behavior: very negative numbers get squashed close to 0 and very positive numbers get squashed close to 1.

We'll also need the derivative of sigmoid. I call it sigmoid_p, p for prime, since the derivative of a function f is often written f'. It's sigmoid(x) * (1 - sigmoid(x)), which is kind of interesting, being defined in terms of the function itself; maybe we'll see why in another video. Graph the derivative by running the domain through it, with the sigmoid curve in red and the derivative in blue, and tighten the domain a bit. The blue line really is a graph of the red line's slope: starting from the left the slope is positive and growing, the blue curve peaks exactly where the red curve is steepest, and then as we move right the red curve increases less and less, getting shallower until it's almost flat, which is why the blue curve falls back toward zero.
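A sketch of the activation and its derivative, plotted as described above:

```python
from matplotlib import pyplot as plt

def sigmoid(x):
    # Squashes any number into the range (0, 1).
    return 1 / (1 + np.exp(-x))

def sigmoid_p(x):
    # Derivative of sigmoid, written in terms of sigmoid itself.
    return sigmoid(x) * (1 - sigmoid(x))

# Domain to plot over: 100 subdivisions makes the curve smooth.
T = np.linspace(-5, 5, 100)
plt.plot(T, sigmoid(T), c='r')    # the S-curve
plt.plot(T, sigmoid_p(T), c='b')  # its slope: peaks at 0, flat in the tails
plt.show()
```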
**The Training Loop**

Now let's make our training loop. The plan: loop over the data, pick out a random point, run it through the network, see what it should have been, and that difference becomes our cost; take the derivative of the cost with respect to the network's parameters and subtract it from them, which decreases the cost and improves the network's prediction for that point. See enough points and the predictions get better and better.

We'll run over the data thousands of times, so we need a loop: `for i in range(1000)` runs the indented block a thousand times, with i counting from 0 up to 999. Inside it, get a random index with `ri = np.random.randint(len(data))`, which gives integers we can use to index into the data, and pull out a point with `point = data[ri]`. Print the point each time (only 100 iterations for now so the screen doesn't go crazy) and you'll see random rows from the data, like 2, 0.5, 0; check the data and sure enough, that's one of our points.

Before we train, let's make a scatter plot of the data to see if it looks correct. For each i in `range(len(data))`, which goes from zero up to the length of the data minus one, grab `point = data[i]` and call `plt.scatter` with the point's first feature and second feature. To color the points by type, make a color variable that's 'r' by default and flip it to 'b' when the point's type is 0, then pass `c=color`. Finally add a grid with `plt.grid()`, and use `plt.axis`, which sets the xmin, xmax, ymin, and ymax of the plot, to frame things nicely.
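A sketch of the scatter plot; the exact axis limits aren't stated in the video, so the ones here are a guess that fits the data.

```python
# Scatter the training points, colored by type.
for i in range(len(data)):
    point = data[i]
    color = 'r'            # red by default
    if point[2] == 0:
        color = 'b'        # blue for type 0
    plt.scatter(point[0], point[1], c=color)
plt.grid()
plt.axis([0, 6, 0, 2])     # [xmin, xmax, ymin, ymax] -- assumed limits
plt.show()
```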
Look at those points: the blues cluster together on one side and the reds on the other, which looks right to me. Now the training loop proper: grab one of these points randomly, see what the network outputs for it, and use that to get the derivative of the cost, bring that derivative back to the parameters, and update them to get the network's prediction closer to what we want it to be.

We're already grabbing a random point, so let's feed it through the network. Make a variable z, the weighted sum of the point's features plus the bias: `point[0] * w1 + point[1] * w2 + b`. Then h is the activation applied to z, `sigmoid(z)`, so every h should land between 0 and 1; printing them for each flower, they're all coming out high right now, which is interesting.

Our target is the third entry of the point, `point[2]`, those 0s and 1s we put in. The cost is the network's output minus the target, squared; the output is really the prediction, and I'll call it pred since that's what I used in previous videos, so the basic cost is `np.square(pred - target)`. Printing the cost for each point, some points have a really low cost, and I bet they're the red ones: since the predictions are all coming out close to 1 with these initial weights and features, the red points (target 1) get a low cost, while the blue points (look for the 0 at the end of their rows) get a high cost. That's just where the random initialization happens to start, but it looks like everything is working.

**Gradients with the Chain Rule**

Now we want the derivative of the cost with respect to each of our parameters, which means working our way back through the model (see the previous video for why and how). First, the derivative of the cost with respect to the prediction: by the power rule, (pred - target) squared becomes 2 * (pred - target), the two coming out in front times whatever was inside. Second, the derivative of the prediction with respect to z is sigmoid prime evaluated at z, that blue function we defined earlier. Where you evaluate it matters: if the activation is way out in a flat tail, the derivative is really small, and where the slope is high, the derivative is high.
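As a checkpoint, here's the forward pass, the cost, and the first two links of the chain in code, assuming w1, w2, and b were initialized with `np.random.randn()` as above:

```python
ri = np.random.randint(len(data))   # random index into the data
point = data[ri]

z = point[0] * w1 + point[1] * w2 + b   # weighted sum plus bias
pred = sigmoid(z)                       # called h in the video
target = point[2]
cost = np.square(pred - target)         # squared error for this point

dcost_dpred = 2 * (pred - target)       # power rule
dpred_dz = sigmoid_p(z)                 # evaluated at z
```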
From z we keep going back to the parameters. The derivative of z with respect to w1 is whatever multiplies w1, which is `point[0]`; likewise dz/dw2 is `point[1]`, and dz/db is just 1, because you can picture a 1 * b in there.

Having worked our way through the model, we can chain the pieces together (this is the chain rule, after all) to get the derivative of the cost with respect to each parameter: dcost/dw1 = dcost/dpred * dpred/dz * dz/dw1. You can almost ignore the middle and see dcost/dw1 fall out, though of course you need all of it; as a sanity check, whatever one factor ends with, the next factor should begin with. The first two factors are the same for every parameter, so store them once as dcost_dz; then dcost/dw1 = dcost_dz * dz/dw1, dcost/dw2 = dcost_dz * dz/dw2, and dcost/db = dcost_dz * dz/db.

With the partial derivatives in hand, we subtract a small fraction of each from its parameter. Pick a learning rate (a small number we can tweak later) and update `w1 = w1 - learning_rate * dcost_dw1`, and the same for w2 and b.

**Watching the Cost Go Down**

Let's print the cost as we go, but only when `i % 1000 == 0`. The % operator gives the remainder of i divided by 1000, and a remainder of zero means i is evenly divisible by 1000, so over 10,000 iterations we get 10 progress updates instead of a wall of numbers. At first the cost doesn't really look like it's going down; raising the learning rate looks a little better, but it's hard to tell, and it's noisy because this is the cost for a single point when what we really care about is the cost over all of our points. So let's keep track: make a costs list, append the current cost every iteration (why not keep them all), and `plt.plot(costs)` at the end. The first plot does not look good; bumping the learning rate does something, and running 100,000 iterations (which seems like a lot for this problem, but what are you going to do) finally shows the cost decreasing.
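Here's the loop as it stands at this point, as a minimal sketch: single-point cost appended every iteration, with the learning rate and iteration count as the knobs being fiddled with (0.1 and 10,000 are just placeholders for that fiddling).

```python
# (Re)initialize the parameters fresh each run.
w1 = np.random.randn()
w2 = np.random.randn()
b  = np.random.randn()

learning_rate = 0.1   # a starting guess; tweaked repeatedly in the video
costs = []            # single-point costs: cheap to log but noisy

for i in range(10000):
    ri = np.random.randint(len(data))
    point = data[ri]

    # Forward pass
    z = point[0] * w1 + point[1] * w2 + b
    pred = sigmoid(z)
    target = point[2]
    costs.append(np.square(pred - target))

    # Chain rule back to the parameters
    dcost_dpred = 2 * (pred - target)
    dpred_dz = sigmoid_p(z)
    dcost_dz = dcost_dpred * dpred_dz   # shared by every parameter

    dcost_dw1 = dcost_dz * point[0]     # dz/dw1 = point[0]
    dcost_dw2 = dcost_dz * point[1]     # dz/dw2 = point[1]
    dcost_db  = dcost_dz * 1            # dz/db = 1

    # Step each parameter against its gradient
    w1 -= learning_rate * dcost_dw1
    w2 -= learning_rate * dcost_dw2
    b  -= learning_rate * dcost_db

    if i % 1000 == 0:
        print(costs[-1])   # occasional progress printout

plt.plot(costs)
plt.show()
```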
We end up around 0.001, so the cost is pretty small. Let's change things around: go back to 10,000 iterations and track the cost over all of our data points as we update the parameters. The plot came out weird at first, and part of the fix was to reinitialize the weights every run, since they were probably sitting at some extreme values from the previous experiment, so I moved the weight initialization down next to the loop. It still looked strange, and a really small learning rate looked even crazier, frankly awful. After a little playing around I went back to the JavaScript solution I made and reused the same hyperparameters (that's the name for things like the learning rate and the number of iterations), and I got parameters comparable to what we had there. So: a learning rate of 0.2 and 50,000 iterations, and every time i is evenly divisible by 100 I log a little summary of the cost: run each point through the network, get its prediction, take the squared difference from what it should have been, accumulate that in a cost_sum variable, then divide by the number of points to make it an average, and keep track of those as we go.

The graph of those averages shows the cost decreasing, with occasional spikes, and I have a feeling the spikes of high cost come from points sitting right on the decision boundary. In the JavaScript solution those points still have high error: one red point gets a prediction of 0.76, so its cost is (1 - 0.76)^2 = 0.24^2, about 0.058, which is one of the tiny spikes; another boundary point gets a prediction of about 0.4 with a target of 0, so (0.4 - 0)^2 = 0.16, one of the bigger spikes. We could keep pushing the parameters like crazy until the cost went down and down, but if the prediction is below 0.5 for a blue point and above 0.5 for a red one, I'm happy enough with that.
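A sketch of the loop with those final settings folded in: learning rate 0.2, 50,000 iterations, and the average cost over the whole dataset logged every 100 steps.

```python
# Re-initialize so each run starts fresh (stale extreme weights made the
# earlier plots look strange).
w1, w2, b = np.random.randn(), np.random.randn(), np.random.randn()

learning_rate = 0.2
costs = []

for i in range(50000):
    ri = np.random.randint(len(data))
    point = data[ri]

    z = point[0] * w1 + point[1] * w2 + b
    pred = sigmoid(z)
    target = point[2]

    dcost_dz = 2 * (pred - target) * sigmoid_p(z)
    w1 -= learning_rate * dcost_dz * point[0]
    w2 -= learning_rate * dcost_dz * point[1]
    b  -= learning_rate * dcost_dz

    if i % 100 == 0:
        # Average cost over the whole dataset, logged every 100 steps.
        cost_sum = 0
        for p in data:
            p_pred = sigmoid(p[0] * w1 + p[1] * w2 + b)
            cost_sum += np.square(p_pred - p[2])
        costs.append(cost_sum / len(data))

plt.plot(costs)
plt.show()
```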
**The Flower Prediction Problem**

Let's print the predictions for each of the points. (I call the flowers points and the points flowers, so bear with me on that.) This first flower is a red one, target 1, and the model says 0.64: above 0.5, so I'd call that good. The next one is blue and the model's output is really low, so it really thinks it's a blue flower. The next is a red flower and the prediction is really high. Down the list it goes: blue low, red high, blue really low, red high, blue low. Awesome: for every one of our points it got it right. So now we can do the mystery flower: get its prediction the same way, and whoa, it's really high, close to 1. Our computer is telling us the mystery flower is red.

**System Integration**

Let's make the computer say it out loud. We want to run a command on the terminal from Python, and (thank goodness for Stack Overflow) that's `os.system`, so import os. Try saying "hi", then try the Samantha voice: "Hello, my name is Samantha. I am an American English voice." Whoa, you're very loud. Okay, we can make the computer say stuff with this command.

Now let's wrap it all in a function called which_flower. It takes a length and a width, computes the prediction exactly as before (w1 times the length, plus w2 times the width, plus b, through sigmoid), and if the prediction is less than 0.5 it says "blue", else it says "red". Try (1, 1), a weird shape of petal: blue. Now let's work off our data, which is more fun even though we already saw all the predictions. (2, 1)? Blue, that's right. (4, 1.5)? Red. (5.5, 1), now those are some good measurements: red. The mystery flower, (4.5, 1): red, just like it's supposed to be. Good job, computer. (2, 0.5)? Blue, right again.

Now for some flowers that could not possibly exist. The anti-flower, (-1, -1): blue. The mega-anti-flower, (-100, -100): blue, but we get an overflow (not a Stack Overflow, an actual numerical overflow) because the input to the exponential was so huge; the mega-anti-flower produces values in our computer's brain too extreme to store. The mega-flower: red. The pi flower, (pi, pi), a genuine mystery: red. The 90-degree flower: blue. The e flower (that's the exponential's e)... and the infinity flower: red, with a prediction of exactly 1, which makes sense once you ask what the weighted sum of infinities is: infinity, and sigmoid of infinity is 1. The zero flower: blue. So our computer has very interesting powers of classifying flowers that could not possibly exist in our universe.

Anyway, that's how I would go about solving this by hand. In a later video let's look at solving it with something like TensorFlow or Keras, which can spit out a model for us so we don't have to take the derivatives and train it ourselves, and we get all the fancy stuff: optimizers, momentum on the gradient, batch training (train on a couple of different flowers, then apply one update). I just wanted to solve the problem in Python, for completeness, for anybody who wants to do it in an interactive way. Thanks for watching, and I'll see you guys and gals in the next video.
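For reference, here's a sketch of that wrap-up demo in one place: the per-point predictions, a say() helper, and which_flower(). The `say` command assumes macOS; the results in the comments are the ones from the video.

```python
import os

# Print target vs. prediction for every training point...
for p in data:
    pred = sigmoid(p[0] * w1 + p[1] * w2 + b)
    print(p, pred)   # red points (target 1) should print well above 0.5

# ...and for the mystery flower (close to 1, so: red).
print(sigmoid(mystery_flower[0] * w1 + mystery_flower[1] * w2 + b))

def say(text):
    # macOS text-to-speech via the `say` command (platform-specific).
    os.system('say ' + str(text))

def which_flower(length, width):
    z = length * w1 + width * w2 + b
    pred = sigmoid(z)
    if pred < 0.5:
        say('blue')
    else:
        say('red')

which_flower(4.5, 1)            # the mystery flower: "red"
which_flower(-1, -1)            # the anti-flower: "blue"
which_flower(-100, -100)        # overflow warning in exp, still "blue"
which_flower(np.pi, np.pi)      # the pi flower: "red"
which_flower(np.inf, np.inf)    # "red" in the video; sigmoid(inf) is 1.0
```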