Computing a Neural Network's Output (C1W3L03)

Vectorizing a Neural Network: Understanding the Basics and Implementing Logistic Regression Units

We're going to call this whole thing the vector z[1], which is obtained by stacking those individual values into a column vector. When we're vectorizing, one rule of thumb that might help you navigate this is that when we have different nodes in a layer, we stack them vertically. That's why z[1]_1 through z[1]_4, which correspond to the four different nodes in the hidden layer, are stacked vertically to form the vector z[1]. Let's introduce one more piece of notation: the 4 by 3 matrix we obtained by stacking the row vectors w[1]_1 transpose, w[1]_2 transpose, and so on, we're going to call the matrix W[1]; and similarly, the vector of biases we'll call b[1], which is a 4 by 1 vector. So now we've computed z[1] = W[1]x + b[1] using this vector-matrix notation.
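As a concrete illustration of that stacking step, here is a minimal NumPy sketch (NumPy itself and the names w1_1 through w1_4, W1, b1, and x are my own assumptions for this example, not code from the lecture):

    import numpy as np

    # Hypothetical parameters: 4 hidden units, 3 input features (a 4-by-3 example).
    w1_1, w1_2, w1_3, w1_4 = (np.random.randn(3) for _ in range(4))  # per-node weight vectors
    b1 = np.random.randn(4, 1)   # bias vector b[1], shape (4, 1)
    x = np.random.randn(3, 1)    # input feature vector, shape (3, 1)

    # Stack the four weight vectors as rows to form W[1], shape (4, 3).
    W1 = np.vstack([w1_1, w1_2, w1_3, w1_4])

    # One matrix-vector product computes all four z values at once.
    z1 = np.dot(W1, x) + b1      # z[1] = W[1] x + b[1], shape (4, 1)

The single matrix-vector product np.dot(W1, x) replaces the four separate dot products, which is exactly the point of vectorizing.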

The Last Thing We Need to Compute Are the Activation Values a[1]

It probably won't surprise you that we're going to define a[1] by stacking together the activation values a[1]_1 through a[1]_4: just take those four values and stack them together in a vector called a[1]. And a[1] = sigmoid(z[1]), where sigmoid here is a vectorized implementation of the sigmoid function that takes the four elements of z[1] and applies the sigmoid function element-wise. So we've figured out that z[1] = W[1]x + b[1] and a[1] = sigmoid(z[1]). Carrying this to the next slide, what we see is that for the first layer of the neural network, given an input x, we have z[1] = W[1]x + b[1] and a[1] = sigmoid(z[1]).
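Continuing the same hedged sketch from above, an element-wise sigmoid turns z[1] into a[1]:

    def sigmoid(z):
        # Vectorized sigmoid: applied element-wise to every entry of z.
        return 1.0 / (1.0 + np.exp(-z))

    a1 = sigmoid(z1)             # a[1] = sigmoid(z[1]), shape (4, 1)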

Let's check the dimensions of z[1] = W[1]x + b[1]: z[1] is 4 by 1, and it equals a 4 by 3 matrix W[1] times a 3 by 1 vector x, plus a 4 by 1 vector b[1], so both sides are 4 by 1 and the dimensions match. Remember also that we said x equals a[0], just as y-hat equals a[2], so if you want you can take this x and replace it with a[0], since a[0] is, if you like, an alias for the vector of input features x.
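If it helps, that dimension bookkeeping can be checked mechanically; this just continues the assumed sketch above:

    # Dimension check for the first layer: (4, 1) = (4, 3) @ (3, 1) + (4, 1).
    assert W1.shape == (4, 3)
    assert x.shape == (3, 1)
    assert b1.shape == (4, 1)
    assert z1.shape == (4, 1) and a1.shape == (4, 1)

    a0 = x                       # a[0] is just an alias for the input vector x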

Through a Similar Derivation, the Output Layer Can Be Written the Same Way

The representation for the next layer can be written similarly. What the output layer does is it has associated with it the parameters W[2] and b[2], where W[2] in this case is a 1 by 4 matrix and b[2] is just a real number, or a 1 by 1 matrix if you like. So z[2] = W[2]a[1] + b[2] is going to be a real number: a 1 by 4 matrix times a 4 by 1 vector, plus a 1 by 1 bias, gives you just a single number.
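Sketching the output layer in the same assumed style (W2 and b2 are again illustrative names, continuing the snippet above):

    W2 = np.random.randn(1, 4)   # W[2]: a 1-by-4 matrix
    b2 = np.random.randn(1, 1)   # b[2]: a real number stored as 1-by-1

    z2 = np.dot(W2, a1) + b2     # (1, 4) @ (4, 1) + (1, 1) -> (1, 1), a single number
    a2 = sigmoid(z2)             # y-hat = a[2]: one value between 0 and 1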

A Recap: From Logistic Regression to the Four Equations of the Neural Network

For logistic regression, to implement the output or to make a prediction, you compute z = w-transpose x + b and y-hat = a = sigmoid(z). When you have a neural network with one hidden layer, what you need to implement to compute its output is just these four equations: z[1] = W[1]x + b[1], a[1] = sigmoid(z[1]), z[2] = W[2]a[1] + b[2], and y-hat = a[2] = sigmoid(z[2]). You can think of the first pair as a vectorized implementation of computing the output of the four logistic regression units in the hidden layer; that's what those equations do.

And then the second pair is the logistic regression in the output layer; that's what those equations do. I hope this description made sense, but the takeaway is that to compute the output of this neural network, all you need is those four lines of code.
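For reference, those four equations are literally four lines of illustrative NumPy code, assuming the parameters and the sigmoid function defined in the sketches above:

    # Forward pass for a single example x through the network with one hidden layer.
    z1 = np.dot(W1, x) + b1      # z[1] = W[1] x    + b[1]
    a1 = sigmoid(z1)             # a[1] = sigmoid(z[1])
    z2 = np.dot(W2, a1) + b2     # z[2] = W[2] a[1] + b[2]
    a2 = sigmoid(z2)             # y-hat = a[2] = sigmoid(z[2])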

"WEBVTTKind: captionsLanguage: enin the last video you saw what a single hidden layer neural network looks like in this video let's go through the details of exactly how this neural network computers outputs what you see is that is like logistic regression but repeater of all the times let's take a look so this is what's a two layer neural network looks let's go more DB into exactly what this new network compute now was set before that logistic regression the circle in logistic regression really represents two steps of computation rows you compute Z as follows and in second you compute the activation as a sigmoid function of Z so in your network just does this a lot more times let's start by focusing on just one of the nodes in the hidden layer and let's look at the first node in the hidden layer so I've grayed out the other nodes for now so similar to logistic regression on the left is node in a hidden layer that's two steps of computation right the first step and think of as the left half of this node it computes Z equals W transpose X plus B and the notation were used is um these are all quantities associated with the first hidden there so that's why we have a bunch of square brackets there and this is the first node in the hidden layer so that's why we have the subscript one over there so first it does that and then the second step is it computes a 1 1 equals say point of Z 1 1 like so so for both Zn a the notational convention is that on a oh I DL here in superscript square backers refers to layer number and the I subscript here refers to the nodes in that layer so the node will be looking at is layer 1 that is a hidden layer node 1 so that's why the superscript and subscript were on both 1 1 so that little circle that first node in a neural network represents carrying out these two steps of computation now let's look at the second node in your network the second node in a hidden layer comes in your network similar to the logistic regression unit on the left this little circle represents two steps of computation the first step is a confusing Z this is still layer 1 the now is the second note equals W Tron's x+ v 2 and then a & 2 equals sigmoid of z12 and again feel free to pause the video if you want that you can double check that B superscript and subscript notation is consistent with what we have written here above in purple so we'll talk through the first two hidden units in the neural network on hidden units 3 & 4 also represents some computations so now let me take this pair of equations and this pair of equations and let's copy them to the next line so here's our network and here's the first and there's the second equations they've worked on previously for the first and the second hidden units if you then go through and write out the corresponding equations for the third and fourth hidden units you get the following and those make sure this notation is clear this is the vector W 1 1 this is a vector transpose times X so that's what the superscript T there represents this vector transpose now as you might have guessed if you're actually implementing in your network doing this with a for loop seems really inefficient so what we're going to do is take these four equations and vectorize so I'm going to start by showing how to compute Z as a vector and it turns out you could do it as follows let me take these WS and stack them into a matrix then you have W 1 1 transpose so that's a row vector of the column vector transpose gives you a row vector and W 1 2 transpose W 1 3 transpose of 
V 1 4 transpose and so this by stacking goes from for W vectors together you end up with a matrix so another way to think of this is that we have for logistic regression unions there and each of the logistic regression you know is has a corresponding parameter vector W and by stacking those four vectors together you end up with this 4 by 3 matrix so if you then take this matrix and multiply it by your input features x1 x2 x3 you end up with by our matrix multiplication works you end up with w1 1 transpose X W 1 this will be 2 1 transpose X we 1 transpose X wo 1 transpose X and then now let's not forget the bees so we now add to this the vector e1 1 b12 b13 in 1/4 so that's basically this then this gives b11 b12 b13 b14 and so you see that each of the 4 rows of this outcome correspond exactly to each of these 4 rows each of these four quantities that we had above so in other words we've just shown that this thing is therefore equal to V 1 1 V 1 to V 1 V V 1 core right as defined here and maybe not surprisingly we're going to call this whole thing the vector V 1 which is taken by stacking up these are individuals of these into a column vector when we're vectorizing one of the rules of thumb that might help you navigate this is that when we have different nodes in a layer we'll stack them vertically so that's why when you have V 1 1 2 0 1 4 those correspond to four different nodes in the hidden layer and so we stack these four numbers vertically to form the vector Z 1 and reduce one more piece of notation this 4 by 3 matrix here which we obtained by stacking the lower case you know W 1 1 W 1 2 and so on we're going to call this matrix W Capital One and similarly this vector or going to call B superscript 1 square bracket and so this is a four point one vector so now we've computed Z using this vector matrix notation the last thing we need to do is also compute these values of a and so probably won't surprise you to see that we're going to define a 1 as just stacking together those activation values a11 to a14 so just take these four values and stack them together in a vector called a1 and this is going to be sigmoid of z1 where there's no husband implantation of the sigmoid function that takes in the four elements of Z and applies the sigmoid function element wise to it so just a we figured out that Z 1 is equal to w1 times the vector X plus the vector B 1 and a 1 is sigmoid times Z 1 let's just copy this to the next slide and what we see is that for the first layer of the neural network given an input X we have that Z 1 is equal to W 1 times X plus B 1 and a 1 is Sigma we took Z 1 and the dimensions of this are 4 by 1 equals this is a 4 by 3 matrix times a 3 by 1 vector plus a on 4 by 1 vector B and this is 4 by 1 same dimensions and remember that we said X is equal to a 0 right just like Y hat is also equal to a 2 so if you want you can actually take this X and replace it with a 0 since a 0 is if you want it as an alias for the vector of input features X now through a similar derivation you can figure out that the representation for the next layer can also be written similarly where what the output layer does is it has associated with it so the parameters W 2 and B 2 so W 2 in this case is going to be a 1 by 4 matrix and B 2 is just a real number as 1 by 1 and so V 2 is going to be a real numbers right as a 1 by 1 matrix is going to be a 1 by 4 thing times a was 4 by 1 plus B 2 is 1 by 1 and so this gives you just a real number and if you think of this loss output unit as just being 
analogous to logistic regression which had parameters W and B W really plays in lagless real to W 2 transpose or W 2's really W transpose and B is equal to V 2 right said were to you know cover up the left of this network and ignore all that for now then this is just this last output unit is a lot like logistic regression except that instead of writing the parameters as WMV we're writing them as W 2 and V 2 with dimensions one by four and one by one so just a recap for logistic regression to implement the output or the influence prediction you compute Z equals W transpose X plus B and a y hat equals a equals sigmoid of z when you have a new network with one hidden layer what you need to implement two computers output is just these four equations and you can think of this as a vectorized implementation of computing the output of first these for logistic regression units in the hidden layer that's what this does and then this which is regression in the output layer which is what this does I hope this description made sense but takeaway is to compute the output of this neural network all you need is those four lines of code so now you've seen how given a single input feature vector at you can with four lines of code compute the outputs of this new network um similar to what we did for the gist regression will also want to vectorize across multiple training examples and we'll see that by stacking up training examples in different columns in the matrix or just slight modification to this you also similar to what you saw in which is regression be able to compute the output of this neural network not just on one example at a time belong your say your entire inning set at a time so let's see the details of that in the next videoin the last video you saw what a single hidden layer neural network looks like in this video let's go through the details of exactly how this neural network computers outputs what you see is that is like logistic regression but repeater of all the times let's take a look so this is what's a two layer neural network looks let's go more DB into exactly what this new network compute now was set before that logistic regression the circle in logistic regression really represents two steps of computation rows you compute Z as follows and in second you compute the activation as a sigmoid function of Z so in your network just does this a lot more times let's start by focusing on just one of the nodes in the hidden layer and let's look at the first node in the hidden layer so I've grayed out the other nodes for now so similar to logistic regression on the left is node in a hidden layer that's two steps of computation right the first step and think of as the left half of this node it computes Z equals W transpose X plus B and the notation were used is um these are all quantities associated with the first hidden there so that's why we have a bunch of square brackets there and this is the first node in the hidden layer so that's why we have the subscript one over there so first it does that and then the second step is it computes a 1 1 equals say point of Z 1 1 like so so for both Zn a the notational convention is that on a oh I DL here in superscript square backers refers to layer number and the I subscript here refers to the nodes in that layer so the node will be looking at is layer 1 that is a hidden layer node 1 so that's why the superscript and subscript were on both 1 1 so that little circle that first node in a neural network represents carrying out these two steps of 
computation now let's look at the second node in your network the second node in a hidden layer comes in your network similar to the logistic regression unit on the left this little circle represents two steps of computation the first step is a confusing Z this is still layer 1 the now is the second note equals W Tron's x+ v 2 and then a & 2 equals sigmoid of z12 and again feel free to pause the video if you want that you can double check that B superscript and subscript notation is consistent with what we have written here above in purple so we'll talk through the first two hidden units in the neural network on hidden units 3 & 4 also represents some computations so now let me take this pair of equations and this pair of equations and let's copy them to the next line so here's our network and here's the first and there's the second equations they've worked on previously for the first and the second hidden units if you then go through and write out the corresponding equations for the third and fourth hidden units you get the following and those make sure this notation is clear this is the vector W 1 1 this is a vector transpose times X so that's what the superscript T there represents this vector transpose now as you might have guessed if you're actually implementing in your network doing this with a for loop seems really inefficient so what we're going to do is take these four equations and vectorize so I'm going to start by showing how to compute Z as a vector and it turns out you could do it as follows let me take these WS and stack them into a matrix then you have W 1 1 transpose so that's a row vector of the column vector transpose gives you a row vector and W 1 2 transpose W 1 3 transpose of V 1 4 transpose and so this by stacking goes from for W vectors together you end up with a matrix so another way to think of this is that we have for logistic regression unions there and each of the logistic regression you know is has a corresponding parameter vector W and by stacking those four vectors together you end up with this 4 by 3 matrix so if you then take this matrix and multiply it by your input features x1 x2 x3 you end up with by our matrix multiplication works you end up with w1 1 transpose X W 1 this will be 2 1 transpose X we 1 transpose X wo 1 transpose X and then now let's not forget the bees so we now add to this the vector e1 1 b12 b13 in 1/4 so that's basically this then this gives b11 b12 b13 b14 and so you see that each of the 4 rows of this outcome correspond exactly to each of these 4 rows each of these four quantities that we had above so in other words we've just shown that this thing is therefore equal to V 1 1 V 1 to V 1 V V 1 core right as defined here and maybe not surprisingly we're going to call this whole thing the vector V 1 which is taken by stacking up these are individuals of these into a column vector when we're vectorizing one of the rules of thumb that might help you navigate this is that when we have different nodes in a layer we'll stack them vertically so that's why when you have V 1 1 2 0 1 4 those correspond to four different nodes in the hidden layer and so we stack these four numbers vertically to form the vector Z 1 and reduce one more piece of notation this 4 by 3 matrix here which we obtained by stacking the lower case you know W 1 1 W 1 2 and so on we're going to call this matrix W Capital One and similarly this vector or going to call B superscript 1 square bracket and so this is a four point one vector so now we've computed Z using this 
vector matrix notation the last thing we need to do is also compute these values of a and so probably won't surprise you to see that we're going to define a 1 as just stacking together those activation values a11 to a14 so just take these four values and stack them together in a vector called a1 and this is going to be sigmoid of z1 where there's no husband implantation of the sigmoid function that takes in the four elements of Z and applies the sigmoid function element wise to it so just a we figured out that Z 1 is equal to w1 times the vector X plus the vector B 1 and a 1 is sigmoid times Z 1 let's just copy this to the next slide and what we see is that for the first layer of the neural network given an input X we have that Z 1 is equal to W 1 times X plus B 1 and a 1 is Sigma we took Z 1 and the dimensions of this are 4 by 1 equals this is a 4 by 3 matrix times a 3 by 1 vector plus a on 4 by 1 vector B and this is 4 by 1 same dimensions and remember that we said X is equal to a 0 right just like Y hat is also equal to a 2 so if you want you can actually take this X and replace it with a 0 since a 0 is if you want it as an alias for the vector of input features X now through a similar derivation you can figure out that the representation for the next layer can also be written similarly where what the output layer does is it has associated with it so the parameters W 2 and B 2 so W 2 in this case is going to be a 1 by 4 matrix and B 2 is just a real number as 1 by 1 and so V 2 is going to be a real numbers right as a 1 by 1 matrix is going to be a 1 by 4 thing times a was 4 by 1 plus B 2 is 1 by 1 and so this gives you just a real number and if you think of this loss output unit as just being analogous to logistic regression which had parameters W and B W really plays in lagless real to W 2 transpose or W 2's really W transpose and B is equal to V 2 right said were to you know cover up the left of this network and ignore all that for now then this is just this last output unit is a lot like logistic regression except that instead of writing the parameters as WMV we're writing them as W 2 and V 2 with dimensions one by four and one by one so just a recap for logistic regression to implement the output or the influence prediction you compute Z equals W transpose X plus B and a y hat equals a equals sigmoid of z when you have a new network with one hidden layer what you need to implement two computers output is just these four equations and you can think of this as a vectorized implementation of computing the output of first these for logistic regression units in the hidden layer that's what this does and then this which is regression in the output layer which is what this does I hope this description made sense but takeaway is to compute the output of this neural network all you need is those four lines of code so now you've seen how given a single input feature vector at you can with four lines of code compute the outputs of this new network um similar to what we did for the gist regression will also want to vectorize across multiple training examples and we'll see that by stacking up training examples in different columns in the matrix or just slight modification to this you also similar to what you saw in which is regression be able to compute the output of this neural network not just on one example at a time belong your say your entire inning set at a time so let's see the details of that in the next video\n"