Deep L-Layer Neural Network (C1W4L01)

The Art and Science of Implementing Deep Neural Networks

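This week, we take the ideas from the previous videos (forward propagation, back propagation, vectorization, and random weight initialization) and put them together to implement a deep neural network. The videos are kept short so that you have more time for the programming exercise at the end of the week, in which you build your own deep neural network.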

So, what is a deep neural network? You've seen pictures of logistic regression and of neural networks with one, two, or five hidden layers. A network's depth is its number of layers, and shallow versus deep is a matter of degree. When counting layers, we do not count the input layer, only the hidden layers and the output layer. A network with a single hidden layer is therefore a two-layer neural network, and logistic regression is technically a one-layer neural network.

Over the last several years, the machine learning community has realized that there are functions that very deep neural networks can learn and that shallower models are often unable to. For any given problem, though, it can be hard to predict in advance how deep a network you will want. It is reasonable to start with logistic regression, then try one and then two hidden layers, treating the number of hidden layers as another hyperparameter: try a variety of values and evaluate each on hold-out cross-validation data or on your development set.

Notation plays an essential role in describing deep neural networks. We use capital L to denote the number of layers in the network, and n^[l] to denote the number of units (or nodes) in layer l, with the input indexed as layer 0. For example, take a four-layer network with 3 input features, 5 units in the first hidden layer, 5 units in the second hidden layer, 3 units in the third hidden layer, and 1 unit in the output layer. The notation is then L = 4, n^[0] = n_x = 3, n^[1] = 5, n^[2] = 5, n^[3] = 3, and n^[4] = n^[L] = 1.
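
As a small illustration of this bookkeeping, the layer sizes of the example network can be stored in a list indexed by layer number. This is only a sketch: the names `layer_dims` and `role` are chosen here for illustration and do not come from the lecture.

```python
# Layer sizes for the example network; index 0 is the input layer.
# n[0] = n_x = 3, n[1] = 5, n[2] = 5, n[3] = 3, n[4] = 1
layer_dims = [3, 5, 5, 3, 1]

# The input layer is not counted, so L is one less than the list length.
L = len(layer_dims) - 1

for l, n_l in enumerate(layer_dims):
    if l == 0:
        role = "input"
    elif l == L:
        role = "output"
    else:
        role = "hidden"
    print(f"n[{l}] = {n_l}  ({role} layer)")

print(f"L = {L}")
```

Keeping the input size at index 0 means the list index matches the layer index l used in the notation.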

For each layer l, we use a^[l] to denote the activations in that layer. In forward propagation, a^[l] is computed as the activation function g applied to z^[l], and the activation function may itself be indexed by the layer l. W^[l] denotes the weights used to compute z^[l] in layer l, and b^[l] denotes the corresponding bias. The input features are called x, and x is also the activations of layer 0, so a^[0] = x. Finally, the activation of the final layer, a^[L], is equal to the predicted output, y-hat.
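
To make the per-layer parameters concrete, here is a minimal sketch that creates one weight matrix and one bias vector per layer for the example network. It rests on assumptions not stated in this video: W^[l] is stored as n^[l] rows by n^[l-1] columns, weights are drawn from a small Gaussian (scale 0.01), and biases start at zero. The function name `init_params` is hypothetical, and plain Python lists are used only to keep the sketch dependency-free.

```python
import random

def init_params(layer_dims, seed=0):
    """Return {l: (W, b)} for l = 1..L.

    Assumed layout: W is a list of n[l] rows, each with n[l-1] entries;
    b is a list of n[l] zeros.
    """
    rng = random.Random(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_l, n_prev = layer_dims[l], layer_dims[l - 1]
        # Small random weights (the 0.01 scale is an assumption).
        W = [[rng.gauss(0, 0.01) for _ in range(n_prev)] for _ in range(n_l)]
        b = [0.0] * n_l
        params[l] = (W, b)
    return params

params = init_params([3, 5, 5, 3, 1])
for l, (W, b) in params.items():
    print(f"W[{l}]: {len(W)} x {len(W[0])}   b[{l}]: {len(b)}")
```

Each layer l from 1 to L gets its own pair (W^[l], b^[l]); layer 0 has no parameters because it only holds the input x.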

Forward propagation in this type of network proceeds layer by layer. For each layer l from 1 to L, we compute the weighted sum z^[l] = W^[l] a^[l-1] + b^[l] and then apply the activation function to get a^[l] = g^[l](z^[l]), which serves as the input to the next layer. This continues until the final layer, whose activation a^[L] is the predicted output y-hat. The next video describes forward propagation in more detail.
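
The layer-by-layer computation can be sketched for a single input example as follows. Several choices here are assumptions made for illustration and are not specified in this video: ReLU is used in the hidden layers, a sigmoid in the output layer, each weight matrix has one row per unit in its layer, and the helper names (`relu`, `sigmoid`, `forward`) are hypothetical. Plain Python lists stand in for the vectorized arrays one would normally use.

```python
import math
import random

def relu(z):
    return [max(0.0, v) for v in z]

def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]

def forward(x, params):
    """params is a list of (W, b) pairs for layers 1..L; returns a[L]."""
    a = x  # a[0] = x
    L = len(params)
    for l, (W, b) in enumerate(params, start=1):
        # z[l] = W[l] a[l-1] + b[l], computed one unit (row) at a time
        z = [sum(w * a_prev for w, a_prev in zip(row, a)) + b_i
             for row, b_i in zip(W, b)]
        # a[l] = g[l](z[l]); sigmoid on the last layer, ReLU elsewhere (assumed)
        a = sigmoid(z) if l == L else relu(z)
    return a

# Example network from the notation discussion: 3 inputs, then 5, 5, 3, 1 units.
rng = random.Random(0)
layer_dims = [3, 5, 5, 3, 1]
params = [
    ([[rng.gauss(0, 0.01) for _ in range(n_prev)] for _ in range(n_l)],
     [0.0] * n_l)
    for n_prev, n_l in zip(layer_dims, layer_dims[1:])
]

y_hat = forward([0.5, -1.2, 3.0], params)
print(y_hat)
```

The loop variable `a` always holds the activations of the most recently computed layer, so after the final iteration it holds a^[L], the prediction.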

As a side note, this video introduces a lot of notation, which can seem daunting at first but becomes second nature with use. If you ever forget what a symbol means, a notation sheet (or notation guide) is posted on the course website for your reference.

Welcome to the fourth week of this course. By now, you've seen forward propagation and back propagation in the context of a neural network with a single hidden layer, as well as logistic regression, and you've learned about vectorization and when it's important to initialize the weights randomly. If you've done the past couple of weeks' homework, you've also implemented and seen some of these ideas work for yourself. So by now, you've actually seen most of the ideas you need to implement a deep neural network. What we're going to do this week is take those ideas and put them together so that you'll be able to implement your own deep neural network. Because the programming exercise is longer and just has a bit more work, I'm going to keep the videos for this week short, so that you can get through the videos a little bit more quickly and then have more time to do a significant programming exercise at the end, which I hope will leave you having built a deep neural network that you feel proud of.

So what is a deep neural network? You've seen this picture for logistic regression, and you've also seen neural networks with a single hidden layer. So here is an example of a neural network with two hidden layers, and a neural network with five hidden layers. We say that logistic regression is a very shallow model, whereas this model here is a much deeper model, and shallow versus deep is a matter of degree. So a neural network with a single hidden layer, this would be a two-layer neural network. Remember, when we count layers in a neural network, we don't count the input layer; we just count the hidden layers as well as the output layer. So this would be a two-layer neural network, which is still quite shallow, but not as shallow as logistic regression. Technically, logistic regression is a one-layer neural network. But over the last several years, the AI and machine learning community has realized that there are functions that very deep neural networks can learn that shallower models are often unable to. Although for any given problem it might be hard to predict in advance exactly how deep a neural network you will want, so it would be reasonable to try logistic regression, then one and then two hidden layers, and view the number of hidden layers as another hyperparameter that you could try a variety of values of and evaluate on hold-out cross-validation data or on your development set. We'll say more about that later as well.

Let's now go through the notation we use to describe deep neural networks. Here is a one, two, three, four-layer neural network with three hidden layers, and the number of units in these hidden layers is, I guess, five, five, three, and then there's one output unit. So the notation we're going to use: we'll use capital L to denote the number of layers in the network, so in this case L is equal to four. And we're going to use n superscript [l] to denote the number of nodes, or the number of units, in layer lowercase l. So if we index the input as layer 0, this is layer 1, this is layer 2, this is layer 3, and this is layer 4. Then we have that, for example, n^[1], that would be the first hidden layer, is equal to 5, because we have five hidden units there. For this one, we have that n^[2], the number of units in the second hidden layer, is also equal to 5, n^[3] is equal to 3, and n^[4], which is n^[L], the number of output units, is equal to 1, because here capital L is equal to 4. And we're also going to have, for the input layer, n^[0] = n_x = 3. So that's the notation we use to describe the number of nodes we have in different layers.

For each layer l, we're also going to use a^[l] to denote the activations in layer l. So we'll see later that in forward propagation you end up computing a^[l] as the activation g applied to z^[l], and perhaps the activation is indexed by the layer l as well. And then we'll use W^[l] to denote the weights for computing the value z^[l] in layer l, and similarly b^[l] is used to compute z^[l]. Finally, just to wrap up on the notation, the input features are called x, but x is also the activations of layer 0, so a^[0] is equal to x, and the activation of the final layer, a^[L], is equal to y-hat. So a superscript square bracket capital L is equal to the predicted output, the prediction y-hat of the neural network.

So you now know what a deep neural network looks like, as well as the notation we'll use to describe and to compute with deep networks. I know we've introduced a lot of notation in this video, but if you ever forget what some symbol means, we've also posted on the course website a notation sheet or a notation guide that you can use to look up what these different symbols mean. Next, I'd like to describe what forward propagation in this type of network looks like. Let's go into the next video.