Parameters vs Hyperparameters (C1W4L07)

The Implications of Hyperparameters in Deep Learning

The parameters of a model are the weights W and biases b that the learning algorithm actually learns. Beyond those, there are settings you have to give the algorithm yourself: the learning rate alpha, the number of iterations of gradient descent, the number of hidden layers L, the number of hidden units in each layer, and the choice of activation function, among others. These settings are called hyperparameters because they control, and ultimately determine, the final values of the parameters W and b. In earlier eras of machine learning, when there were far fewer such settings, most of us were a bit sloppy and simply called alpha a "parameter"; technically it is one, but it is a parameter that determines the real parameters, so it is clearer to consistently call alpha, the number of iterations, and the rest hyperparameters.
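To make the distinction concrete, here is a minimal sketch in Python/NumPy. The hyperparameter dictionary and the initialize_parameters helper are illustrative names, not part of any particular framework: the hyperparameters are chosen by hand before training, and they determine the shapes (and later the training) of the parameters W and b.

    import numpy as np

    # Hyperparameters: chosen by hand before training; they control how W and b are learned.
    hyperparameters = {
        "learning_rate": 0.01,        # alpha, the gradient-descent step size
        "num_iterations": 1000,       # how many gradient-descent steps to run
        "layer_dims": [4, 5, 3, 1],   # input size, two hidden layers, output size
        "hidden_activation": "relu",  # could also be "tanh" or "sigmoid"
    }

    def initialize_parameters(layer_dims, seed=0):
        """Parameters: the weights W and biases b that the algorithm actually learns."""
        rng = np.random.default_rng(seed)
        parameters = {}
        for l in range(1, len(layer_dims)):
            parameters[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
            parameters[f"b{l}"] = np.zeros((layer_dims[l], 1))
        return parameters

    parameters = initialize_parameters(hyperparameters["layer_dims"])
    print({name: value.shape for name, value in parameters.items()})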

When training a deep neural network for a new application, it is often hard to know in advance which values of these hyperparameters will work best, simply because there are so many possible settings to explore. As a result, applying deep learning today is a very empirical process: researchers and practitioners rely on trial and error, trying out different values for hyperparameters such as the learning rate and observing how each choice affects the cost function J.

One common phenomenon that researchers observe is the concept of "alpha" - the number of iterations required for convergence. However, even this parameter can be tricky to determine, especially when it comes to tuning its value. In many cases, researchers find themselves having to try out different values for alpha and assess their impact on the cost function J. If they discover that a particular value is yielding suboptimal results, they may choose to adjust their approach accordingly.

The process of trial and error can be quite tedious, especially when several hyperparameters interact, but it is usually necessary in order to achieve good performance on a given problem. As a result, researchers have developed strategies for exploring the hyperparameter space more systematically, often by evaluating candidate settings with techniques such as cross-validation and grid search.
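As a sketch of the basic cycle, the snippet below scores a handful of candidate settings on held-out validation data and keeps the best one. The validation_error function is a hypothetical stand-in for actually training a network and measuring its error; in real use it would be replaced by your own training and evaluation code.

    import numpy as np

    rng = np.random.default_rng(0)

    def validation_error(setting):
        """Stand-in for training a model with `setting` and measuring its error on
        held-out validation data. A real implementation would fit the network here."""
        # Pretend the sweet spot is around learning_rate=0.1 with 64 hidden units.
        lr_penalty = (np.log10(setting["learning_rate"]) + 1.0) ** 2
        size_penalty = abs(setting["hidden_units"] - 64) / 64
        return lr_penalty + size_penalty + rng.normal(scale=0.05)

    candidates = [
        {"learning_rate": 0.001, "hidden_units": 32},
        {"learning_rate": 0.01,  "hidden_units": 64},
        {"learning_rate": 0.1,   "hidden_units": 64},
        {"learning_rate": 0.1,   "hidden_units": 128},
    ]

    scores = [(validation_error(c), c) for c in candidates]
    best_error, best_setting = min(scores, key=lambda pair: pair[0])
    print("best setting:", best_setting, "validation error:", round(best_error, 3))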

In addition to trial and error, another challenge that researchers face is adapting their approach to new problems or domains. For instance, when transitioning from one discipline to another, researchers may find themselves struggling to apply their existing knowledge of hyperparameters. In some cases, intuitions about hyperparameters can carry over, while in others, they may not be as relevant. This highlights the importance of taking a systematic approach to exploring hyperparameter spaces and being willing to adjust one's strategy based on the results.

Furthermore, even when working on a single problem for an extended period, it is not uncommon for the best values of the hyperparameters to change over time. This can be attributed to factors such as changes in the underlying compute infrastructure (CPUs, GPUs), in the data, or in the algorithms themselves. A practical rule of thumb is therefore to revisit the hyperparameters every few months and check whether better values are now available.

Ultimately, deep learning research has made significant progress in recent years, and researchers have developed numerous techniques for systematically exploring hyperparameter spaces. However, the challenge of determining optimal values for these parameters remains an ongoing problem that requires continued attention and innovation. By adopting a systematic approach and being willing to adapt to changing circumstances, researchers can increase their chances of success when applying deep learning techniques to complex problems.

Despite the challenges associated with hyperparameters, research has made significant progress in recent years. One potential area of future advancement is developing more sophisticated tools for systematically exploring hyperparameter spaces. By leveraging advances in machine learning and computer science, researchers may be able to develop more efficient methods for identifying optimal values for these parameters. This could involve the use of techniques such as Bayesian optimization or evolutionary algorithms.

In conclusion, hyperparameters play a crucial role in deep learning, and their impact cannot be overstated. While the process of determining optimal values for these parameters can be challenging, researchers have developed various strategies for systematically exploring hyperparameter spaces. By adopting a systematic approach and being willing to adapt to changing circumstances, researchers can increase their chances of success when applying deep learning techniques to complex problems.

Applying Deep Learning: An Empirical Approach

One common phenomenon that researchers observe is the concept of "hyperparameters" - parameters that are set before training a model but not learned during training. These hyperparameters are used to control various aspects of the training process, such as learning rate, number of hidden units, and regularization strength.

When applying deep learning techniques to a new problem, one often faces an overwhelming number of possible settings for these hyperparameters, which makes it challenging to determine good values up front. As a result, practitioners end up relying on trial and error to find settings that work.

For instance, say we are working on an online advertising problem and want to optimize a model's performance using deep learning. We might guess that a learning rate of 0.01 is a good starting point, but we don't know whether that is actually true, so we try it, observe the results, and perhaps then increase the learning rate to 0.05 and compare.
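As a concrete, if simplified, version of that experiment, the sketch below fits a tiny logistic-regression "click prediction" model on synthetic data with alpha = 0.01 and alpha = 0.05 and compares the final cost J. The data and model are made up purely for illustration; only the comparison matters.

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic "click prediction" data: 3 features per ad impression, binary label.
    X = rng.standard_normal((500, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = (1 / (1 + np.exp(-(X @ true_w))) > rng.random(500)).astype(float)

    def train_logistic(alpha, num_iterations=200):
        """Plain batch gradient descent on the logistic-regression cost J."""
        w, b = np.zeros(3), 0.0
        for _ in range(num_iterations):
            a = 1 / (1 + np.exp(-(X @ w + b)))
            dw = X.T @ (a - y) / len(y)
            db = np.mean(a - y)
            w -= alpha * dw
            b -= alpha * db
        a = 1 / (1 + np.exp(-(X @ w + b)))
        eps = 1e-9
        return -np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps))

    for alpha in (0.01, 0.05):
        print(f"alpha={alpha}: final cost J = {train_logistic(alpha):.4f}")
    # The larger learning rate reaches a lower cost in the same number of iterations here,
    # but that will not hold in general; it has to be checked empirically.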

Another challenge that researchers face is adapting their approach to new problems or domains. For instance, when transitioning from one discipline to another, researchers may find themselves struggling to apply their existing knowledge of hyperparameters. In some cases, intuitions about hyperparameters can carry over, while in others, they may not be as relevant.

To address this challenge, researchers have developed various strategies for systematically exploring hyperparameter spaces. One common approach is to combine grid search with cross-validation. Cross-validation involves repeatedly splitting the available data into training and validation portions, so that each candidate model is trained on one portion and scored on held-out data it has not seen.
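A minimal sketch of the splitting step, assuming we simply shuffle example indices and divide them into k folds; each fold takes a turn as the validation set while the remaining folds form the training set.

    import numpy as np

    def k_fold_indices(num_examples, k=5, seed=0):
        """Shuffle example indices and split them into k folds for cross-validation."""
        rng = np.random.default_rng(seed)
        indices = rng.permutation(num_examples)
        return np.array_split(indices, k)

    folds = k_fold_indices(num_examples=20, k=5)
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        print(f"fold {i}: train on {len(train_idx)} examples, validate on {len(val_idx)}")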

Grid search, on the other hand, involves systematically trying out different combinations of hyperparameters and evaluating their impact on the model's performance. By using these techniques, researchers can increase their chances of finding optimal values for their hyperparameters and achieving better results.
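Here is a sketch of grid search over a few hyperparameters. The validation_accuracy function is a hypothetical placeholder for training a model with the given setting and scoring it on held-out data, and the candidate values are only illustrative.

    import itertools

    # Candidate values for each hyperparameter; the grid is their Cartesian product.
    grid = {
        "learning_rate": [0.001, 0.01, 0.1],
        "num_hidden_layers": [2, 3],
        "hidden_units": [32, 64],
    }

    def validation_accuracy(setting):
        """Hypothetical stand-in: train a model with `setting` and return its accuracy
        on held-out data. Replace with real training and evaluation code."""
        return (0.90
                - abs(setting["learning_rate"] - 0.01)
                - 0.001 * abs(setting["hidden_units"] - 64)
                - 0.005 * abs(setting["num_hidden_layers"] - 3))

    best = None
    for values in itertools.product(*grid.values()):
        setting = dict(zip(grid.keys(), values))
        acc = validation_accuracy(setting)
        if best is None or acc > best[0]:
            best = (acc, setting)

    print(f"best accuracy {best[0]:.3f} with {best[1]}")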

To address this challenge, researchers have developed strategies for adapting to new problems or domains. One common approach is to explore the hyperparameter space systematically and to adjust the search strategy based on the results, which may involve techniques such as Bayesian optimization or evolutionary algorithms to identify good values for the hyperparameters.
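Full Bayesian optimization or evolutionary search usually relies on dedicated libraries, but a simpler strategy in the same spirit, random search over sensible ranges, is easy to sketch directly. The train_and_score function below is a hypothetical placeholder for real training and validation; note that the learning rate is sampled on a log scale, which usually matters more than the exact range.

    import numpy as np

    rng = np.random.default_rng(2)

    def sample_setting():
        """Draw one random hyperparameter setting from sensible ranges."""
        return {
            "learning_rate": 10 ** rng.uniform(-4, -1),   # between 1e-4 and 1e-1, log scale
            "hidden_units": int(rng.integers(16, 257)),
            "num_layers": int(rng.integers(2, 6)),
        }

    def train_and_score(setting):
        """Hypothetical placeholder: train with `setting`, return a validation score."""
        return -(np.log10(setting["learning_rate"]) + 2.5) ** 2 + rng.normal(scale=0.1)

    trials = [(train_and_score(s), s) for s in (sample_setting() for _ in range(20))]
    best_score, best_setting = max(trials, key=lambda t: t[0])
    print("best of 20 random trials:", best_setting)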

The Importance of Hyperparameters in Deep Learning

The importance of hyperparameters is hard to overstate: a modern network has many of them, and it is easy to lump them together with the ordinary parameters. It is worth keeping the distinction sharp. The parameters W and b are learned by the algorithm; the hyperparameters are the settings, chosen by the practitioner, that determine which values of W and b the algorithm ends up with.

Because there are so many possible settings, training a deep neural network for a new application almost always involves a period of experimentation before good hyperparameter values emerge.

One common phenomenon that researchers observe is the concept of "alpha" - the number of iterations required for convergence. However, even this parameter can be tricky to determine, especially when it comes to tuning its value. In many cases, researchers find themselves having to try out different values for alpha and assess their impact on the cost function J.

The process of trial and error can be quite tedious, especially when dealing with multiple hyperparameters, but it is usually necessary in order to find good values and achieve strong performance on a given problem. As a result, researchers have developed various strategies for systematically exploring the hyperparameter space.

These approaches often involve using techniques such as cross-validation and grid search to identify the best values for each parameter. By using these techniques, researchers can increase their chances of finding optimal values for their hyperparameters and achieving better results.

By understanding the role that hyperparameters play, researchers can develop more effective strategies for exploring the hyperparameter space, whether through relatively simple techniques such as cross-validation and grid search or through more automated search methods.

In conclusion, the implications of hyperparameters in deep learning are significant. Researchers must develop effective strategies for systematically exploring hyperparameter spaces in order to achieve good performance on given problems. By understanding the challenges and opportunities presented by hyperparameters, researchers can develop more effective approaches for identifying optimal values for these parameters.

"WEBVTTKind: captionsLanguage: enbeing effective in developing your deep neural Nets requires that you not only organize your parameters well but also your hyper parameters so what are hyper parameters let's take a look so the parameters your model our W and B and there are other things you need to tell your learning algorithm such as the learning rate alpha because um you need to set alpha and that in turn will determine how your parameters evolve or maybe the number of iterations of gradient descent you carry out your learning algorithm has other you know numbers that you need to set such as the number of hidden layers so we call that capital L or the number of hidden units right such as zero and one and two and so on and then you also have the choice of activation function do you want to use a value or ten age or Sigma little something especially in the hidden layers and so all of these things are things that you need to tell your learning algorithm and so these are parameters that control the ultimate parameters W and B and so we call all of these things below hyper parameters because these things like alpha the learning rate the number of iterations number of hidden layers and so on these are all parameters that control W and B so we call these things hyper parameters because it is the hyper parameters that you know somehow determine the final value of the parameters W and B that you end up with in fact deep learning has a lot of different hyper parameters later in the later course we'll see other hyper parameters as well such as the momentum term the mini batch size various forms of regularization parameters and so on and if none of these terms of the bottom make sense yet don't worry about it we'll talk about them in a second pause because deep learning has so many hyper parameters in contrast to earlier errors of machine learning I'm going to try to be very consistent in calling the learning rate alpha a hyper parameter rather than calling a parameter I think in earlier eras of machine learning when we didn't have so many hyper parameters most of us used to be a bit sloppier and just call alpha a parameter and technically alpha is a parameter but is a parameter that determines the real programmers or try to consistent in calling these things like alpha the number of iterations and so on hyper parameters so when you're training a deep net for your own application you find that there may be a lot of possible settings for the hyper parameters that you need to just try out so apply deep learning today is a very imperiled process where often you might have an idea for example you might have an idea for the best value for the learning rate you might say well maybe alpha equals 0.01 I want to try that then you implemented try it out and then see how that works and then based on that outcome you might say you know what I've changed online I want to increase the learning rate to 0.05 and so if you're not sure what's the best value for the learning ready to use you might try one value of the learning rate alpha and see the cost function J go down like this then you might try a larger value for the learning rate alpha and see the cost function blow up and diverge then you might try another version and see it go down really fast the converse to higher value you might try another version and see it you know see the cost function J do that then after trial so the values you might say okay looks like this the value of alpha gives me a pretty fast learning and allows me to converge to a lower cost 
function J so I'm going to use this value of alpha you saw on the previous slide that there are a lot of different hyper parameters and it turns out that when you're starting on a new application I should find it very difficult to know exactly what's the best value of the hyper parameters so what often happens is you just have to try out many different values and go around this cycle your try out some value maybe retry five hidden layers with different number of hidden unions implement that Steven works and then iterate so the title of this slide is that apply deep learning is very empirical process and empirical process is maybe a fancy way of saying you just have to try a lot of things and see what works another effect I've seen is that deep learning today is applied to so many problems ranging from computer vision to speech recognition to natural language processing to a lot of structured data applications such as maybe a online advertising or on web search or product recommendations and so on and what I've seen is that first I've seen researchers from one discipline any one of these try to go to a different one and sometimes the intuitions about hyper parameters carries over and sometimes it doesn't so I often advise people especially when starting on a new problem to just try out a range of values and see what works and then next course will see a systematic way we'll see some systematic ways for trying out a range of values right and second even if you're working on one application for a long time you know maybe you're working on online advertising as you make progress on the problem it's quite possible to the best value for the learning rate of number of hidden units and so on might change so even if you tune your system to the best value of hyper parameters today it's possible you find that the best value might change a year from now maybe because of the computer infrastructure it you know CPUs or the type of GPU running on or something has changed but so maybe one rule of thumb is you know every now and then or maybe every few months if you're working on a problem for an extended period of time for many years just try a few values for the hyper parameters and double check if there's a better value for the hyper parameters and as you do so you slowly gain intuition as well about the hyper parameters that work best for your problems and I know that this might seem like an unsatisfying part of deep learning that you just have to try out other values for these hyper answers but maybe this is one area where deep learning research is still advancing and maybe over time we'll be able to give better guidance for the best hyper parameters to use but it's also possible that because CPUs and GPUs and networks and data cells are all changing and it is possible that the guidance won't to converge for some time and you just need to keep trying out different values and evaluate them on a hold on cross-validation set or something and pick the value that works for your problems so that was a brief discussion of hyper parameters in the second course we'll also give some suggestions to how to systematically explore the space of hyper parameters but by now you actually have pretty much all the tools you need to do their programming sir sighs before you do that just share view one more set of ideas which is I often ask what does deep learning have to do the human brainbeing effective in developing your deep neural Nets requires that you not only organize your parameters well but also your hyper 