Tuning a Machine Learning System: A Holistic Approach
One of the challenges of building machine learning systems is that there are so many things you could try and so many hyperparameters you could tune. The most effective practitioners are very clear-eyed about what to tune in order to achieve one specific effect, a process called orthogonalization. The idea is that you don't want to have to carefully adjust five different knobs at once: instead of having one knob affect multiple aspects, we want a separate knob for each specific aspect we care about. On an old TV set, one knob affects only the width of the image. In a similar way, if your algorithm is not fitting the training set well on the cost function, you want a specific set of knobs that affect only training-set fit, such as training a bigger network or switching to a better optimization algorithm like Adam.
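As an illustrative sketch (a toy example, not from the original material): model capacity is one knob that mainly affects training-set fit. Here, the degree of a polynomial feature expansion plays the role of "train a bigger network."

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])  # nonlinear target the model must fit

def train_error(degree):
    # The "bigger model" knob: higher-degree polynomial features.
    Phi = np.hstack([X**d for d in range(degree + 1)])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return float(np.mean((Phi @ w - y) ** 2))

# Turning up capacity is a knob aimed squarely at training-set fit:
print(train_error(1))  # a linear model underfits the sine
print(train_error(7))  # a higher-capacity model fits far better
```

The point of the sketch is the orthogonality: this knob improves the first criterion (training-set fit) without being entangled with the dev/test knobs discussed below.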
The knobs we're talking about are like the knobs on a TV set: each one adjusts a single setting to get the desired picture. If your algorithm is doing well on the training set but not on the dev set, you want a separate, distinct set of knobs to address that issue, ideally without disturbing the training-set fit you already have.
For example, regularization is one set of knobs you can use to satisfy this second criterion. Getting a bigger training set is another knob that helps your learning algorithm generalize better to the dev set. Again, you want to turn these knobs without affecting the other settings too much.
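To make the regularization knob concrete, here is a hedged toy sketch (synthetic data, not from the original material): a 9th-degree polynomial fit to 15 noisy points overfits badly, and an L2 (ridge) penalty `lam` is the knob that targets the train-to-dev gap.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_split(n):
    X = rng.uniform(-1, 1, size=n)
    y = X + 0.5 * rng.normal(size=n)  # noisy linear target
    Phi = np.vander(X, 10)            # overly flexible degree-9 features
    return Phi, y

Phi_tr, y_tr = make_split(15)     # tiny training set: easy to overfit
Phi_dev, y_dev = make_split(500)  # held-out dev set

def dev_error(lam):
    # Ridge regression: solve (Phi'Phi + lam*I) w = Phi'y.
    d = Phi_tr.shape[1]
    w = np.linalg.solve(Phi_tr.T @ Phi_tr + lam * np.eye(d), Phi_tr.T @ y_tr)
    return float(np.mean((Phi_dev @ w - y_dev) ** 2))

print(dev_error(0.0))  # knob off: the model overfits the 15 points
print(dev_error(1.0))  # knob turned up: typically a much lower dev error
```

Note that turning `lam` up barely touches the other knobs: it is aimed at generalization to the dev set, which is exactly the orthogonality the text describes.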
Having adjusted the "width and height" of your system well, what if it doesn't meet the third criterion? What if you do well on the dev set but not on the test set? If that happens, it probably means you've over-tuned to your dev set, and the knob to turn is getting a bigger dev set.
Finally, if the system does well on the test set but isn't delivering for your specific use case, like an image recognition app that uses machine learning, then you want to go back and change either the dev/test sets or the cost function. If doing well on the test set according to some cost function doesn't correspond to your algorithm doing what you need it to do in the real world, it means that either your dev/test set distribution isn't set correctly or your cost function isn't measuring the right thing.
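The whole chain of criteria and knobs above can be encoded as a simple diagnostic (an illustrative sketch; the knob names just paraphrase the text):

```python
def suggest_knob(train_ok, dev_ok, test_ok, real_world_ok):
    """Map the first failing criterion to its (orthogonal) set of knobs."""
    if not train_ok:
        return "fit the training set: bigger network, better optimizer (e.g. Adam)"
    if not dev_ok:
        return "generalize to the dev set: regularization, bigger training set"
    if not test_ok:
        return "stop over-tuning to the dev set: get a bigger dev set"
    if not real_world_ok:
        return "change the dev/test set distribution or the cost function"
    return "all four criteria met"

# Doing well on train and dev, but failing on test:
print(suggest_knob(True, True, False, False))
```

The function checks the criteria in order because each later criterion only matters once the earlier ones pass, mirroring the tuning sequence in the text.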
Training a neural network involves many such knobs. I tend not to use early stopping. It's not a bad technique, but I find it difficult to think about, because it is one knob that simultaneously affects how well you fit the training set and is also often used to improve your model's performance on the dev set. It's like trying to tune two things at once.
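A minimal sketch of why early stopping couples two knobs (the loss curves here are synthetic stand-ins, not a real training run): stopping where dev loss bottoms out necessarily also decides how far training-set fit gets to go.

```python
# Synthetic curves: training loss keeps falling; dev loss is U-shaped.
train_loss = [1.0 / (epoch + 1) for epoch in range(20)]
dev_loss = [0.5 + (epoch - 8) ** 2 / 100 for epoch in range(20)]

# Patience-based early stopping: halt once dev loss stops improving.
patience, best, best_epoch, waited = 3, float("inf"), 0, 0
for epoch, loss in enumerate(dev_loss):
    if loss < best:
        best, best_epoch, waited = loss, epoch, 0
    else:
        waited += 1
        if waited >= patience:
            break  # dev loss has not improved for `patience` epochs

print(best_epoch)              # stops at the dev-loss minimum (epoch 8)
print(train_loss[best_epoch])  # ...while training loss was still falling
```

One decision (when to stop) moves both training-set fit and dev-set performance at once, which is exactly what makes it less orthogonal than, say, a regularization strength.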
It's like a knob that affects both the width and the height of your TV image. That doesn't mean such knobs are bad to use; you can use them if you want. But more orthogonal controls, such as regularization, dropout, and batch normalization, make it easier to tune one aspect of your system without affecting the others.
Just as with the TV, in machine learning we want to be able to look at our system and say "this piece of it is wrong": it's not doing well on the training set, the dev set, or the test set. We then want to identify the specific knob that addresses that failure and tune it precisely.
To diagnose the bottleneck in your system's performance, we need to go through a detailed process: identify exactly what's wrong with the system, and which knobs you can use to solve that specific problem. That means going over how to tune each aspect of your machine learning system separately.
The goal is to have a clear understanding of the organization of these knobs and techniques, so we know which knob to turn when we want to address a specific issue. This approach allows us to fine-tune our systems precisely without affecting other aspects, resulting in better performance and more accurate predictions.