Bias/Variance (C2W1L02)

Understanding Bias and Variance in Machine Learning Algorithms

When evaluating the performance of machine learning algorithms, two key concepts come up again and again: bias and variance. In this article, we will look at what these terms mean, how they can be diagnosed from error rates, and what they imply for the performance of a classifier.

Bias refers to the tendency of an algorithm to not fit the training data well. When an algorithm has high bias, it means that it is underfitting the data. In other words, the algorithm is too simple and cannot capture the underlying patterns in the data. This can result in poor performance on both the training set and the test set.

On the other hand, variance refers to the tendency of an algorithm to fit the training data too well. When an algorithm has high variance, it means that it is overfitting the data. In other words, the algorithm is too complex and is fitting the noise in the training data rather than the underlying patterns. This can result in poor performance on the test set but good performance on the training set.
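
To make the contrast concrete, here is a minimal sketch, assuming scikit-learn and a synthetic two-feature dataset (neither of which comes from the original lecture). It fits a deliberately simple model and a deliberately flexible one to the same data and compares their training and test errors:

    # Illustrative sketch: contrasting an underfitting (high-bias) model with an
    # overfitting (high-variance) one on synthetic data. The dataset and model
    # choices are assumptions made for this example.
    from sklearn.datasets import make_moons
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_moons(n_samples=1000, noise=0.35, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A linear decision boundary is too simple for moon-shaped classes: high bias.
    simple = LogisticRegression().fit(X_train, y_train)

    # An unconstrained decision tree can memorize the label noise: high variance.
    flexible = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)

    for name, model in [("logistic regression", simple), ("deep decision tree", flexible)]:
        train_err = 1 - model.score(X_train, y_train)
        test_err = 1 - model.score(X_test, y_test)
        print(f"{name}: train error {train_err:.1%}, test error {test_err:.1%}")

Typically the linear model shows similar, relatively high errors on both sets, while the tree shows near-zero training error but a noticeably higher test error.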

To diagnose bias and variance, we look at the error rates of an algorithm on both the training set and a held-out set, either the test set or, more commonly during development, a development set (dev set). If the training error is high, and the held-out error is about as high, the algorithm likely has high bias: it is not fitting even the training data well, so it cannot do well on new data either. For example, 15% training error with 16% dev error points to high bias, assuming humans can do the task with close to 0% error.

On the other hand, if an algorithm has a low error rate on the training set but a much higher error rate on the held-out set, it is likely to have high variance: it is fitting the training data, including its noise, too closely and is not generalizing to new data. For example, 1% training error with 11% dev error points to high variance. An algorithm can also suffer from both problems at once, say 15% training error and 30% dev error, or from neither, say 0.5% training error and 1% dev error.

There are different ways to frame this diagnosis. One common approach is to compare the two error rates directly: if the training error itself is high, well above what a human or the best possible classifier could achieve, that indicates high bias; if the held-out error is much higher than the training error, that gap indicates high variance; if the training error is low and the held-out error is only slightly higher, both bias and variance are low.
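
These rules of thumb can be written down as a small helper. The sketch below is illustrative rather than prescriptive: the function name, the 2% threshold for what counts as a large gap, and the default Bayes error of zero are all assumptions made for this example. The calls at the bottom use the error rates quoted above.

    def diagnose(train_error, dev_error, bayes_error=0.0, tol=0.02):
        """Rough bias/variance diagnosis from error rates in [0, 1].

        bayes_error is the best achievable error (often approximated by
        human-level error); tol is an illustrative cutoff for what counts
        as a "large" gap.
        """
        bias_gap = train_error - bayes_error     # how far training error is from the best possible
        variance_gap = dev_error - train_error   # how much worse the dev set is than the training set

        problems = []
        if bias_gap > tol:
            problems.append("high bias (underfitting)")
        if variance_gap > tol:
            problems.append("high variance (overfitting)")
        return problems or ["low bias and low variance"]

    # Scenarios with human-level (approximately Bayes) error near 0%:
    print(diagnose(0.01, 0.11))    # -> high variance
    print(diagnose(0.15, 0.16))    # -> high bias
    print(diagnose(0.15, 0.30))    # -> high bias and high variance
    print(diagnose(0.005, 0.01))   # -> low bias and low variance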

Another approach is to look at how performance changes across different subsets of the data. If an algorithm performs well on the training set but markedly worse on the development set, that indicates high variance. If it performs roughly equally well on both, the variance is low, although both errors could still be high, which would point to a bias problem instead.

It's also worth noting that this analysis assumes human-level performance on the task corresponds to a near-zero error rate, or, more generally, that the optimal error rate, sometimes called the Bayes error, is nearly zero. In reality the optimal error may be much higher, say fifteen percent, for instance if the images are too blurry for even a human to classify reliably. In that case a fifteen percent training error is perfectly reasonable and does not indicate high bias.
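
As a minimal sketch of how a nonzero Bayes error changes the picture, using the fifteen percent example above (subtracting the Bayes error from the training error is sometimes called measuring the avoidable bias; the variable names here are just illustrative):

    bayes_error = 0.15   # best achievable error, e.g. because the images are very blurry
    train_error = 0.15
    dev_error = 0.16

    avoidable_bias = train_error - bayes_error   # 0.00 -> fitting about as well as possible
    variance = dev_error - train_error           # 0.01 -> also generalizing well
    print(f"avoidable bias: {avoidable_bias:.1%}, variance: {variance:.1%}")

With the same 15% training error, the diagnosis flips from high bias (when Bayes error is near zero) to low bias (when Bayes error is itself about 15%).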

In practice, the training set error tells you how well the algorithm is fitting the data: if it has high bias, it will perform poorly even on the training set. How much the error increases from the training set to the test (or dev) set tells you about variance: a large jump means the algorithm is overfitting and generalizing poorly, while a small jump means it is generalizing well.

The takeaway from this article is that by looking at your algorithm's error on the training set and the test (or dev) set, you can tell whether it suffers from a bias problem, a variance problem, both, or neither. This assumes the Bayes error is small and that the training and held-out sets come from the same distribution; if those assumptions are violated, a more careful analysis is needed. Knowing which problem you have matters, because bias and variance call for different remedies.

"WEBVTTKind: captionsLanguage: enI've noticed that almost all the really good machine learning practitioners tend to have a very sophisticated understanding of buyers invariant but in various one of those concepts as easy to learn but difficult to master even if you think you've seen the basic content advisor variants is often more nuanced to attend you'd expect in the deep learning error another trend is that there's an less discussion of what's called the bias-variance tradeoff you might have heard of this thing called the bias-variance tradeoff but in a deep learning arrow that less of a trade-off so it's still tougher biases with overall variance but just talk less about the bias-variance tradeoff let's see what this means let's see you know data set that looks like this if you fit a straight line to the data maybe a logistic regression fit to that this is not a very good fit to the data and so there's a cluster of high bias or we say that this is under fitting the data on the opposite end if you fit an incredibly complex classifier and maybe a big neural network or you network with a while the fit engineers maybe you can fit the data perfectly but that doesn't look like a great big key there so there's a cost v of high variance and this is on overfitting the data and there might be some classifier in the teen with a medium level of complexity that you know maybe exceeds the curve like that that looks like a much more reasonable fit to the data and so that's a call that you know just right right somewhere in between so in a 2d example like this with just two features x1 and x2 you can plot the data and visualize bias and variance in high dimensional problems you can't plot the data and visualize the decision boundary instead there are a couple different metrics that we'll look at to try to understand bias and variance so continuing our example of cat picture complication where that's a positive example and there's a negative example the two key numbers to look at to understand bias and variance will be the training set error and the death set or the developments etc so for the sake of argument let's say that you're recognizing cats and pictures is something that people can do nearly perfectly right and so let's say your training set error is on one percent and your death set error is for the sake of argument let's say is 11 percent so in this example you're doing very well on the training set but you're doing relatively poorly on the development set so this looks like you might have over fit the training set there's some how you're not generalizing well to this hold on cross validation services development set and so if you have an example like this we would say this has high variance so by looking at the training set error and the development set error here you would be able to render a diagnosis of your algorithm having high variance now let's say that you measure your training setting of deadside error and you get a different result let's say that your training set error is 15 percent I'm writing your training so ever the top row and your death set error is 16 percent in this case assuming that humans achieve you know roughly zero percent error that humans can look at these pictures and just tell this cat or not then it looks like the algorithm is not even doing very well on the training set so if it's not even fitting the training data as seen by well then this is under sitting the data and so this algorithm has high bias but in contrast is actually generalizing at a reasonable 
level to detect entrance performance or death sided only want to send word since the forms in the training set so dude album has a problem of high bias because was not even training it's not even fitting the training set well the dissimilar to the left most plots we had on the previous line now here's another example let's say that you have 15 percent rating set error so that's pretty high bias but when you evaluates on a death set it does even worse maybe it does know 30 percent in this case I would diagnose this algorithm as having high bias because it's not doing that well on the training set and high variance so this is you know really the worst of both worlds oh and one last example if you have you know 0.5 training set error and 1% deaf set error then maybe your users are quite happy that you have a can cause fire but only want to send ever then this will have a low bias and low variance one subtlety that I'll just briefly mention that we'll leave to a later video to discuss in detail is that this analysis is predicated on the assumption that human level performance gets nearly zero percent error or more generally get the optimal error sometimes called Bayes error for that sort of Bayesian optimal error is nearly zero percent I don't want to go into detail on this in this particular video but it turns out that is the optimal error or the Bayes error were much higher say there were fifteen percent then you look at this classifier fifteen percent is actually perfectly reasonable for training set and you wouldn't say that's high bias and won't set pretty low variance so the case of how to analyze bias and variance when no classifier can do very well for example if you have really blurry images so that you know even a human or just no system could possibly do very well then maybe Bayes error is much higher and then details to how this analysis of change but leaving aside this subtlety for now the takeaway is that by looking at your training set error you can get a sense of how well you are fitting at least the training data and so that tells you if you have a bias problem and then looking at how much higher your ever goes when you go from the training set to the DEF set that should give you a sense of how bad is the variance problems are you doing a good job generalizing from the training set to the death set that gives you a sense of your areas all this is under the assumption that the Bayes error is quite small and that your train and your death sets are drawn from the same distribution if those assumptions are violated that the most sophisticated analysis you could do which we'll talk about in the later video now on the previous slide you saw what high bias high variance look like and again she had a sense of what it could cost dialog by what does high bias and high variance look like it's kind of the worst of both worlds so you remember we said that a classifier like this the linear classifier has high bias because under fits the data so this would be a qualifier that is mostly linear and therefore under fit the data we're drawing this in purple but if somehow your classifier does some weird things then is actually overfitting parts of the data as well so the classifier that I drew in purple has both high bias and high variance where there's high bias because by being a mostly linear classifier is just not fitting you know this quadratic right shade that well but by having too much flexibility in the middle it somehow gets this example in this example over since those two examples as 
well so this cost that kind of has high bias because it was mostly linear between either maybe a curve function a quadratic function and it has high variance because had too much flexibility to fit here those two mislabel all those alive examples in the middle as well in case this seems contrived well it is this example is a little bit contrived in two dimensions but we're getting high dimensional input you actually do get things with high buyers in some regions in high barians in some regions and so it is also to get consoles like this high dimensional inputs that seem less contrived so to summarize you've seen how by looking at your algorithms ever on the training set and your algorithms error on the dev set you can try to diagnose whether has problems high barriers or high variance or maybe both or maybe neither and depending on whether your algorithm suffers from bias or variance it turns out that they're different things you could try so in the next video I want to present you a what I call a basic recipe for machine learning that lets you most automatically try to improve your algorithm depending on whether as high buyers or hide there's issues so let's go on to the next videoI've noticed that almost all the really good machine learning practitioners tend to have a very sophisticated understanding of buyers invariant but in various one of those concepts as easy to learn but difficult to master even if you think you've seen the basic content advisor variants is often more nuanced to attend you'd expect in the deep learning error another trend is that there's an less discussion of what's called the bias-variance tradeoff you might have heard of this thing called the bias-variance tradeoff but in a deep learning arrow that less of a trade-off so it's still tougher biases with overall variance but just talk less about the bias-variance tradeoff let's see what this means let's see you know data set that looks like this if you fit a straight line to the data maybe a logistic regression fit to that this is not a very good fit to the data and so there's a cluster of high bias or we say that this is under fitting the data on the opposite end if you fit an incredibly complex classifier and maybe a big neural network or you network with a while the fit engineers maybe you can fit the data perfectly but that doesn't look like a great big key there so there's a cost v of high variance and this is on overfitting the data and there might be some classifier in the teen with a medium level of complexity that you know maybe exceeds the curve like that that looks like a much more reasonable fit to the data and so that's a call that you know just right right somewhere in between so in a 2d example like this with just two features x1 and x2 you can plot the data and visualize bias and variance in high dimensional problems you can't plot the data and visualize the decision boundary instead there are a couple different metrics that we'll look at to try to understand bias and variance so continuing our example of cat picture complication where that's a positive example and there's a negative example the two key numbers to look at to understand bias and variance will be the training set error and the death set or the developments etc so for the sake of argument let's say that you're recognizing cats and pictures is something that people can do nearly perfectly right and so let's say your training set error is on one percent and your death set error is for the sake of argument let's say is 11 percent so in 
this example you're doing very well on the training set but you're doing relatively poorly on the development set so this looks like you might have over fit the training set there's some how you're not generalizing well to this hold on cross validation services development set and so if you have an example like this we would say this has high variance so by looking at the training set error and the development set error here you would be able to render a diagnosis of your algorithm having high variance now let's say that you measure your training setting of deadside error and you get a different result let's say that your training set error is 15 percent I'm writing your training so ever the top row and your death set error is 16 percent in this case assuming that humans achieve you know roughly zero percent error that humans can look at these pictures and just tell this cat or not then it looks like the algorithm is not even doing very well on the training set so if it's not even fitting the training data as seen by well then this is under sitting the data and so this algorithm has high bias but in contrast is actually generalizing at a reasonable level to detect entrance performance or death sided only want to send word since the forms in the training set so dude album has a problem of high bias because was not even training it's not even fitting the training set well the dissimilar to the left most plots we had on the previous line now here's another example let's say that you have 15 percent rating set error so that's pretty high bias but when you evaluates on a death set it does even worse maybe it does know 30 percent in this case I would diagnose this algorithm as having high bias because it's not doing that well on the training set and high variance so this is you know really the worst of both worlds oh and one last example if you have you know 0.5 training set error and 1% deaf set error then maybe your users are quite happy that you have a can cause fire but only want to send ever then this will have a low bias and low variance one subtlety that I'll just briefly mention that we'll leave to a later video to discuss in detail is that this analysis is predicated on the assumption that human level performance gets nearly zero percent error or more generally get the optimal error sometimes called Bayes error for that sort of Bayesian optimal error is nearly zero percent I don't want to go into detail on this in this particular video but it turns out that is the optimal error or the Bayes error were much higher say there were fifteen percent then you look at this classifier fifteen percent is actually perfectly reasonable for training set and you wouldn't say that's high bias and won't set pretty low variance so the case of how to analyze bias and variance when no classifier can do very well for example if you have really blurry images so that you know even a human or just no system could possibly do very well then maybe Bayes error is much higher and then details to how this analysis of change but leaving aside this subtlety for now the takeaway is that by looking at your training set error you can get a sense of how well you are fitting at least the training data and so that tells you if you have a bias problem and then looking at how much higher your ever goes when you go from the training set to the DEF set that should give you a sense of how bad is the variance problems are you doing a good job generalizing from the training set to the death set that gives you a sense of your 
areas all this is under the assumption that the Bayes error is quite small and that your train and your death sets are drawn from the same distribution if those assumptions are violated that the most sophisticated analysis you could do which we'll talk about in the later video now on the previous slide you saw what high bias high variance look like and again she had a sense of what it could cost dialog by what does high bias and high variance look like it's kind of the worst of both worlds so you remember we said that a classifier like this the linear classifier has high bias because under fits the data so this would be a qualifier that is mostly linear and therefore under fit the data we're drawing this in purple but if somehow your classifier does some weird things then is actually overfitting parts of the data as well so the classifier that I drew in purple has both high bias and high variance where there's high bias because by being a mostly linear classifier is just not fitting you know this quadratic right shade that well but by having too much flexibility in the middle it somehow gets this example in this example over since those two examples as well so this cost that kind of has high bias because it was mostly linear between either maybe a curve function a quadratic function and it has high variance because had too much flexibility to fit here those two mislabel all those alive examples in the middle as well in case this seems contrived well it is this example is a little bit contrived in two dimensions but we're getting high dimensional input you actually do get things with high buyers in some regions in high barians in some regions and so it is also to get consoles like this high dimensional inputs that seem less contrived so to summarize you've seen how by looking at your algorithms ever on the training set and your algorithms error on the dev set you can try to diagnose whether has problems high barriers or high variance or maybe both or maybe neither and depending on whether your algorithm suffers from bias or variance it turns out that they're different things you could try so in the next video I want to present you a what I call a basic recipe for machine learning that lets you most automatically try to improve your algorithm depending on whether as high buyers or hide there's issues so let's go on to the next video\n"