Python Tutorial - How do we measure success

Choosing the Right Evaluation Metric for Machine Learning Models

The next step is to decide how we measure whether our algorithm works. Choosing how to evaluate your machine learning model is one of the most important decisions an analyst makes. The decision balances the real-world use of the algorithm, the mathematical properties of the evaluation function, and the interpretability of the measure.

Often we hear the question, "How accurate is your model?" Accuracy is a simple measure that tells us what percentage of rows we got right. However, sometimes accuracy doesn't tell the whole story. Consider the case of identifying spam emails. Let's say that only 1% of the emails I receive are spam, while the other 99% are legitimate. I can build a classifier that's 99% accurate just by assuming every message is legitimate and never marking a message as spam. But this model isn't useful at all, because every message, even the spam, ends up in my inbox.

The metric we use for this problem is called log loss. Log loss is what is generally called a loss function: a measure of error. We want our error to be as small as possible, which is the opposite of a metric like accuracy, where we want to maximize the value.

How Log Loss is Calculated

Log loss takes the actual value, 1 or 0, and our prediction, which is a probability between 0 and 1. The Greek letter sigma (Σ) indicates that we take the sum of the log loss terms for each row in the data set. We then multiply this sum by negative 1 over N, the number of rows, to get a single value for log loss.
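Written out, the formula described above is:

$$\text{log loss} = -\frac{1}{N}\sum_{i=1}^{N}\big[\,y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\,\big]$$

where $y_i$ is the actual label and $p_i$ is the predicted probability for row $i$.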

To unpack this math, let's look at an example. Consider the case where the true label is 0 but we confidently predict that the label is 1, with a probability of 0.9. Because y is 0, the first term vanishes, so the log loss for this row is given by -(1 - y) times log(1 - p). This simplifies to -log(1 - 0.9), or -log(0.1), which is about 2.3.
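Plugging the numbers into the formula for this single row:

$$-\big[0 \cdot \log(0.9) + (1 - 0)\log(1 - 0.9)\big] = -\log(0.1) \approx 2.30$$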

Now consider the case where the correct label is 1 but our model is not sure, and our prediction is right in the middle at a value of 0.5. Our log loss is -log(0.5), or about 0.69. Since we are trying to minimize log loss, we can see that it's better to be less confident than it is to be confident and wrong.
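Again, for this single row:

$$-\big[1 \cdot \log(0.5) + (1 - 1)\log(1 - 0.5)\big] = -\log(0.5) \approx 0.69$$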

Implementation of Log Loss

The most important detail in the implementation of log loss is the clip function, which sets a maximum and a minimum value for the elements in an array. Since the log of 0 is negative infinity, we want to offset our predictions ever so slightly from being exactly 1 or exactly 0, so that our score remains a finite number.

In this example, we set the eps variable to 1e-14 (a decimal point, thirteen zeroes, and then a 1), which is close enough to zero not to affect our overall scores. After adjusting the predictions slightly with clip, we calculate log loss using the formula.
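The original code isn't reproduced in this text, but a minimal NumPy sketch consistent with the description, with the function name compute_log_loss assumed for illustration, might look like this:

```python
import numpy as np

def compute_log_loss(predicted, actual, eps=1e-14):
    """Compute log loss for predicted probabilities and actual binary labels.

    Clipping keeps every prediction strictly between eps and 1 - eps,
    since log(0) is negative infinity.
    """
    predicted = np.clip(predicted, eps, 1 - eps)
    loss = -1 * np.mean(actual * np.log(predicted)
                        + (1 - actual) * np.log(1 - predicted))
    return loss
```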

If we call this function on the examples we looked at earlier, we can see that the confident-and-wrong prediction returns the expected value of 2.3, and the prediction that is right in the middle returns 0.69. This demonstrates how to take a mathematical equation and turn it into a function to use for evaluation, just as you may need to do if you were participating in a machine learning competition.
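For instance, calling the sketch above on the two earlier cases reproduces those values:

```python
# Confident and wrong: true label 0, predicted probability 0.9
print(compute_log_loss(predicted=np.array([0.9]), actual=np.array([0])))  # ~2.30

# Unsure: true label 1, predicted probability 0.5
print(compute_log_loss(predicted=np.array([0.5]), actual=np.array([1])))  # ~0.69
```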

Developing Intuition for Log Loss

Let's develop some intuition for how the log loss metric performs with a few examples.

"WEBVTTKind: captionsLanguage: enthe next step is to decide how we measure if our algorithm works choosing how to evaluate your machine learning model is one of the most important decisions an analyst makes the decision balances the real-world use of the algorithm the mathematical properties of the evaluation function and the interpret ability of the measure often we hear the question how accurate is your model accuracy is a simple measure that tells us what percentage of rows we got right however sometimes accuracy doesn't tell the whole story consider the case of identifying spam emails let's say that only 1% of the emails I receive for spam the other 99% are legitimate emails I can build a classifier that's 99% accurate just by assuming every message is legitimate and never marking a message of spam but this model isn't useful at all because every message even the spam ends up in my inbox the metric we use for this problem is called log loss log loss is what is generally called a loss function and it is a measure of error we want our error to be as small as possible which is the opposite of a metric like accuracy where we want to maximize the value let's look at how log loss is calculated it takes the actual value 1 or 0 and it takes our prediction which is a probability between 0 and 1 the Greek letter Sigma which looks like an uppercase e below indicates that we're taking the sum of log loss measures for each row in the data set we then multiply this sum by negative 1 over n the number of rows to get a single value for log Louis we will impact this math a little more by looking at an example consider the case where the true label is 0 but we predict confidently that the label is 1 in this case because Y is 0 the first term becomes zero this means that the log loss is calculated by 1 minus y times log of 1 minus P this simplifies to log 1 minus 0.9 or log 0.1 which is 2.3 now consider the case that the correct label is 1 but our model is not sure and our prediction is right in the middle at a value of 0.5 our log loss is 0.6 9 since we are trying to minimize log loss we can see that it's better to be less confident than it is to be confident and wrong here is an implementation of log loss the most important detail is the clip function which sets a maximum and a minimum value for the elements in an array since log of 0 is negative infinity we want to offset our predictions ever so slightly from being exactly 1 or exactly 0 so that our score remains a real number in this example we use the eps variable to be zero point zero zero zero 13 zeroes and then a 1 which is close enough to zero to not affect our overall scores after adjusting the predictions slightly with clip we calculate log loss using the formula if we call this function on the examples we looked at earlier we can see that the confident and wrong item returns the expected value of two point three and the prediction that is right in the middle returns zero point six nine we have implemented it here to demonstrate how to take a mathematical equation and turn it into a function to use for evaluation just like you may need to do if you were participating in a machine learning competition now let's develop some intuition for how the log loss metric performs with a few examplesthe next step is to decide how we measure if our algorithm works choosing how to evaluate your machine learning model is one of the most important decisions an analyst makes the decision balances the real-world use of the algorithm the mathematical properties 
of the evaluation function and the interpret ability of the measure often we hear the question how accurate is your model accuracy is a simple measure that tells us what percentage of rows we got right however sometimes accuracy doesn't tell the whole story consider the case of identifying spam emails let's say that only 1% of the emails I receive for spam the other 99% are legitimate emails I can build a classifier that's 99% accurate just by assuming every message is legitimate and never marking a message of spam but this model isn't useful at all because every message even the spam ends up in my inbox the metric we use for this problem is called log loss log loss is what is generally called a loss function and it is a measure of error we want our error to be as small as possible which is the opposite of a metric like accuracy where we want to maximize the value let's look at how log loss is calculated it takes the actual value 1 or 0 and it takes our prediction which is a probability between 0 and 1 the Greek letter Sigma which looks like an uppercase e below indicates that we're taking the sum of log loss measures for each row in the data set we then multiply this sum by negative 1 over n the number of rows to get a single value for log Louis we will impact this math a little more by looking at an example consider the case where the true label is 0 but we predict confidently that the label is 1 in this case because Y is 0 the first term becomes zero this means that the log loss is calculated by 1 minus y times log of 1 minus P this simplifies to log 1 minus 0.9 or log 0.1 which is 2.3 now consider the case that the correct label is 1 but our model is not sure and our prediction is right in the middle at a value of 0.5 our log loss is 0.6 9 since we are trying to minimize log loss we can see that it's better to be less confident than it is to be confident and wrong here is an implementation of log loss the most important detail is the clip function which sets a maximum and a minimum value for the elements in an array since log of 0 is negative infinity we want to offset our predictions ever so slightly from being exactly 1 or exactly 0 so that our score remains a real number in this example we use the eps variable to be zero point zero zero zero 13 zeroes and then a 1 which is close enough to zero to not affect our overall scores after adjusting the predictions slightly with clip we calculate log loss using the formula if we call this function on the examples we looked at earlier we can see that the confident and wrong item returns the expected value of two point three and the prediction that is right in the middle returns zero point six nine we have implemented it here to demonstrate how to take a mathematical equation and turn it into a function to use for evaluation just like you may need to do if you were participating in a machine learning competition now let's develop some intuition for how the log loss metric performs with a few examples\n"