#34 Machine Learning Specialization [Course 1, Week 3, Lesson 2]

The value of f is always between 0 and 1, because the output of logistic regression is always between 0 and 1. The only part of the function that's relevant is therefore the part corresponding to f between 0 and 1. So let's zoom in and take a closer look at that part of the graph. If the algorithm predicts a probability close to 1 and the true label is 1, then the loss is very small; it's pretty much 0, because you're very close to the right answer.
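For reference, the curve being described here is the y equals 1 branch of the logistic loss; writing it out (with f(x) as shorthand for the model's output, which the video keeps on the slide rather than in the narration):

$$
L\big(f(x),\, y\big) = -\log\big(f(x)\big) \qquad \text{when } y = 1
$$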

Now, continuing with the example of the true label y being 1, say it really is a malignant tumor. If the algorithm predicts 0.5, then the loss is at this point here, which is a bit higher, but not that high. In contrast, if the algorithm were to output 0.1, that is, if it thinks there's only a 10% chance of the tumor being malignant, but y really is 1, it really is malignant, then the loss is this much higher value over here. So when y is equal to 1, the loss function incentivizes, or nudges, or helps push the algorithm to make more accurate predictions, because the loss is lowest when it predicts values close to 1.
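To make those three predictions concrete, here's a minimal sketch in NumPy; the probe values 0.99, 0.5, and 0.1 mirror the video's examples, and the function name loss_y1 is just illustrative:

```python
import numpy as np

def loss_y1(f):
    """Logistic loss for a single example when the true label y = 1: -log(f)."""
    return -np.log(f)

# f is the model's predicted probability that the tumor is malignant.
for f in [0.99, 0.5, 0.1]:
    print(f"f = {f}: loss = {loss_y1(f):.3f}")

# f = 0.99: loss ≈ 0.010  -> very close to the right answer, loss nearly zero
# f = 0.5:  loss ≈ 0.693  -> a bit higher, but not that high
# f = 0.1:  loss ≈ 2.303  -> much higher loss for a confident wrong prediction
```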

So far, we've been looking at what the loss is when y is equal to 1. Now let's look at the second part of the loss function, corresponding to when y is equal to 0. In this case, the loss is negative log of 1 minus f of x. When this function is plotted, it actually looks like this. The range of f is limited to 0 to 1, because logistic regression only outputs values between 0 and 1. If we zoom in, this is what it looks like. In this plot, corresponding to y equals 0, the vertical axis shows the value of the loss for different values of f of x. When f is 0, or very close to 0, the loss is also going to be very small, which means that if the true label is 0 and the model's prediction is very close to 0, well, you nearly got it right, so the loss is appropriately very close to 0.
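Written out the same way as the first branch, this second branch is:

$$
L\big(f(x),\, y\big) = -\log\big(1 - f(x)\big) \qquad \text{when } y = 0
$$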

The larger the value of f of x gets, the bigger the loss, because the prediction is further from the true label 0, and in fact, as that prediction approaches 1, the loss actually approaches infinity. Going back to the tumor prediction example, this says that if the model predicts the patient's tumor is almost certain to be malignant, say a 99.9% chance of malignancy, but it turns out to actually not be malignant, so y equals 0, then we penalize the model with a very high loss. So in this case of y equals 0, similar to the case of y equals 1 on the previous slide, the further the prediction f of x is away from the true value of y, the higher the loss; and in fact, as f of x approaches 1, the loss gets really, really large and in fact approaches infinity.
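That blow-up near 1 is easy to check numerically; here's a small sketch, where the probe values (including the 99.9% case from the example) are illustrative:

```python
import numpy as np

def loss_y0(f):
    """Logistic loss for a single example when the true label y = 0: -log(1 - f)."""
    return -np.log(1.0 - f)

# As the predicted probability of malignancy approaches 1 while y = 0,
# the loss grows without bound.
for f in [0.5, 0.9, 0.999, 0.999999]:
    print(f"f = {f}: loss = {loss_y0(f):.3f}")

# f = 0.5      -> loss ≈ 0.693
# f = 0.9      -> loss ≈ 2.303
# f = 0.999    -> loss ≈ 6.908   (the 99.9% prediction from the example)
# f = 0.999999 -> loss ≈ 13.816
```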

So when the true label is 0, the algorithm is strongly incentivized not to predict something too close to 1. We've seen a lot in this video. In the next video, let's go back and take the loss function for a single training example and use that to define the overall cost function for the entire training set. We'll also figure out a simpler way to write out the cost function, which will then later allow us to run gradient descent to find good parameters for logistic regression.
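Before moving on, here's a minimal sketch that puts the two cases from this video into one per-example loss function; the function name is mine, and the simplified single-formula version is what the next video derives:

```python
import numpy as np

def logistic_loss(f, y):
    """Per-example logistic loss, written as the two cases from this video.

    f: the model's output, a probability between 0 and 1
    y: the true label, either 0 or 1
    """
    if y == 1:
        return -np.log(f)        # small when f is near 1, approaches infinity as f -> 0
    else:
        return -np.log(1.0 - f)  # small when f is near 0, approaches infinity as f -> 1

# A confident correct prediction vs. the same prediction when the label is 0.
print(logistic_loss(0.99, 1))  # ≈ 0.010
print(logistic_loss(0.99, 0))  # ≈ 4.605
```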