"WEBVTTKind: captionsLanguage: enIs this a hot dog? Or not a hot dog? What about this one?In this course, you will learn about convolutional neural networks.These are a class of deep learning neural networks that are particularly effective forclassifying images. CNNs are also used for other applications such as natural languageprocessing and time series forecasting, but they're most commonly associated with image processing.Kylie Ying teaches this course. Kylie is a software engineer and she is passionateabout machine learning and artificial intelligence. So let's start learningafter you determine if this is a hot dog. Welcome to this introductory course on convolutionalneural networks. In this course, I'm going to be talking about how our computer can look at animage with a dog in it and say, Hey, this is a dog. So the secret behind that is exactly thisconvolutional neural networks. And that's what we're going to be discussing today.From me looking at this, clearly this is a hot dog. I don't think that's a hot dog. Can'treally tell what that is. hummus, hot dog, garlic bread, hot dog, hot dog, oysters, and it looks likesausage and waffle. And now if we look at our labels, one means that our model thinks it's a hotdog. Zero means that it doesn't think it's a hot dog. So for the first row, we get yes, no, no. Sowe get hot dog, not a hot dog, not a hot dog. And then we get yes, no, yes. So hot dog, not a hotdog, hot dog. And then yes, no, no. So hot dog, not a hot dog, not a hot dog. And I think that'spretty awesome for a model that we are going to learn in this video today. It's ready. Please,God. What would you say if I told you there is an app on the market? We're past that part.Just demo it. Okay. Let's start with a hot dog.Oh, shit. Yes.How's that? My beautiful little Asiatic friend. I'm going to buy you the palapa of your life.We will have 12 posts, braided palm leaves. You'll never feel exposed again.I'm going to be rich. Do pizza. Yes, do pizza.Pizza. Not hot dog. Wait, what? That's that's it. It only does hot dogs. No, and not hot dog.If you guys have already seen a few of my previous free code camp machine learning videos,feel free to skip ahead to the section that actually discusses convolutional neural networksand then the co lab where we will be practicing how to actually implement this on a very excitingproject. Okay, for those of you who are new here, stay tuned. We're going to cover the basics ofmachine learning and we will build up to understanding how a convolutional neural networkworks. Alright, so let's get started. This is the introduction to convolutional neural networkspresented by me, Kylie Yang on behalf of free code camp. Make sure you guys go and check out mychannel Kylie Yang for beginner programming content, as well as future courses on artificialintelligence. So with that being said, let's dive right in. What exactly is machine learning?Well, machine learning is a subdomain of computer science that focuses on algorithms, which help acomputer learn from data without us programmers explicitly programming certain instructions. Sobasically, we want to be able to train our computer to understand or to comprehend or to draw somesort of conclusion to learn from certain data that we feed it. So there's a few different types ofmachine learning. The first type is known as supervised learning. In supervised learning,we use labeled inputs, which means that each input has a corresponding output label to trainmodels and to learn outputs. So let's look at some examples. 
Here we have a few pictures: a picture of a cat, a picture of a dog, and a picture of what looks like a gecko. Now, our computer doesn't actually know these labels ahead of time, so we as humans have labeled these items. We're familiar with pictures of cats, dogs, and geckos, so we can label them in our heads, but our computer only sees pixels. What we're doing in supervised learning is constructing a data set with these labels attached. So when we feed these pictures to our computer, we're saying: hey, this top left one is a cat, this right one is a dog, and this bottom one, okay, I labeled it as a lizard, but you get the point. Basically, we're feeding it a label when we pass it into the computer.

There's also unsupervised learning, which uses unlabeled data to learn about patterns in the data. So here, if we have multiple different images of cats, multiple different ones of lizards and geckos, and then multiple pictures of dogs, the goal is that our computer would learn from all of these, pull out similar features, and say: hey, these all seem like one category, these seem like another, and these seem like a third. That's unsupervised learning: we don't provide the labels, and our computer tries to draw conclusions from similarities it finds in our data set.

The last type of machine learning is known as reinforcement learning. In reinforcement learning, there's an agent learning in an interactive environment, and it learns based on rewards or penalties that it observes while it takes actions in that environment. For example, training your dog is a type of reinforcement learning, and we're essentially replacing the dog with a computer. When we train our dogs, if the dog sits, we feed it a treat; if the dog barks, we might yell at it, and the dog will eventually, hopefully, stop barking. Sometimes my dog is not like that. But based on these rewards and penalties, the dog picks up intuition about which actions will earn future rewards and which will lead to future penalties. We train a computer in a very similar way, except the reward for our computer might just be something like points.

Today, we're focusing on supervised learning, so let's talk about it a bit more. In machine learning, we have some inputs, here inputs one through n; these are all of our samples, our examples. They go into some model, which we'll talk about in a second, and that leads to some output, some prediction. The terminology here is that each input is known as a feature vector: each input we give our model should be in the form of parsable data, which often just means a vector of numbers.

Now, these features can be qualitative, which typically means categorical data: there is a finite number of categories or groups. One example might be the traditional way of defining gender, female or male; this is qualitative because there are only a certain number of groups. Another example might be: what country do you live in? There are only a finite number of countries out there.
So this is qualitative, categorical data: a finite number of categories. Specifically, this is known as nominal data, because there's no inherent order. It's not like a happiness rating, where zero is unhappy and five is very happy; there's no ranking between countries. The other type of qualitative data might be something like age groups: different categories of being an infant, then a child, then a teenager, an adult, and so on. Or, as I just said, a happiness rating: one through five, five being really happy, one being not that happy. This is known as ordinal data, because there's an inherent ordering to the categories. But both of these are qualitative data, because they're categorical; there's only a certain number of categories.

Now you might be wondering: doesn't that encompass all of our data? No. The other type is quantitative data, meaning numerical data, and it can be discrete or continuous. Some examples: how long is something? You get a number, right? My desk could be 5.289 feet long. Sorry if you don't use the American measurement system, which is the rest of the world, but you get my point: there are infinitely many possible lengths; it's numerical. It could also be temperature, say, 200 degrees Fahrenheit. Again, sorry if you don't use Fahrenheit, which happens to be everybody not in America. Or it could be a discrete numerical value: for example, on an Easter egg hunt, it looks like we have maybe 10 or so eggs in our basket. That number might be 10, but it could be anywhere from zero up. The basic point is that it's quantitative, numerical data. Length and temperature are continuous, because you could have, say, pi feet; but you can't really have pi eggs, which is why the egg count is discrete. Discrete means it only takes counting-number values like 1, 2, 3, whereas continuous doesn't.

Okay, so those are our features. Now let's talk about the types of predictions, the outputs of our model. The first type of task is called classification, which means we predict discrete classes. For example, if we have a picture of a hot dog, a pizza, and an ice cream cone, classification would say: okay, this is a hot dog, this is a pizza, and this is an ice cream cone. It gives us three distinct categories and tries to map each input into one of them. This is known as multiclass classification, because we have more than two classes. Now, if I were to classify these into just hot dog and not hot dog, that becomes binary classification, because there are only two options; it's one or the other. Multiclass is more than two, so I could have ten different types of food and try to classify into those ten types.

Other examples of classification: for binary classification, positive or negative sentiment in a paragraph, a picture being a cat or a dog, or an email being classified as spam or not spam. For multiclass, you might have cat, dog, lizard, dolphin, and so on, all the animals in the animal kingdom; or orange, apple, pear; or all the different species of plants in the world. Now, the second type of supervised learning task is known as regression.
In this case, we're trying to predict a continuous value. Here, we might be trying to predict the price of an asset; I think I took this screenshot from Ethereum or something, but we might be trying to predict the price of Ethereum, or how much snowfall we're going to get on a certain day, or the housing market: how much will this house cost in two months, two years, or twenty years? These are all regression tasks, because we're trying to predict continuous values.

Now let's dive a little into the model. Before we talk about the different types of models, let's discuss how we actually make a model learn, and how we can tell whether or not it's learning. Take this data set, for example. This is a data set I found online: the Pima Indians diabetes data set, originally provided by the National Institute of Diabetes and Digestive and Kidney Diseases. Let's talk about what we're looking at. Here we have the labels for the different columns: number of pregnancies, glucose level, blood pressure, skin thickness, insulin, BMI, age, and then outcome, whether or not the person has diabetes. Each row represents a different sample in the data set, each individual the data was collected from. So this individual here had one pregnancy, and these were her glucose value, blood pressure, skin thickness, insulin, BMI, age, and then the outcome, whether or not she has diabetes. And this row down here tells a different story; well, it is a different person.

Each column is a different feature. This specific feature is the blood pressure feature; it holds the blood pressure measurements across our entire data set. Except for this one over here: this is our outcome, our output label. Here, specifically, our output label is ones and zeros, because we need to transform yes and no into a language our computer can understand, and our computer is very good with numbers. In this example, zero is negative, no diabetes, and one is positive, they have diabetes. This is a very common way of encoding yes or no for our labels, and actually for our features as well.

Everything except the output label, all of our features together, is what we call a feature vector. That's what gets passed into our model. And this value over here is what we call the target for that feature vector. Essentially, if I pass my model these values, I want its prediction to get as close as possible to the target, the actual value; remember, we're doing supervised learning. Same with any of the other samples: if I pass this row in, I'd hope it predicts zero; if I pass this feature vector in, I'd want it to predict one. When we put all of these feature vectors together, we call the result the features matrix X; this mostly matters if you go on to study a bit of linear algebra in more in-depth machine learning. And this over here is our labels, or targets, vector, also known as y.
Again, that's just a bit of terminology. Let's visualize this as a chocolate bar. Here we have this X matrix with all of our feature vectors; imagine each row of chocolate is a feature vector. And these are all of our targets, the corresponding target for each of those feature vectors. If I take a feature vector from my data set and pass it into this model, my model is going to output some prediction. Now, how do we use the actual output label to help us build a better model? What I can do is take the actual value, the actual outcome, because we have that information since this is supervised learning, and compare my prediction with the actual output given in the original data set. Then I can take some measure of the difference between the two and ask: how far am I from the desired output? I'm going to use that information to train the model. If we input many different vectors into our model, get their outputs, take the differences, and train on them, then over time our model will get closer and closer to predicting the actual outputs.

So this is our supervised learning data set: this chocolate bar, with the features over here and the output labels over there. What we normally do is split it up into three different data sets. We have the training data set, which will be most of our data; we use it to train our model. But how do we determine how well the model predicts on data it hasn't seen yet? Because ideally we want to take it out into the real world someday and say: hey, here's a picture, what's in it? To answer that, we also need a testing data set. The testing data set is just some data we've removed from the original data set and hold off to the side, so that when we finish our model, we can say: now look at this new data; how well can you perform on it?

Okay, so what exactly is this validation data set here in the middle? When you're building the model, suppose you have a bunch of different candidate models; how do you choose which one is best? That's where the validation data set comes in. Imagine you're out buying a car. There are many different types of cars; some are just slightly better than others, some just have a few small bells and whistles changed here and there. The validation set is the test drive, where you try out all the models and pick the one you want to keep. So what's the difference between validation and testing? Validation is used on all the models so you can pick out the best one; testing is used on that chosen model to ask: how well does it do on data it has never seen? Both of these sets are kept aside, not used for training, but they serve two different purposes.
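To make the split concrete, here's a minimal sketch of carving one data set into the three subsets. The 70/15/15 proportions, variable names, and synthetic data are my own illustration, not from the course:

```python
import numpy as np

# Suppose X is our features matrix and y is our labels (targets) vector.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 8))        # 1000 samples, 8 features each
y = rng.integers(0, 2, size=1000)     # a 0/1 label per sample

# Shuffle once, then slice into train / validation / test.
indices = rng.permutation(len(X))
train_end = int(0.70 * len(X))        # first 70% for training
valid_end = int(0.85 * len(X))        # next 15% for validation (the "test drive")

X_train, y_train = X[indices[:train_end]], y[indices[:train_end]]
X_valid, y_valid = X[indices[train_end:valid_end]], y[indices[train_end:valid_end]]
X_test,  y_test  = X[indices[valid_end:]], y[indices[valid_end:]]   # final 15% held out
```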
So remember: validation is the test drive for picking out the best model; the testing data set is used on that best model to get your final accuracy number, your final metric of how well the model does. Let's visualize each of these. The training data set gets passed into the model, and the model produces a prediction for each feature vector in the training set. We compare each prediction to the actual output, and the difference between the two is what's known as the loss. That loss is used to make adjustments to the model, which is what's known as training. The validation set is used as a reality check during and after training, first to ensure the model can handle unseen data, and then to decide which model to use. Again, this is the test drive: here, model A has a loss of 1.3, model B has a loss of 1.5, model C has a loss of 0.5, and model D has a loss of 0.9. So which one is the best model? Model C. Finally, once we've selected model C, we take our test set, pass it through model C, and measure the final performance between our output and the actual labels. The test set checks how generalizable the final chosen model is, meaning how well it performs on data it hasn't seen yet.

All right, let's talk about this loss function. Loss is basically a metric of performance; it tells our computer how well we're doing right now. The further our prediction is from the actual output, the greater the loss will be. For example, this pink one has a slightly higher loss than the brown one, because it's further from that chocolate bar; and this blue one is really far from the chocolate bar, so we have a lot of correcting to do in our model.

These are what are known as loss functions. For L1 loss, we have this absolute-value shape; for L2 loss, we have this quadratic shape. Let's talk about what they actually mean. The x axis is the difference between the predicted and the actual value: the further from zero, the more different they are. The y axis is the penalty: how much we get penalized for being that far from the actual value. If you look at the scales, L1 is linear, so if I'm 10 away from the actual value, my penalty is just 10. L2 loss, on the other hand, is quadratic: if I'm 10 away, I'm penalized by 100; that's literally x squared. That also means that when I'm close, within one of the actual value, my L2 penalty is smaller than the L1 penalty. Anyway, let's not get too deep into the math; the point is that there are different loss functions which tell our computer how far off we are. There's also something known as binary cross entropy loss, which is for binary classification. Oh, and I should clarify that these L1 and L2 losses are for regression: that's when we're trying to predict a continuous output, and we're telling our computer how far we are from that output.
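To make those two penalty shapes concrete, here's a minimal NumPy sketch of both regression losses; the sample predictions are made up for illustration:

```python
import numpy as np

def l1_loss(y_true, y_pred):
    # Linear penalty: being 10 away costs 10.
    return np.mean(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    # Quadratic penalty: being 10 away costs 100.
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5,  0.0, 2.0])
print(l1_loss(y_true, y_pred))   # 0.3333...
print(l2_loss(y_true, y_pred))   # 0.1666...  (smaller than L1 when errors are < 1)
```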
Now, binary cross entropy loss is for when we have binary classification, two categories we're trying to classify between. We're not going to try to understand this equation; what we do want to take away is that as our model gets better at predicting the right output, this loss decreases, and a decreasing loss means improving performance.

Okay, so what metrics of performance can tell us how well our model is doing? The first is accuracy. For example, over here I have a bunch of different fruits: apple, orange, apple, apple; those are the actual labels. Say these get passed into our model, and the model comes up with these predictions: apple, orange, orange, apple. What's the accuracy of our model? Three quarters, 75%, because we got three of the four correct. For this introduction, we'll stick to accuracy when talking about how well our model is performing.

Finally, let's talk about the model itself. In this video, I'm going to focus on neural nets, just because they're a very common and very powerful kind of model, but there are many, many different models out there. Neural networks look something like this: we have our inputs, which are all of our features. I showed you the feature vector horizontally before, but now let's turn it vertical. So if we have, say, age, glucose level, and pregnancies (I forget exactly what was in the Pima Indians data set), then our feature vector is age, glucose, pregnancies, stacked this way. These values get passed into hidden nodes, which we'll dive into in just a second, and the outputs of those cells get passed into an output layer, which determines the final output of the neural network.

So, as promised, let's look at these specific nodes. Again, here is my feature vector, vertical instead of horizontal: x0, x1, all the way to xn; so we have n plus one different features. Each of these values gets some weight attached to it. That might mean: okay, we want to emphasize x0 more, so let's double its value; that weight might be 2. Or: x1 doesn't seem that important, let's decrease its significance; that weight might be 0.5. Essentially, we multiply each w with its x, and those products go into this neuron, which just sums them all together. On top of that, it adds something called a bias, which you can think of as like an intercept: it just says, add this specific number to the neuron's sum. Finally, the sum of all those products, plus the bias, gets passed into something known as an activation function, and whatever comes out of that is the output of a single node in our neural net.
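Here's a tiny sketch of that single-node computation: a weighted sum of the inputs, plus a bias, passed through an activation. The weights 2 and 0.5 echo the emphasize/de-emphasize example above; everything else is illustrative:

```python
import numpy as np

def relu(z):
    # A classic activation: zero out negatives, pass positives through.
    return np.maximum(0.0, z)

def neuron(x, w, b, activation=relu):
    # Weighted sum of inputs, plus bias, through an activation function.
    return activation(np.dot(w, x) + b)

x = np.array([1.0, 2.0, 3.0])    # feature vector (x0, x1, x2)
w = np.array([2.0, 0.5, -1.0])   # weights: emphasize x0, dampen x1
b = 0.5                          # bias, the "intercept" term
print(neuron(x, w, b))           # relu(2*1 + 0.5*2 - 1*3 + 0.5) = 0.5
```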
Then you have all these nodes, and you can chain them together. I know I used circles to represent them, but each circle basically encapsulates this entire computation. Each circle has its own output that goes into other nodes, and each does the same thing: takes the products, then the summation. That eventually feeds into an output layer, which gives the final output of the neural network.

I kind of glossed over the activation function, so let's go a little deeper. If every node were just a product and a sum plus a little something added, then the output would just be a summation of products. When we chain those together without activation functions, the entire thing collapses into a linear model. By linear model, I mean we could have one coefficient per input, multiply the two together, add a bias, and that would be our output; essentially the entire network could collapse into one cell. That's what happens without activation functions, so we do need them.

Now, what are they? Here are three examples of activation functions we can use: a sigmoid function, a tanh function, and a ReLU function, which stands for rectified linear unit. In these plots, the y axis is the output of the cell, and the x axis is the value coming into the activation: the sum of all the weighted products entering the neuron. So whatever that summed value is, we find it on the x axis, map it through the function, and the corresponding y value is the output of that single cell. Again, the reason we use these activation functions is to introduce some nonlinearity into our neural net, so it doesn't all collapse into a linear function. That's the importance of activation functions. Typically, I'll use ReLU, because it's a classic activation function that's known to work.
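For reference, here's what those three activation functions look like as code; the sample inputs are arbitrary:

```python
import numpy as np

def sigmoid(z):
    # Squashes any input into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any input into the range (-1, 1).
    return np.tanh(z)

def relu(z):
    # Passes positives through unchanged, zeroes out negatives.
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # [0.119  0.5    0.881]
print(tanh(z))      # [-0.964 0.     0.964]
print(relu(z))      # [0.     0.     2.   ]
```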
All right, now how does the training part actually work with a neural net? Let's talk about that box for a bit. Suppose we're using our L2 loss function; again, that's the quadratic one. Remember that the x axis is how far our predicted value is from the actual value (we can calculate a numerical distance for that), and the y value is the penalty for being that far away. If predicted and actual are really close, the penalty is small; if they're pretty far apart, the penalty is much larger. So up here, the error is really large, and our goal is to decrease this loss, to get somewhere down here; ideally zero, since that would mean a perfect prediction, but we just want to get close enough. That means we have to take a big step in this direction to get to that value.

To do that, we use something called gradient descent, which is a heavy mathematics concept involving some calculus, so we won't cover it fully in the scope of this course; let's just get a general sense of what it means. Gradient descent asks: this is where we are, so what is the slope at this point? At this point the slope looks like this, down here it looks like this, and down here like this. The slope just tells you, at your specific point, how fast the curve is changing. Once we have the slope, gradient descent says: follow that slope downward, because we want to minimize our loss.

Now look at these different w values, because those are what we can adjust in order to train our neural net; remember, they're what we multiply with our inputs to get the summed value that goes into the activation function. Maybe with our current model, this weight is super far from where we want it to be, so we calculate this trajectory and take a big step in this direction. In w1, we might be slightly closer, so we take a smaller step. And another of the weights might be closer still, so we take an even smaller step. Essentially, what backpropagation with gradient descent tells us is: we want to take a step in this direction to correct our weight; how big an adjustment do we make, and in what direction?

To get the new weight, I take the old weight and tack on a small adjustment: some alpha, a very small value we call the learning rate, times this step. So our new value is the old one, adjusted slightly; the learning rate is kept small to make sure we don't overshoot. For each of the weights, we take the slope, multiply it by our learning rate, and calculate the new value from the old one. You'll see that the magnitude of this vector is smaller than the one over here, which means we make a smaller step. If you want to get really technical, you'll sometimes see this written as negative alpha times the slope, the derivative; here I've already flipped the signs, which is why there's a positive. If you had no idea what I just said, don't worry; just get the general idea. This is how training neural networks works: we adjust our weights based on how far we are from where we want to be.
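Here's a toy sketch of that update rule, new weight = old weight minus alpha times the slope, for a single weight under the L2 loss. The one-weight setup and the starting values are my own illustration:

```python
# One weight, one sample, L2 loss (y_pred - y_true)**2.
# The derivative of that loss with respect to the weight is
# 2 * (y_pred - y_true) * x, which is the "slope" we follow downhill.
w = 5.0              # current (bad) weight
x, y_true = 1.0, 2.0 # the ideal weight here would be 2.0
alpha = 0.1          # learning rate: small, so we don't overshoot

for step in range(5):
    y_pred = w * x
    gradient = 2 * (y_pred - y_true) * x   # slope of the loss at this w
    w = w - alpha * gradient               # step downhill along the slope
    print(step, round(w, 4))
# w moves from 5.0 toward 2.0: 4.4, 3.92, 3.536, 3.2288, ...
```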
Okay, so the moment we've all been waiting for: let's talk about convolutional neural networks, otherwise known as CNNs. Here we have handwriting in an image. Let's say we're building a model to detect what number is handwritten in an image. Here we have an image where, to the human eye, we say: okay, I think that's a five. But our computer doesn't know that. How do we get our computer to determine that using supervised learning? That's where CNNs, convolutional neural nets, come in. The idea is that we want to somehow pass this image into a neural network, what we just talked about, to produce an output prediction for the number actually in the image.

But this image doesn't translate into a neural net too well. Think about the Pima Indians data set: we had different values in each column, so every single sample (and this five would technically be one sample) had a nice vector of values associated with it. Here in our image, we don't have that nice vector, so it doesn't pass into a neural network very well. How do we solve that? The answer is convolutional neural networks. This area here, which we'll talk about, is the convolution part. Essentially, what we're doing is extracting features: all these operations boil the input down to a vector that's easily passed into a neural net, which can then actually perform the classification. So everything we're going to learn about the convolutional part is just tacked onto the beginning of a neural net: we take an image as is, pass it through these layers to produce a vector for that input, and then pass that vector through the fully connected network.

Something to notice is that images are actually numbers. Here I have an image of this X, and you'll see certain pixels are darker than others. Our images are composed of arrays of pixels: we have this 2D matrix, and each cell has a number associated with it. The darker it is, the closer to zero; the lighter it is, the closer to 255. So white is 255 and black is 0, and something gray might be in the 100s. This image maps to this array over here.

Once we have this 2D array, we pass something called a convolution over it. A convolution, by definition, is a mathematical operation on two functions that produces a third function. What that means in our image processing world is: we have some input 2D array representing our image, and we have some filter on that array. The filter takes cells of its own size, so this three by three filter takes cells in a three by three grid on the input map, does some operation on them (typically multiplying by the filter's values and summing everything up), and maps the result to a cell on an output map. Basically, you take this filter and keep sliding it over every possible three by three window on the input, projecting each result onto the output. And that's a convolution.
So here's a little example. This is our filter, and we're just going to overlay it on each part of the matrix. These border values are just extensions of the edges, because we don't have anything else to fill them with. We keep sliding the filter over, summing everything up (you can see the summation calculation down here), and projecting the result onto this output matrix. We do that for all the different rows, and we finally get this output value based on the convolution.

There are many different things that can come out of this filter, also known as a kernel. Here's one example: this is edge detection on the original input, and it gives us all the edges in the image. That is the convolutional layer of a convolutional neural net. Basically, we take our image and tune these filters so that they extract some meaningful information from the original images. Here that's edges, but in other cases the network might be looking for eyes in the image, and it will learn a filter that detects those objects and maps them onto an output image. So that's a convolution, and in our convolutional neural net, we're essentially training the filters to detect these important properties.

Another important concept is pooling. If we go back to our CNN diagram, you'll see that we have all these convolutions, and then we pool them. What does pooling mean? Pooling takes a larger input and reduces it down to something smaller that still represents the data in the original input. If we use two by two max pooling, we take each two by two block of cells in the original input, take the largest value of those four cells (because it's max pooling), and project it down here. Over here the largest is 184, so we write 184; in this two by two block it's 12, so 12; and over here it's 45, so 45 goes down here. There's also a different type of pooling called average pooling, which instead of taking the maximum takes the average of the four values in the block. What you need to know about pooling is just that we take a larger array and condense that information down into a smaller, more compact array.

So then we put those two together, and that becomes our CNN. Often we have more convolutional and pooling layers, and more fully connected layers, but this is a good schematic of what's going on.
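Here's a minimal NumPy sketch of both operations: a 3x3 filter slid over an image, followed by 2x2 max pooling. The input values and the edge-detection kernel are illustrative, and this "valid" version shrinks the output rather than extending the borders as in the slide's example:

```python
import numpy as np

def convolve2d(image, kernel):
    # Slide the kernel over every position where it fully fits, take the
    # elementwise product with that window, and sum it into the output cell.
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool2d(image, size=2):
    # Condense each size-by-size block down to its largest value.
    oh, ow = image.shape[0] // size, image.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = image[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]])    # a classic edge-detection kernel
features = convolve2d(image, edge_kernel)  # shape (4, 4)
print(max_pool2d(features))                # shape (2, 2)
```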
This is a visualization I got from a website; I'll post these slides and the site in the description. It's really cool: what I did was draw this five, and this is what happens after the first convolution. Essentially, it takes the five and projects it onto many different kernels, then pools those, does another convolution on the pooled layers, pools those once more, and flattens the result into a linear vector we can put into a neural network. Then the outputs of the final cells tell us what digit it actually thinks it is.

So let's click on that link; this is roughly what it looks like. If I draw a one, you'll see that the first guess is one and the second guess is zero. Each one of these cells is computed from something in the layer below. So this specific cell is the output of these cells over here; each applies some filter, and it actually shows you the filter if I click on it. It takes the input image, performs the filter on that area, and gets this value. Here in our input, we see this grid with these two blue cells, and the filter we use to get this blue cell over here. These are the pooling layers, which take multiple cells and pool them down into a smaller array. Then we perform another convolution on those, and this might take input from multiple different maps: here, we're taking four maps, applying a filter to each, and combining them to get this cell over here. So it's just taking all these inputs, applying some filter to each, and summing everything up. They're using a tanh function for their nonlinearity, the activation function we talked about. Again, there's pooling, where each cell is derived from a block of pooled values. And finally, this is the fully connected neural net, and there's an output layer over here. It's saying the one is lit up: we think this cell has the maximum value, which means it's our prediction. You'll see that zero is at negative 0.96 and seven is at negative 0.99, while one is the max. So play around with this; it's really cool. Let's do a four. Our first guess is four, our second guess is one, and you'll see that the four over here is lit up.

The big picture is that we take the input, project it multiple times, run a filter over each mini grid of the original input, apply a bunch of these array filtering mechanisms, and feed the result into this layer here, which is essentially our feature vector for the image. Then we put that through a neural network to get our prediction.

Okay, so we don't actually have to code all of that ourselves; we just have to understand how it works. The beauty of machine learning is that there are machine learning libraries where we can implement our models. So if we have a model we want to implement, it might look something like this diagram, but we can express it with a library that has already encoded the mathematics behind each of these layers, and simply say: okay, it's a sequential model with two fully connected layers of 16 units using ReLU activations, and then this output. That is the beauty of TensorFlow. TensorFlow is an open source library that helps us develop and train machine learning models. In this next example, let's look at how we can use TensorFlow to train a convolutional neural network on images of food, to predict whether or not a food is a hot dog.
Now that we've learned the basics of machine learning, neural networks, and CNNs, let's look at an example and see how we can use TensorFlow to build a model, train it on a data set, and use it to evaluate outputs. I'm going to go to colab.research.google.com and start a new notebook, and I'm going to title it "hot dog vs not hot dog example", because today we're going to classify pictures of food and label them as hot dogs or not hot dogs.

All right, the first thing we want to do is import a few libraries. We import numpy as np, import pandas as pd, import matplotlib.pyplot as plt, import random, and import tensorflow as tf. From tensorflow.keras we import datasets, layers, and models. And finally, to actually get our data, we import tensorflow_datasets as tfds. Basically, NumPy and pandas are data analysis tools, plt is a plotting tool, this is just the random library in case we need a random number generator, these are more TensorFlow pieces from Keras, and the last one gives us the data set we'll be looking at. Go ahead and run that cell; you can run it by clicking the play button, or you can use Shift+Enter.

So in this video, we're building a hot dog versus not hot dog classification model, using TensorFlow to distinguish hot dogs from not hot dogs in food images. Now I'm going to add a citation; you don't have to worry about this, it's just so we give credit to the people who actually produced this data set. Let's talk about the data really quickly: TensorFlow already has the Food101 data set, so we'll use that, and you can learn more about it by clicking this link here. In this data set, the hot dog label is label number 55.

The first thing we need to do is actually import the data, and we can do that using tfds, our tensorflow_datasets import. If we call tfds.load with the string 'food101', TensorFlow will load the Food101 data set. We're going to shuffle the files, and because we want this as a supervised data set, we set as_supervised to true; we'll also include the info. This returns a tuple with the data set as well as the data set info, because we chose to include that. We run this cell, and it will take a little while, so you can pause the video until it finishes. All right, our data set is loaded. The first thing we do is split it up into train and validation sets; this data set automatically comes with train and validation splits, so I'm just going to grab those.
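Here's roughly what that first part of the notebook looks like as code. It's my reconstruction of what's typed on screen, so variable names may differ from the actual notebook:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import tensorflow_datasets as tfds

# Load Food101 as (image, label) pairs, plus the data set's metadata.
dataset, ds_info = tfds.load('food101',
                             shuffle_files=True,
                             as_supervised=True,
                             with_info=True)

# The data set ships with ready-made train and validation splits.
train_ds = dataset['train']
valid_ds = dataset['validation']
```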
Then let's actually show some examples. I'm going to use tfds.show_examples and pass in the train data set and the data set info, which contains all the metadata. If I hit enter there, we get pizza, chocolate cake, bruschetta, waffles, and so on, and somewhere in there is hot dog; its label is number 55, which I found, I think, by running this a bunch of times and seeing where hot dogs came up.

One thing you might notice is that these images are all different sizes; they're not all square. So the first thing we want to do is resize them. The second thing you might notice is that we have a bunch of different labels, but what we're interested in is hot dog versus not hot dog. So I'm going to take each image, say this chocolate cake image over here, resize it to 128 by 128 pixels, and recast the label so that it's one if it's a hot dog and zero if it isn't. The maximum side length I want is 128, and the hot dog class equals 55. I take the training data set and use a map function: whatever function we give map gets applied to every single item in the data set, transforming the data set that way. I'll write it as a lambda, which takes the inputs, an image and a label for each example, and returns a new tuple: a new image and a new label. For the new image, we use TensorFlow's image resizing, passing in the original image and the size we want, max side length by max side length, so we're resizing the original image into a 128 by 128 square. For the label, what we want is whether or not the passed-in label equals the hot dog class. So this becomes our new data set: we're resizing, and our label is just true or false (true is one, false is zero), depending on whether it equals the hot dog class. One other thing: TensorFlow will complain about the type of data held in here, so we have to tf.cast the resized image to int32, and the same thing down here for the label: cast it to int32. Did we close all of our parentheses? Let me double check. Yeah, it seems like we did. Then we do the exact same thing on the validation data set: map the image into a square and cast the label to whether or not it equals the hot dog class. Let's run that cell, and now we have our training and validation data sets.

Let's verify that it worked by showing the examples again. Now everything is a square. And all of these say "apple pie", even though they're not really apple pie; that's just because show_examples still uses the Food101 label names from the data set info, where label 0's default name is "apple pie" and label 1's is "baby back ribs" (I found that by refreshing enough times to hit an example labeled one). In our context, zero just means not a hot dog, and one means hot dog; as you can see, all of these are not hot dogs.
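Here's a sketch of that resize-and-relabel step. I've written the mapping as a named function rather than the on-screen lambda, but the operations are the ones described:

```python
MAX_SIDE_LEN = 128
HOT_DOG_CLASS = 55   # 'hot dog' label index in Food101

def to_hot_dog_example(image, label):
    # Resize every image to a 128x128 square, and collapse the 101 food
    # labels down to 1 (hot dog) or 0 (anything else).
    image = tf.cast(tf.image.resize(image, (MAX_SIDE_LEN, MAX_SIDE_LEN)),
                    tf.int32)
    label = tf.cast(label == HOT_DOG_CLASS, tf.int32)
    return image, label

train_ds = train_ds.map(to_hot_dog_example)
valid_ds = valid_ds.map(to_hot_dog_example)
```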
So now our data set is square images, all the same shape, and we have our zero and one labels. The next thing we have to do is rebalance the data set. In the training split, there are actually only 750 examples of each food category, so there are only 750 hot dogs in the entire training data set; the hot dogs are the limiting factor. Let's reshape this so we don't have a bajillion not-hot-dogs and only a tiny number of hot dogs to train on. I'll set the training hot dog size to 750 and the validation hot dog size to 250; I know those numbers from the data set's documentation.

To get the train hot dogs, I take the training data set and put a filter on it: I don't really care about the image, I'm just filtering by the label being equal to one. We can also get the train not-hot-dogs, filtering on the label being zero. The not-hot-dogs significantly outnumber the hot dogs, so I'm going to upsample the hot dogs by repeating them three times, taking those 750 and duplicating them to construct a new data set. And I'll do the exact same thing for validation, running the same operations on the validation data set.

Cool, now we've split our data into hot dogs and not hot dogs. But the not-hot-dogs still outnumber the hot dogs by a lot, even after repeating the hot dog data. So how do we sample from each one and get a balanced data set? In tf.data.Dataset, there's a sample_from_datasets function. We pass in all the data sets we want to sample from; for training, that's train hot dogs and train not-hot-dogs, along with the weights for each one. We want 50% from each, so the weights are 0.5 and 0.5. Finally, we tell it to stop once one of the data sets runs out, using the stop_on_empty_dataset option set to true.
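A sketch of that rebalancing step as described; the variable names are my reconstruction:

```python
TRAIN_HOT_DOGS = 750   # hot dog examples in the train split (from the docs)
VALID_HOT_DOGS = 250   # hot dog examples in the validation split

# Separate hot dogs (label 1) from everything else (label 0).
train_hot_dogs     = train_ds.filter(lambda image, label: label == 1)
train_not_hot_dogs = train_ds.filter(lambda image, label: label == 0)
valid_hot_dogs     = valid_ds.filter(lambda image, label: label == 1)
valid_not_hot_dogs = valid_ds.filter(lambda image, label: label == 0)

# Upsample the minority class by repeating it a few times.
train_hot_dogs = train_hot_dogs.repeat(3)
valid_hot_dogs = valid_hot_dogs.repeat(3)

# Draw 50/50 from each stream; stop when the hot dogs run out, so the
# resulting data set stays balanced.
train_ds = tf.data.Dataset.sample_from_datasets(
    [train_hot_dogs, train_not_hot_dogs],
    weights=[0.5, 0.5], stop_on_empty_dataset=True)
valid_ds = tf.data.Dataset.sample_from_datasets(
    [valid_hot_dogs, valid_not_hot_dogs],
    weights=[0.5, 0.5], stop_on_empty_dataset=True)
```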
Now, to get this training data set into something we can actually feed into our neural net, we're going to cache, then batch, then prefetch; I'll explain each in a second, so let's just type it out first. We cache it, then batch it by some batch size, which I'll set up here to be 16, and then prefetch using the AUTOTUNE parameter. I'll paste the same thing for validation, replacing the train data set with the valid data set; everything else still applies.

So what do these mean? Caching means we save the data set somewhere, either in memory or local storage, which saves operations such as file opening and data reading from being executed at the beginning of every training cycle. We batch the data set because instead of passing one image at a time into the neural net, we can pass a bunch in at once; here our batch size is 16, so the neural net trains on a whole batch of 16 images at a time rather than each single image. Finally, prefetch saves more time by overlapping the preprocessing of the data with the execution of the training step: while the model is training on step s, the input pipeline is already reading the data for the next step.

Okay, so now our data pipeline is ready; let's run this. Just to prove the data is in the format we want, I'm going to iterate through the first item in the training data set using its take(1), grab the image batch and the label batch, and print them both (oops, and I need an "in" there for my for loop). You'll notice the image batch is kind of useless to look at, because it's just a bunch of tensors; there are actually 16 images in that tensor. The reason I can say that confidently is that the label batch, somewhere down here, is a shape-16 vector of labels in zero/one form. Again, zero means not a hot dog, one means it is a hot dog. And look how well balanced this is: it's approximately 50/50 in this output, which means we have a decently well balanced data set.
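The pipeline and the sanity-check loop, roughly as described (again a reconstruction; tf.data.AUTOTUNE is the autotune parameter mentioned):

```python
BATCH_SIZE = 16

# Cache decoded examples, group them into batches of 16, and prefetch the
# next batch while the current one is training.
train_ds = train_ds.cache().batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
valid_ds = valid_ds.cache().batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)

# Sanity check: pull one batch and confirm the labels are roughly 50/50.
for image_batch, label_batch in train_ds.take(1):
    print(image_batch.shape)      # (16, 128, 128, 3)
    print(label_batch.numpy())    # e.g. [0 1 1 0 0 1 ...]
```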
With that, we can start the neural net implementation. First, I'm going to seed things so that certain results are reproducible. We want a sequential model, so I say models.Sequential, and this is where we start building our CNN. The first thing I do is rescale the images, dividing each pixel by 255, so instead of a 0-to-255 scale for our colors, we get 0 to 1, where one means white and zero means black (or, for RGB, how much of each color channel is present).

Now we're ready to add some convolutional layers. The next thing I add is layers.Conv2D. It first asks how many filters we want; let's put 128. For the kernel size, let's make it three by three; that's the size of the filter we move across the image. Let's use a ReLU activation function, just because it's a classic. And the input shape here is max side length by max side length by three, because we have three color channels. Then we add a max pooling layer, two by two, which is pretty standard. We add another convolution layer, this time with 64 filters, still using ReLU; we don't have to pass the input shape anymore. Then another max pooling layer (I can just paste that), and one more convolution layer. To get from the convolutions to the fully connected part, we have to add a flattening layer, so I flatten, and then add a dense layer, a fully connected layer where each input goes to every single node: Dense(128), with ReLU again for the activation. Finally, because the output is binary, zero or one, I only need one node at the very end that tells us which it is.

That's the general gist of our neural network, so let's create it and get started on training. Our learning rate will be 0.0001, and let's compile the model. A very classic optimizer is Adam, so that's what we'll use here; you can think of it as a tool that helps us adjust the different weights, just like in the diagram we saw earlier, descending the gradient toward that minimal loss. Our loss will be binary cross entropy, losses.BinaryCrossentropy, and the reason we use it is that we have a single output and we're doing binary classification; whenever it's binary classification, we use binary cross entropy. There's also this from_logits option, and we have to set from_logits to true because our final layer does not project the output onto zero to one, which we would normally do with a sigmoid function or something like that; setting from_logits to true lets the loss function know the output isn't already squashed into zero to one. Finally, the metric we'll use to assess this is accuracy.

Let's compile the model. Oops, I'm missing an 's' over here; that should be metrics equals accuracy. Cool, my model is compiled. I'll set epochs to 50, collect the history, and fit the model to the training data set, with validation data equal to the valid data set, epochs equal to the epochs parameter we pass in, and verbose set to one so we can see things get printed out. Let's run this. It will take a little time, but you can see the accuracy starts off not so great, around 0.3, and now it's at 0.5. The thing is, we expect a purely random model to sit at an accuracy of 0.5, because we have equal parts hot dogs and not hot dogs in the data set; so initially we expect the accuracy to be around 50%, and our goal is to see if we can improve on that.
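Putting the whole model together as described; the third convolution's filter count isn't specified on screen, so the 64 there (and the seed value) are my assumptions:

```python
tf.random.set_seed(42)   # seed so results are reproducible (value is mine)

model = models.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(MAX_SIDE_LEN, MAX_SIDE_LEN, 3)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),   # filter count assumed
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(1),   # single output node; no sigmoid, hence from_logits
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy'])

history = model.fit(train_ds,
                    validation_data=valid_ds,
                    epochs=50,
                    verbose=1)
```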
Alright, this will take a little bit of time, but you see that the accuracy starts off not so great; it started at like 0.3 or something. Okay, now it's at 0.5. The thing is, we expect a purely random model to sit at an accuracy of 0.5, because we have equal parts hot dogs and not hot dogs in the data set. So we do expect the accuracy to start out around 50%, and our goal is to see if we can improve on that. Okay, so now our model is done training; let's go ahead and take a look at the results. Here, this is each epoch. Essentially, an epoch just means one iteration through the entire training data set. We see that the loss decreases each time as we train our model, which means we're getting closer and closer to our goal of matching our predicted labels to the actual labels. You can also see that reflected in the accuracy: we start somewhere around 50%, which is expected because our data set is composed of 50% hot dogs and 50% not hot dogs, so initially we would expect essentially random performance. Our accuracy at first is 50%, and we see that it increases; it actually does fairly well, and after a while it gets to an accuracy of 100%. Now, let's see if our validation data tells a different story. Remember, the validation data is not data that we pass into the training; it's data we set aside to evaluate how our model is performing along the way. So if I come over here, we see that our loss starts to decrease but then goes back up. We see the same with our accuracy: it climbs to maybe about 73%, then starts to decrease again and stays roughly steady. And our loss actually increases to around 2.93, even though our training loss is extremely small and our training accuracy is 100%. So what is going on here? Chances are, we are overfitting our model. Essentially, that means we've passed so much data into this model so many times that the model has effectively memorized each piece of data. When it memorizes each piece of data, it can predict all of the training labels 100% correctly, but when it sees new data, it can't really generalize. That's why you see accuracy drop and loss climb on the validation set. Alright, so we're going to go back into this model and see if we can make changes to make it more robust. One thing we can do is add something called a dropout layer. After each max pooling, I'm going to insert a layer called Dropout, and I'm going to set its parameter to 0.25. What this is doing is saying that, randomly during training, we're going to turn off 25% of the outputs flowing from this layer to the next one. By turning off these connections, you're essentially training the model to be robust to a little bit of randomness: it might not see the exact same features every time, but it should still produce the same output. So I'm going to go ahead and add this to some of the layers; okay, I'm going to do this before we flatten, and those will be my dropout layers. One more thing I wanted to add to this model before we actually ran it was some data augmentation. So I'm going to insert some code up here and call it data augmentation. Here, I'm going to use tf dot keras dot Sequential; this is going to be our data augmentation layer, which means that when an image is passed into the model, we're going to perform some operations on it. First, I'm going to pass in a random flip layer that flips the image along the horizontal axis. Then I'm going to pass in a random rotation layer with a factor of 0.2, which randomly rotates within the range of negative 20% times two pi to positive 20% times two pi.
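A minimal sketch of those two changes might look like this, assuming the built-in Keras preprocessing layers RandomFlip and RandomRotation:

import tensorflow as tf
from tensorflow.keras import layers

# Augmentation pipeline: random horizontal flips, plus random rotations
# of up to +/- 0.2 * 2*pi radians.
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),
    layers.RandomRotation(0.2),
])

# And inside the model, a dropout layer after each max pooling layer, e.g.:
#   layers.MaxPooling2D((2, 2)),
#   layers.Dropout(0.25),  # randomly zero 25% of activations during training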
So basically, when our image is passed in, after it gets rescaled, we can call model dot add with this data augmentation layer, which will perform these operations on our input data. Alright, let's take a look at what this data augmentation function actually does to our data. I'm going to try to extract the first image from our original training data set; I'm just going to get that first image, and then let's actually show it. Oh no, that's not what I wanted; I wanted to take the first one, so I need this dot take there. Okay, and then let's show that image. Alright, cool. Let's run this again and get a more square image. Okay, cool, so we have this, like, ramen over here. Now what I want to show you is what data augmentation does and how this image actually gets rotated. First, I'm going to cast this to a batch so that we can put it through our data augmentation layer. What that means is I need to expand the dimensions: essentially, I just want a list where the only thing in it is my image. To do that, I use expand dims, passing in the image and expanding along the very first dimension; that's what the zero is saying: okay, put this into a list-like holder. I'll also cast this to float32. One more thing: this currently holds values from zero to 255, but when we show the image, it expects values from zero to one, so right here I'm just going to divide by 255. Okay, and now we can finally show what the figures look like with the data augmentation. I'll create a figure with fig size equal to ten by ten or something, and I'm going to plot this with nine rotations. I can create an augmented image by calling my data augmentation pipeline on the image, and then let's plot that. So let's define the axes for the subplot; it'll be a three-by-three grid, and then I'm going to show this on the figure. So let's show the augmented image, and remember that the augmented image is a container that holds our image, so I actually have to index into it to get the image itself. And I'm just going to turn off the axes. Now, if I run this, let's see what happens. Okay, so you see that we get different rotations of our food, and maybe, you know, some flips here. Essentially, it scrambles our data a little bit, which is exactly what we need in order to build robustness, so that we don't get the same input data every single time. So I'm just going to add that as a layer right here, and let's run this.
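That preview cell, roughly, might look like the sketch below. I'm assuming matplotlib for plotting, the data_augmentation pipeline from above, and an image variable holding the single training image we pulled out; those names are placeholders, not necessarily what's in the Colab.

import matplotlib.pyplot as plt
import tensorflow as tf

# Wrap the single image in a batch of one, cast to float32 and scaled
# to [0, 1] so it can be both augmented and displayed directly.
image_batch = tf.expand_dims(tf.cast(image, tf.float32) / 255.0, 0)

plt.figure(figsize=(10, 10))
for i in range(9):
    augmented = data_augmentation(image_batch)  # a fresh random flip/rotation
    plt.subplot(3, 3, i + 1)
    plt.imshow(augmented[0])  # index into the batch to get the image back out
    plt.axis('off')
plt.show()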
So essentially, what I've done is add this data augmentation to my input data, and I've also created these dropout layers that let us train with certain nodes and connections missing at any given training step. Alright, one change I'm actually going to make from the original model is to move this dropout down here. As far as I know, when you're building your model, there are general architectures that work well, but there's no single right answer to questions like: where should you place these layers? How many nodes should each layer have? What should the stride length be? What activation function do you use? From what I understand, these questions are almost a little bit trial and error. Essentially, a lot of times what people do is simply try a bunch of combinations based on what has worked in the past and see what produces the best outputs. Alright, so one more thing I wanted to add was a kernel regularizer. What that does is penalize large weights in a given kernel, which keeps the filter weights from growing too big. The way I do that is by passing in this kernel regularizer parameter here and setting it equal to tensorflow dot keras dot regularizers; let's use an L2 regularizer, so that we create a bigger penalty for bigger weights, and I'll use 0.01 as my hyperparameter. Now I'm also going to add that down here. Oops, added a period over here. Alright, so I added this kernel regularizer to each of these layers except the top one. Just keep in mind that the kernel regularization is there so that the weights of the filters we're training in the convolutions don't blow up. And for the purposes of this project, I'm also going to decrease the number of nodes in each of these layers, just so that it trains a little bit faster. These are all parameters that you can play around with in your free time: this hyperparameter, where you insert the dropouts, different types of regularizers, different activation functions, filter sizes, etc. So let's run this cell, compile the model, and then finally train the model down here. Okay, so now it is a waiting game, and we will just wait. Okay, so our model is finally done training; let's take a quick look at the results. Here, we go from a training loss of 1.6 all the way down to something around 0.57, 0.56, and we see that the accuracy goes from 0.5 to around 70%; so 50% to 70%, and remember, this is on the training set. Now if we look at the validation set, which, remember, is data our model hasn't seen yet, it goes from around 1.2 loss and 50% accuracy to around 0.6 validation loss and 70% accuracy. So we have shown that we are effectively training a model. I will say that, throughout creating this video, the best accuracy I could get was something around 75%. So for you guys: if you go back and play with some hyperparameters, the layout of the model, different activations, regularizers, etc., and you get an accuracy better than 75%, please do share it with everybody; I would love to learn from you. What this means is that our model can achieve something around 70% accuracy on classifying hot dogs versus not hot dogs. Now, ideally, we would also have a separate test data set to try this model on, but unfortunately, our data set did not come with one and I did not create one, so that's okay; we're going to use our validation data set to demonstrate that this works. I'm going to paste here the figure code that we used to draw our data augmentation images, and instead of augmented images, I'm going to use image batches from our validation data set. So, image batch and label batch: from our validation data set, take the first item; remember, it's already a batch of 16. The images we want to use are this image batch, and the labels we want to use are this label batch.
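Two quick sketches of this step's changes, under the same naming assumptions as before. Thresholding the logits at zero is my own reading of how the 0/1 predictions shown below could be produced, since the model has no final sigmoid; the Colab may do this differently.

from tensorflow.keras import layers, regularizers

# A convolution layer with an L2 penalty of 0.01 on its kernel weights.
layers.Conv2D(64, (3, 3), activation='relu',
              kernel_regularizer=regularizers.l2(0.01))

# Pull one batch of 16 validation images and labels for an eye test.
# The model outputs raw logits, so a logit above 0 means "hot dog".
for image_batch, label_batch in valid_dataset.take(1):
    predictions = (model.predict(image_batch) > 0).astype(int).flatten()
    print(predictions)          # the model's 0/1 guesses
    print(label_batch.numpy())  # the true 0/1 labels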
There's probably a better way to do this, but I couldn't get it working, so this will suffice for now. Let's run that. Okay, and then here, in this range of nine, instead of an augmented image, what I'm going to do is just plot the image so that we can see what it looks like. So instead, we take the images, get the ith image from the batch, and print all of these. From me looking at this, clearly this is a hot dog. I don't think that's a hot dog. Can't really tell what that is. Hummus, hot dog, garlic bread, hot dog, hot dog, oysters, and what looks like sausage and waffle. And now, if we look at our labels, one means that our model thinks it's a hot dog, and zero means that it doesn't think it's a hot dog. So for the first row, we get yes, no, no: hot dog, not a hot dog, not a hot dog. Then we get yes, no, yes: hot dog, not a hot dog, hot dog. And then yes, no, no: hot dog, not a hot dog, not a hot dog. So our model was actually able to classify all nine of these images accurately. We could literally pass an image in, ask whether it's a hot dog or not, and it would be able to tell us, which I think is pretty awesome. That concludes our introductory course on convolutional neural nets. Thank you guys all for being here with me today, learning about the basics of machine learning, neural networks, how to train them, and finally, convolutional neural networks with our hot dog or not hot dog example. I hope that you guys learned a lot. And of course, post comments; let's help each other learn, and we can all get better at ML together. Don't forget to subscribe to my channel, Kylie Ying; I will be releasing a course on artificial intelligence later this year, so stay tuned.