Your First ML App - Machine Learning for Hackers #1

Machine Learning: The Key to Unlocking Data Insights

Hello world, welcome to Sirajology! Ever wonder how Netflix recommends awesome shows you'd like? Or how Facebook can auto-tag your face? Or how Google's self-driving cars work? Or how Bing can...what's that? You don't care what Bing does? It's okay, nobody does. Anyways, the answer is Machine Learning. Machine Learning is the study of algorithms that learn from examples and experience instead of hard-coded rules.

To understand how machine learning works, let's consider an example. Imagine you want to build an app that can recognize an image of a specific species of flower called Iris. If you decide to code this without machine learning, you'd have to write a bunch of different functions to detect all the different features of an Iris flower. The problem is, there are a bunch of corner cases and there's no way you could account for all of them ahead of time. What if one of the leaves is partially obstructed, or a flower is a color you didn't account for, or the shape is totally different from what you expected? You can't just code all that up beforehand! Unless you're Jeff Dean. Just kidding, not even Jeff Dean can do that.

The good news is that machine learning makes this problem much easier, and you don't need to be a math person to get started. There are four steps involved in the process: collect data, pick a model, train the model, and test the model. We'll basically just feed data to a model and it will start to find patterns in the data for us.

The first step is to get our data. Datasets come in all kinds of formats (PDFs, TXTs, CSVs, holograms). The format doesn't matter; we can easily parse it in our code to get the relevant details. We'll be using a well-known dataset that contains 150 samples of Iris flowers. Luckily for us, this dataset comes preloaded with scikit-learn, so we can just load it directly. Each sample has a label, one of three types of Iris (setosa, virginica, or versicolor), and a set of features (sepal length, sepal width, petal length, and petal width). Because our data is labeled, the type of learning we're doing is called supervised learning. If we didn't have labels for our data, just features, then it would be called unsupervised learning.
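Loading the data can be sketched like this. It's a minimal example that assumes scikit-learn is installed; the variable names `iris`, `X`, and `y` are just for illustration and not necessarily what the original video used.

```python
from sklearn.datasets import load_iris

# Load the Iris dataset that ships with scikit-learn.
iris = load_iris()
X, y = iris.data, iris.target  # features and integer labels

print(X.shape)             # 150 samples, 4 features each
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # the three Iris species

# Peek at the first sample: its four measurements and its species name.
print(X[0], iris.target_names[y[0]])
```

Each row of `X` is one flower, and the matching entry in `y` is its label encoded as 0, 1, or 2.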

Supervised learning means that we have a target or response variable to predict. In this case, our goal is to classify an Iris flower as one of the three types (setosa, virginica, or versicolor). Unsupervised learning, by contrast, means that there's no target variable and we're trying to find patterns in the data on our own.
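To make the contrast concrete, here is a rough sketch of the unsupervised case. K-means clustering is my choice of example here, not something from the original video: it only ever sees the features and has to group the flowers without being told the species.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Keep only the features; the labels are deliberately ignored.
X = load_iris().data

# Ask k-means to find 3 groups (an assumption: we happen to know there are 3 species).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Each flower gets a cluster number, but the numbers carry no species names.
print(cluster_ids[:10])
```

The cluster numbers are arbitrary; nothing tells the algorithm which group is setosa. That missing target variable is exactly what separates this from supervised learning.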

Now that we have our dataset, the next step is to pick the model. To do that, you just have to calculate the multivariate equation for discriminant analysis by squaring the delta of the...just kidding, you literally just paste in a single line of code. The real question is how do you know which of the bajillion machine learning models to use? Well, we're trying to assign each flower to one of three Iris species, so we know this is a classification problem. Therefore, we'll need to use a classifier.

OK, that narrows our options, but what type? There are a lot of those too! My gut reaction is to use a deep neural network because it just sounds dope, you know what I mean? But there are countless machine learning models out there, each with its own strengths and weaknesses. The right one depends on the size of your data and your application requirements.
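One hedged way to compare options is to try a few classifiers and look at their cross-validated accuracy. The three models below are my own picks for illustration (the original video does not name them), and 5-fold cross-validation is just one reasonable way to score them.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical shortlist of candidate classifiers; swap in any others you like.
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "k-nearest neighbors": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

# Score each model with 5-fold cross-validation and keep the mean accuracy.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

Switching models really is a one-line change in the dictionary, which makes it cheap to see which one works best on your data.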

Since we only have 150 samples, something simple and standard will do: a linear classifier with the number of classes set to 3. It takes in the four features of a flower and can output a probability for each of the three categories (setosa, virginica, or versicolor). Deep neural networks tend to shine when you have a lot of data, which we don't have here.
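Here is a small sketch of a classifier that outputs a probability for each class. It uses scikit-learn's `LogisticRegression` as a stand-in linear classifier, which is an assumption on my part; the original video used a TensorFlow-based one. The model has to be fitted before it can predict, so the sketch trains on the full dataset just to show the output shape.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# The "single line" that picks the model (a stand-in linear classifier).
classifier = LogisticRegression(max_iter=1000)

# Fit on everything, only so we can look at the probabilities it produces.
classifier.fit(X, y)

# One row per flower, one column per class; each row sums to 1.
probabilities = classifier.predict_proba(X[:1])
print(probabilities.round(3))
```

The column with the highest probability is the class the model would predict for that flower.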

Now that we have a classifier, the next step is to train our model. Training is the actual learning step: as the model iterates through the dataset, it gets better and better at prediction. Since we're using a classifier, we just need to call the fit method on our object. Fit is our training algorithm; it feeds the training dataset into our model so it can find patterns in our data. Boom, done.
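A minimal sketch of the training step, again assuming scikit-learn's `LogisticRegression` as the linear classifier. The 80/20 split and the `random_state` value are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the flowers for testing; train on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

classifier = LogisticRegression(max_iter=1000)
classifier.fit(X_train, y_train)  # the actual learning step

# A linear model learns one weight per (class, feature) pair.
print(classifier.coef_.shape)
```

Keeping a test split aside matters because accuracy measured on the training data would be overly optimistic.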

Now, whenever we input a new flower from our testing dataset, it'll automatically be able to classify it as one of the three types of Iris flowers. We can see in the terminal that the accuracy score for classification is pretty high. How easy was that? Just 7 lines of code and now you have your first model trained and ready to recognize Iris flowers! You just made a learning machine.
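Testing can be sketched as follows, under the same assumptions as before (scikit-learn, a `LogisticRegression` stand-in, an arbitrary 80/20 split). The `new_flower` measurements are made up for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

classifier = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Compare predictions on unseen flowers with their true labels.
accuracy = accuracy_score(y_test, classifier.predict(X_test))
print(f"accuracy: {accuracy:.2f}")

# Classify one new flower: sepal length, sepal width, petal length, petal width (cm).
new_flower = [[5.1, 3.5, 1.4, 0.2]]  # hypothetical measurements
predicted = iris.target_names[classifier.predict(new_flower)[0]]
print(predicted)
```

The exact accuracy depends on how the data happens to be split, so treat a single number as a rough indication rather than a guarantee.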

And you can use this same kind of model to classify other things like cars, dresses, and even Republican candidates. Machine learning can be applied to so many different things, from fraud detection to generating paintings in the style of Picasso. The possibilities are endless, and it's up to us to explore them.

Hello world, welcome to Sirajology! Ever wonder how Netflix recommends awesome shows you'd like? Or how Facebook can auto-tag your face? Or how Google's self-driving cars work? Or how Bing can...what's that? You don't care what Bing does? It's OK, nobody does. Anyways, the answer is Machine Learning. Machine Learning is the study of algorithms that learn from examples and experience instead of hard-coded rules.

So let's say you want to build an app that can recognize an image of a specific species of flower called Iris. If you decide to code this without machine learning, you'd have to write a bunch of different functions to detect all the different features of an Iris flower. The problem is, there are a bunch of corner cases and there's NO WAY you could account for all of them ahead of time. Like, what if one of the leaves is partially obstructed, or a flower is a certain color that you didn't account for, or the shape is totally different than what you expected? You can't just code all that up beforehand! Unless you're Jeff Dean. Just kidding, not even Jeff Dean can do that, no one can. You have to use machine learning to solve this problem, and here's the best part: it's actually super easy and you don't need to be a math person to do it! There are just 4 steps involved in the process: collect data, pick a model, train the model, and test the model. We'll basically just add data to a model and it will start to find patterns in the data FOR us. We're gonna make this Iris flower recognition app with just 7 lines of Python using two dope libraries, SciKit Learn and TensorFlow, which we'll import right at the start.

So let's do this. The first step is to get our data. Datasets come in all different kinds of formats (PDFs, TXTs, CSVs, holograms). It doesn't matter the format, we can easily parse it in our code to get the relevant details. We'll be using a well-known dataset that contains 150 samples of Iris flowers. Luckily for us, this dataset comes preloaded with SciKit Learn so we can just load it here. Each sample has a label, one of 3 types of Iris (setosa, virginica, or versicolor), and a set of features (sepal length, sepal width, petal length, and petal width). Because our data is labeled, the type of learning we're doing is called supervised learning. If we didn't have labels for our data, just features, then it would be called unsupervised learning.

So yeah, these are good features: they're simple, independent, and informative, as all features should be. By the way, if you're ever deciding on what kind of features you should look for in a dataset, a good rule of thumb is thinking about what features you personally would need to figure out to determine whatever your goal is. So if you're trying to determine Jedi or Sith given a dataset of lightsabers, don't pick something like 'weapon status' as a feature. Use blade colors and curvature as your features!

So now that we have our dataset, the next step is to pick the model. To do that, you just have to calculate the multivariate equation for discriminant analysis by squaring the delta of the...just kidding, you literally just paste in a single line of code. The real question is how do you know which of the bajillion machine learning models to use? Well, we're trying to classify an image as either an Iris flower or not an Iris flower, so we know this is a classification problem. Therefore, we'll need to use a classifier. OK, that narrows our options, but what type? There are a lot of those too! My gut reaction is to use a deep neural network because it just sounds dope, you know what I mean? But there are countless others! The answer is: it depends. It depends on the size of your data and your application requirements. Currently, if you have a LOT of data, deep neural networks pretty much outperform every other machine learning model across a wide variety of use cases. But in our case, we only have 150 samples, so we'll use something simple and standard: a linear classifier. We'll set the class parameter to 3. You can also easily just switch out the model with another one and see the difference in accuracy. (It's just one line of code, like, that's all it would take to use a deep neural net to classify this.) That way you'll know which one works best.

Now that we've picked our model, it's time to train it. Training is the actual learning step: as your model iterates through the dataset, it gets better and better at prediction. Since we're using a classifier, we just need to call the fit method on our object to train our model. Fit is our training algorithm; this method will input the training dataset into our model to find patterns in our data. Boom, done.

Now, whenever we input a new flower from our testing dataset, it'll automatically be able to classify it as one of the 3 types of Iris flower. We can see in terminal that the accuracy score for classification is pretty high. How easy was that? Just 7 lines of code and now you have your first model trained and ready to recognize Iris flowers! You just made a learning machine. And you can use this same model to classify other things like cars, dresses, and Republican candidates. Machine learning can be applied to so many different things, from fraud detection to generating paintings like Picasso. If you'd like to see more machine learning videos, please subscribe. I'm going to release a lot of them. For now, I've gotta go fix a runtime error, so thanks for watching!