Python Tutorial - Measuring model performance

The Importance of Evaluating Classifier Performance: A Guide to Computing Accuracy and Model Complexity

Now that we know how to fit a classifier and use it to predict the labels of previously unseen data, we need to figure out how to measure its performance. In classification problems, accuracy is a commonly used metric. The accuracy of a classifier is defined as the number of correct predictions divided by the total number of data points. This raises the question, though: which data do we use to compute accuracy? What we are really interested in is how well our model will perform on new data, that is, samples the algorithm has never seen before.
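As a minimal illustration of this definition, assuming we already have arrays of true and predicted labels (the values below are made up for the example):

```python
import numpy as np

# Hypothetical true and predicted labels for six data points
y_true = np.array([0, 1, 2, 1, 0, 2])
y_pred = np.array([0, 1, 1, 1, 0, 2])

# Accuracy = number of correct predictions / total number of data points
accuracy = np.mean(y_true == y_pred)
print(accuracy)  # 0.83... (5 of the 6 predictions are correct)
```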

Using the Training Data to Compute Accuracy

-------------------------------------------

One approach is to compute the accuracy on the data used to fit the classifier. However, because this data was used to train the classifier, its performance on it will not be indicative of how well it can generalize to unseen data. For this reason, it's common practice to split your data into two sets: a training set and a test set. You train, or fit, the classifier on the training set, then make predictions on the labeled test set and compare these predictions with the known labels. You then compute the accuracy of your predictions.

Splitting Data into Training and Test Sets

-----------------------------------------

To split your data into two sets, we use the `train_test_split` function from scikit-learn's `model_selection` module. We first import this function. Then, we use `train_test_split` to randomly split our data. The first argument is the feature data (X), the second argument is the targets or labels (Y). The `test_size` keyword argument specifies what proportion of the original data is used for the test set. We also specify a random state, which sets a seed for the random number generator that splits the data into train and test sets.

Setting the Seed for Reproducibility

-------------------------------------

Setting the same seed via the `random_state` argument later allows us to reproduce the exact split and our downstream results. The `train_test_split` function returns four arrays: the training data (X_train), the test data (X_test), the training labels (Y_train), and the test labels (Y_test). We unpack these into four variables for easier reference.
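A minimal sketch of this split, using the iris dataset as a stand-in for the course's data (your feature array X and label array Y may come from elsewhere):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load an example dataset; X holds the features, Y the labels
X, Y = load_iris(return_X_y=True)

# Randomly split features and labels; random_state seeds the shuffle
# so the same split (and downstream results) can be reproduced later
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=21
)

print(X_train.shape, X_test.shape)  # e.g. (105, 4) and (45, 4) for 150 samples
```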

Specifying the Proportion of Data to Use for Testing

------------------------------------------------

By default, `train_test_split` uses 75% of the data for training and 25% for testing, which is a good rule of thumb. We can override this with the `test_size` keyword argument; in this case, we set it to 30%. It's also best practice to perform your split so that it reflects the distribution of labels in your data: we want the labels to be distributed in the train and test sets as they are in the original dataset.

Ensuring a Balanced Split

-------------------------

To ensure a balanced split, we use the `stratify` keyword argument, passing it the list or array of labels (Y). This ensures that the proportions of each label are maintained in both the training and test sets, so the split does not distort whatever class balance is present in the original data.
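A sketch of the stratified split, continuing with the same X and Y as above; printing the class proportions shows that they match across the original data, the training set, and the test set:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)

# stratify=Y preserves the original label proportions in both splits
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=21, stratify=Y
)

# Compare class proportions: original vs. train vs. test
for name, labels in [("all", Y), ("train", Y_train), ("test", Y_test)]:
    print(name, np.bincount(labels) / len(labels))
```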

Instantiating and Fitting the Classifier

-----------------------------------------

Once we have split our data into training and test sets, we instantiate our classifier and fit it to the training data using its `fit` method. We then make predictions on the test data with `predict` and store the results in a variable called `Y_pred`.
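Continuing from the split above, a sketch with a K-nearest neighbors classifier (the value `n_neighbors=6` is just an illustrative choice):

```python
from sklearn.neighbors import KNeighborsClassifier

# Instantiate the classifier and fit it to the training data only
knn = KNeighborsClassifier(n_neighbors=6)
knn.fit(X_train, Y_train)

# Predict labels for the unseen test data
Y_pred = knn.predict(X_test)
print(Y_pred)  # the predictions take on the dataset's class labels
```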

Evaluating Classifier Performance

---------------------------------

To evaluate the performance of our classifier, we can use the `score` method of the model. We pass in the test data (X_test) and labels (Y_test) to compute the accuracy of our classifier. In this case, the accuracy of our K-nearest neighbors model is approximately 95%. This is a good result for an out-of-the-box model.
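The accuracy check itself, assuming the fitted `knn` and the held-out split from the sketches above (the exact number depends on your data and random state):

```python
# For classifiers, score() returns the mean accuracy on the given data
accuracy = knn.score(X_test, Y_test)
print(accuracy)  # roughly 0.95 in the tutorial's example; yours may differ
```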

Model Complexity: Understanding the Trade-Off

---------------------------------------------

As we increase the value of K in the K-nearest neighbors model, the decision boundary gets smoother and less curvy. We consider this to be a less complex model than those with lower values of K. Very complex models (low K) run the risk of being sensitive to noise in the specific data we have rather than reflecting general trends; this is known as overfitting. However, if we increase K too much and make the model too simple, it will perform less well on both the training and test sets; this is known as underfitting.

Model Complexity Curves

-------------------------

To understand the trade-off between model complexity and performance, we can visualize a model complexity curve. The curve shows how the accuracy of our classifier on the training and test sets changes as we increase or decrease the value of K. There is a sweet spot in the middle that gives the best performance on the test set.
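A sketch of such a curve for KNN, looping over candidate values of K and recording the accuracy on both sets (this assumes the stratified split from earlier and uses matplotlib for the plot):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Candidate values of K; small K = more complex model, large K = simpler model
neighbors = np.arange(1, 26)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

# Fit one model per value of K and record accuracy on both sets
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, Y_train)
    train_accuracy[i] = knn.score(X_train, Y_train)
    test_accuracy[i] = knn.score(X_test, Y_test)

plt.plot(neighbors, train_accuracy, label="Training accuracy")
plt.plot(neighbors, test_accuracy, label="Testing accuracy")
plt.xlabel("Number of neighbors (K)")
plt.ylabel("Accuracy")
plt.legend()
plt.show()
```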

Practicing Splitting Data and Evaluating Classifier Performance

----------------------------------------------------------------

Now it's your turn to practice splitting your data into training and test sets, computing accuracy on your test set, and plotting model complexity curves. Don't be afraid to experiment with different values of K and see how they affect your results. Remember to use the `train_test_split` function and the `score` method to evaluate your classifier's performance.

"WEBVTTKind: captionsLanguage: ennow that we know how to fit a classifier and use it to predict the labels of previously unseen data we need to figure out how to measure its performance that is we need a metric in classification problems accuracy is a commonly used metric the accuracy of a classifier is defined as the number of correct predictions divided by the total number of data points this begs the question though which data do we use to compute accuracy well what we are really interested in is how well our model will perform on new data that is samples that the algorithm has never seen before well you could compute the accuracy on the data you use to fit the classifier however as this data was used to train it the classifiers performance will not be indicative of how well it can generalize to unseen data for this reason it's common practice to split your data into two sets a training set and a test set you trained or fit the classifier on the training set then you make predictions on the labeled test set and compare these predictions with the knowing labels you then compute the accuracy of your predictions to do this we first import train tests split from SK learn more selection we then use the Train test split function to randomly split our data the first argument will be the feature data the second the targets or labels the test size keyword argument specifies what proportion of the original data is used for the test set lastly the random state Quogue sets a seed for the random number generator that splits the data in to train and test setting the seed with the same argument later will allow you to reproduce the exact split and your downstream results train test split returns for arrays the training data the test data the training labels and the test labels we unpack these into four variables X train X test Y train and why test respectively by default train test split splits the data into 75% training data and 25% test data which is a good rule of thumb we specify the size of the test set using the keyword argument test size which we do here to set it to 30% it is also best practice to perform your split so that your split reflects the labels on your data that is you want the labels to be distributed in train and test sets as they are in the original data set to achieve this we use the keyword argument stratify equals Y where Y is the list or array containing the labels we then instantiate our K nearest neighbors classifier fit it to the training data using the fit method make our predictions on the test data and store the results as Y underscore pred printing them shows that the predictions take on three values as expected to check out the accuracy of our model we use the score method of the model and pass at X test and Y test see here that the accuracy of our K nearest neighbors model is approximately 95% which is pretty good for an out-of-the-box model recall that we recently discussed the concept of a decision boundary here we visualize a decision boundary for several increasing values of K in a K&N model note that as K increases the decision boundary gets smoother and less curvy therefore we consider it to be a less complex model than those with lower K generally complex models run the risk of being sensitive to noise in the specific data that you have rather than reflecting general trends in the data this is known as overfitting if you increase K even more and make the model even simpler then the model will perform less well on both tests and training sets as indicated in 
this schematic figure known as a model complexity curve this is called under fitting we can see that there is a sweet spot in the middle which gives us the best performance the test said okay now it's your turn to practice splitting your data computing accuracy on your test set and plotting model complexity curvesnow that we know how to fit a classifier and use it to predict the labels of previously unseen data we need to figure out how to measure its performance that is we need a metric in classification problems accuracy is a commonly used metric the accuracy of a classifier is defined as the number of correct predictions divided by the total number of data points this begs the question though which data do we use to compute accuracy well what we are really interested in is how well our model will perform on new data that is samples that the algorithm has never seen before well you could compute the accuracy on the data you use to fit the classifier however as this data was used to train it the classifiers performance will not be indicative of how well it can generalize to unseen data for this reason it's common practice to split your data into two sets a training set and a test set you trained or fit the classifier on the training set then you make predictions on the labeled test set and compare these predictions with the knowing labels you then compute the accuracy of your predictions to do this we first import train tests split from SK learn more selection we then use the Train test split function to randomly split our data the first argument will be the feature data the second the targets or labels the test size keyword argument specifies what proportion of the original data is used for the test set lastly the random state Quogue sets a seed for the random number generator that splits the data in to train and test setting the seed with the same argument later will allow you to reproduce the exact split and your downstream results train test split returns for arrays the training data the test data the training labels and the test labels we unpack these into four variables X train X test Y train and why test respectively by default train test split splits the data into 75% training data and 25% test data which is a good rule of thumb we specify the size of the test set using the keyword argument test size which we do here to set it to 30% it is also best practice to perform your split so that your split reflects the labels on your data that is you want the labels to be distributed in train and test sets as they are in the original data set to achieve this we use the keyword argument stratify equals Y where Y is the list or array containing the labels we then instantiate our K nearest neighbors classifier fit it to the training data using the fit method make our predictions on the test data and store the results as Y underscore pred printing them shows that the predictions take on three values as expected to check out the accuracy of our model we use the score method of the model and pass at X test and Y test see here that the accuracy of our K nearest neighbors model is approximately 95% which is pretty good for an out-of-the-box model recall that we recently discussed the concept of a decision boundary here we visualize a decision boundary for several increasing values of K in a K&N model note that as K increases the decision boundary gets smoother and less curvy therefore we consider it to be a less complex model than those with lower K generally complex models run the risk of being 
sensitive to noise in the specific data that you have rather than reflecting general trends in the data this is known as overfitting if you increase K even more and make the model even simpler then the model will perform less well on both tests and training sets as indicated in this schematic figure known as a model complexity curve this is called under fitting we can see that there is a sweet spot in the middle which gives us the best performance the test said okay now it's your turn to practice splitting your data computing accuracy on your test set and plotting model complexity curves\n"