Random Forests - The Math of Intelligence (Week 6)

Using Decision Trees to Predict Creditworthiness

Decision trees are a popular machine learning algorithm for classification and regression tasks; the classic formulation is the Classification and Regression Tree (CART), introduced by Leo Breiman and colleagues. In this article, we will explore how decision trees, and random forests built from them, can be used to predict creditworthiness, using the German Credit dataset from the UCI Machine Learning Repository as our running example.

**Building the Decision Tree**

To build a decision tree, we need a way to pick, at each node, the feature and split value that best separate the classes. Rather than a one-time global feature-selection step, the tree evaluates candidate features as it grows, scoring each candidate split by how well it separates the target variable. (In a random forest, each tree additionally considers only a random subset of the features at each split.) In this case, the target is whether someone is creditworthy or not.

We create the decision tree by recursively partitioning the data into smaller subsets. The process starts at the root node: for each candidate feature, we try every value in the data as a potential split point and compute the Gini index for the two groups that split would create. This is an exhaustive, greedy search with no heuristics. The (feature, value) pair with the lowest Gini impurity, i.e., the purest separation of the classes, becomes the root's split, and the same procedure is applied recursively to each of the two resulting groups.

The tree keeps splitting until a stopping rule fires: a group is pure, a maximum depth is reached, or a group falls below a minimum size. At that point we create a terminal node, which holds the final prediction: the most common class among the rows that reached it. In our case, the terminal nodes represent one of two classes: "creditworthy" or "not creditworthy".
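To make this concrete, here is a minimal pure-Python sketch of the recursive build, in the spirit of the video's library-free code. The function names (`get_split`, `split`, `to_terminal`, `build_tree`) follow the functions discussed in the video, but the bodies here are a reconstruction, not the original source:

```python
import random

def gini_index(groups, classes):
    # Weighted Gini impurity of a candidate split (formula in the next section).
    n_instances = float(sum(len(group) for group in groups))
    gini = 0.0
    for group in groups:
        size = float(len(group))
        if size == 0:
            continue
        score = sum(([row[-1] for row in group].count(c) / size) ** 2
                    for c in classes)
        gini += (1.0 - score) * (size / n_instances)
    return gini

def test_split(index, value, dataset):
    # Partition rows into two groups on one (feature, value) pair.
    left = [row for row in dataset if row[index] < value]
    right = [row for row in dataset if row[index] >= value]
    return left, right

def get_split(dataset, n_features):
    # Exhaustive, greedy search: try every value of a random subset of
    # features and keep the (feature, value) pair with the lowest impurity.
    class_values = list(set(row[-1] for row in dataset))
    features = random.sample(range(len(dataset[0]) - 1), n_features)
    best = None
    for index in features:
        for row in dataset:
            groups = test_split(index, row[index], dataset)
            gini = gini_index(groups, class_values)
            if best is None or gini < best['gini']:
                best = {'index': index, 'value': row[index],
                        'gini': gini, 'groups': groups}
    return best

def to_terminal(group):
    # A leaf predicts the most common class among its rows.
    outcomes = [row[-1] for row in group]
    return max(set(outcomes), key=outcomes.count)

def split(node, max_depth, min_size, n_features, depth):
    # Recursively grow the tree until a stopping rule fires.
    left, right = node.pop('groups')
    if not left or not right:                       # no real split happened
        node['left'] = node['right'] = to_terminal(left + right)
        return
    if depth >= max_depth:                          # tree is deep enough
        node['left'], node['right'] = to_terminal(left), to_terminal(right)
        return
    for side, group in (('left', left), ('right', right)):
        if len(group) <= min_size:                  # group too small to split
            node[side] = to_terminal(group)
        else:
            node[side] = get_split(group, n_features)
            split(node[side], max_depth, min_size, n_features, depth + 1)

def build_tree(train, max_depth, min_size, n_features):
    root = get_split(train, n_features)
    split(root, max_depth, min_size, n_features, 1)
    return root
```

A tree built this way is just a nest of dictionaries: each internal node stores the chosen feature index, the split value, and its two children, and each leaf is a bare class label.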

**Computing the Gini Index**

The Gini index measures the impurity of a group of rows. For a single group it is defined as Gini(group) = 1 − Σ_k p_k², where p_k is the proportion of rows in the group belonging to class k. To score a candidate split, we compute the Gini impurity of each group the split creates and take the size-weighted sum of those impurities.

The Gini index therefore tells us how pure or impure each group is: 0 means a perfectly pure group, while 0.5 is the worst case for a two-class problem (a 50/50 mix). By selecting the split with the lowest weighted Gini impurity at each node, we build a decision tree that greedily minimizes the overall impurity of the data.
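A few worked numbers make the scale concrete; this snippet just evaluates the formula above by hand:

```python
# A 50/50 two-class group: the worst case for binary classification.
print(1.0 - (0.5**2 + 0.5**2))        # 0.5

# A pure group (all rows in one class): the best case.
print(1.0 - (1.0**2 + 0.0**2))        # 0.0

# Scoring a candidate split that produces groups of 6 and 4 rows:
g1 = 1.0 - ((4/6)**2 + (2/6)**2)      # group 1: 4 vs 2 rows -> ~0.444
g2 = 1.0 - ((3/4)**2 + (1/4)**2)      # group 2: 3 vs 1 rows -> 0.375
print((6/10) * g1 + (4/10) * g2)      # weighted score ~0.417; lower is better
```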

**Navigating the Decision Tree**

Once our decision tree is built, we can use it to make predictions for new data. The process starts at the root node and recursively navigates down the tree based on the values of the input features.

At each internal node, we compare the row's value for that node's feature with the split value chosen during training (via the Gini search described above). If the value is greater than or equal to the split value, we move to the right child node; otherwise, we move to the left child node.

We repeat this process until we reach a terminal node, whose class label is our prediction. This traversal is performed once for each row we want to classify.
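Here is a sketch of that traversal, using the same dictionary representation as the build sketch above (internal nodes are dicts, leaves are bare labels); the toy tree is hand-built purely for illustration:

```python
def predict(node, row):
    # Walk one row down the tree: left if the feature value is below the
    # node's split value, right otherwise, until we hit a leaf (a bare label).
    if row[node['index']] < node['value']:
        child = node['left']
    else:
        child = node['right']
    return predict(child, row) if isinstance(child, dict) else child

# A hand-built toy tree: split on feature 0 at 2.5; leaves are class labels.
tree = {'index': 0, 'value': 2.5, 'left': 0, 'right': 1}
print(predict(tree, [1.0, 7.0]))   # 1.0 < 2.5  -> left  -> class 0
print(predict(tree, [4.2, 7.0]))   # 4.2 >= 2.5 -> right -> class 1
```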

**The Predict Function**

The predict function packages this traversal: it takes one row of input values and walks it down the tree using the comparisons described above, returning the class label stored at the terminal node it reaches.

For a single tree, that label is the final prediction. In a random forest, predict is called once per tree for each row, and the per-tree predictions are combined by majority vote to produce the final class.

**Random Forests**

Decision trees can be combined into a random forest, which improves on a single tree mainly by reducing overfitting. Each tree is trained on a random bootstrap sample of the training data (and considers only a random subset of the features at each split), and the forest classifies a new row by majority vote over the trees' predictions.
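A minimal sketch of the forest layer, reusing `build_tree` and `predict` from the sketches above; `subsample` and `bagging_predict` mirror the functions described in the video, but again the bodies are reconstructions:

```python
import random

def subsample(dataset, ratio=1.0):
    # Bootstrap sample: draw rows with replacement (the "bagging" part).
    n = round(len(dataset) * ratio)
    return [random.choice(dataset) for _ in range(n)]

def bagging_predict(trees, row):
    # Each tree votes; the most common class wins.
    predictions = [predict(tree, row) for tree in trees]
    return max(set(predictions), key=predictions.count)

def random_forest(train, test, max_depth, min_size, sample_size,
                  n_trees, n_features):
    # Grow n_trees trees, each on its own bootstrap sample, then
    # classify every test row by majority vote.
    trees = [build_tree(subsample(train, sample_size),
                        max_depth, min_size, n_features)
             for _ in range(n_trees)]
    return [bagging_predict(trees, row) for row in test]
```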

In this article, we have demonstrated how decision trees can be used to predict creditworthiness. We built a decision tree by using the Gini index to choose split points, recursively partitioning the data into smaller subsets based on the values of our input features.

We then walked each row down the tree, using its feature values to choose a branch at every node, and used the predict function to turn that traversal into a class prediction for new data.

**Testing Our Code**

To test our code, we ran it with different numbers of decision trees and compared cross-validation accuracy scores. The results showed that increasing the number of trees improved accuracy.

In particular, we evaluated forests of one, five, and ten trees (a one-tree "forest" is just a single decision tree). Accuracy improved each time the tree count grew, from one to five to ten.
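Here is a hypothetical test harness along those lines, reusing `random_forest` from the previous sketch. The fold count and maximum depth follow the hyperparameters mentioned in the video (5 folds, depth 10); `min_size`, `sample_size`, and the square-root-of-features rule are illustrative defaults, not confirmed from the original code:

```python
import random

def cross_validation_split(dataset, n_folds):
    # Randomly partition the rows into n_folds equal-sized folds
    # (any remainder rows are dropped, as is common in simple harnesses).
    fold_size = len(dataset) // n_folds
    rows = list(dataset)
    random.shuffle(rows)
    return [rows[i * fold_size:(i + 1) * fold_size] for i in range(n_folds)]

def accuracy_metric(actual, predicted):
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100.0

def evaluate(dataset, n_trees, n_folds=5, max_depth=10, min_size=1,
             sample_size=1.0):
    # Train on k-1 folds, test on the held-out fold, average the accuracy.
    folds = cross_validation_split(dataset, n_folds)
    n_features = int((len(dataset[0]) - 1) ** 0.5)  # common default: sqrt(#features)
    scores = []
    for i, fold in enumerate(folds):
        train = [row for j, f in enumerate(folds) if j != i for row in f]
        predicted = random_forest(train, fold, max_depth, min_size,
                                  sample_size, n_trees, n_features)
        actual = [row[-1] for row in fold]
        scores.append(accuracy_metric(actual, predicted))
    return sum(scores) / len(scores)

# for n in (1, 5, 10):
#     print(n, 'trees ->', evaluate(dataset, n))
```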

This demonstrates the power of random forests in improving the accuracy of machine learning models.

**Conclusion**

Decision trees are a powerful tool for classification and regression tasks. In this article, we have demonstrated how they can be used to predict creditworthiness, using the Gini index to choose split points.

We built a decision tree by recursively partitioning the data into smaller subsets based on the values of our input features. We navigated down the tree for each row in the data and made predictions using the predict function.

Finally, we tested our code on different numbers of decision trees and compared the accuracy scores. Our results showed that increasing the number of decision trees resulted in significant improvements in accuracy.

We hope this article has provided a comprehensive overview of how decision trees and random forests can be used to predict creditworthiness.

"WEBVTTKind: captionsLanguage: enHello worldIt's SirajAnd how risky is your credit that is the question that we are answering today by looking at this?Credit Risk Data set it is a German credit Risk Data set we want to knowBased on your employment history based on your family history your in you know your income areYou at risk for not paying back your loan?Usually we have people who do this and it takes a long time, and you have to there's a bunch of biasesBuilt into the system you know you meet the person whether you're an insuranceAgent or you work for some bank, and you're trying to assess whether or not this person deserves a loan as humansWe have a lot of biases and these biases don't necessarily addReal value to whether or not this person deserves a loan or not, right?So the way to fix that is to let machines do the job because machines can find relations and data that we can'tSo the data set we're going to look at is a bunch of financialattributes of somebody the status of their existing checking accountThis is a German data set by the wayand I found it on the UCI aWebsite also a that is just a great website to find data sets on so definitely check that website out if you haven't alreadyWe're going to look at their credit history the duration of paymentsThey've made the purchases they made the cars furniture all of these things are features, right?They're all features and we can use that to assess whether or not this personHas is that risk for is that risk for not paying back their loans? This is used in a bunch of different fields insurance ofFinanceWhether or not to rent a house to somebody right the landlord assesses whether or not you can pay pay it backSavings account that whether or not, they're employed all these features and the label isThis is a history of credit risk. So there's also label based on all these features this person has already been assessed asat risk or not at risk right, soThat's that's the mapping that we're going to learn right it's a binary mapping and the way we're going to do that is by buildinga random forest that's really the goal here to learn about random for it andYeah, so basically what it would look like it's something like this like this picture that we're looking at right nowEventually once we built this random forest which consists of several. What's what are called decision trees. It will look like thisWe'll feed it a new data point and then it will iteratively ask a series of questions like is their checking account balance above200 or below 200 and thenBased on that it will ask another set of questions like if you say no if the answer is noBased on that Data point it'll say wellWhat's the length of their current employment, and then are they credible or not creditable, right?So it'll just keep going down to this this chain of decisions a decision tree, okay?So that's what we're going to build we're going to build a random forestSo what is a random forest? Well a random forest is a collection of decision trees, so let's talk about what a decisionTree is so a decisionTree is actually a relatively simple concept and so I just didn't even talk about that in this series so farI'm just going straight into rAndom forest becauseDecision trees are easy stuff. 
We want to get right into the random Forest part, right?But let's go over decision trees really quickly and so a decision tree isbasically a set of decisions onwhether or not to classify something the technical name for this is theclassification and regression tree or court and it was invented by a dude named Leo Breiman aCouple decades ago like two decades ago, and it can be used for both as the name suggestsclassification andregression both of those thingsbut we're going to use it forClassification because that is that is what we're trying to do or we're trying to classify whether or not someone is creditable or not creditableOkay, so how does this thing work? Well you have a set of features, right?Let's say this features are the temperature the wind speed and the air pressure and based on these three featuresWe want to classify whether or not it's rainyWell what the decision tree that we build is going to do is it's going to createiterativelyIterative ly I should say recursively actually it's going to recurse the tree itself is builtRecursively, you'll see what I mean want to only look at the codeBut the experience elf is built recursively and for all of these features. They will ask a series of questionsUntil it classifies it as raining or not raining so the real question is how do we build this thing?What is how do we build the optimal tree where it is asking the right?Threshold values like how does it know that the temperature should be?Greater than or less than 70 degrees Fahrenheit, and then based on that answerHow does it know whether the wind speed should be greater than 4.5 specifically where are these Magic numbers coming from well?they're coming fromThe Genie Index that's what it's calledIt's called a genie index genie in a bottle genie in a bottle not genie in a bottle ge is the Italian GenieSo it was some dude named Genie. I actually don't know we're not going to get into that anywayLet's talk about the genie index, so the Genie index is the loss function hereBut the difference here between what we've done before with gradient descent with Newton's method. Is that there is no convex optimization?Happening here. We're not trying to find the minimum of some convex functionThere is no error that we're trying to minimize the genie index is a cost functionThat works differently and here's how it works basically for every single point in our data set right we've got a bunch of different featuresWe want to find that idealThreshold value, and so I'm going to explain this once and then explain it again when we get into the codeBut we want to find that ideal feature all right what that ideal?Value for a feature okay, so what do I mean by that so here's how it works?Check this outWe've got a data set with a bunch of features right ten features, right?So one let's say let's say one of them is the income so the income could be anywhereAnd I'm going to use usd for this example. It could be anywhere from$10,000 a year to a million dollars a yearSo what the genie index is going to do is it's going to go through so what we're going to do is we're going toiterate through every single Data point for that featureSo we're going to say we're going to iterate through every single data point and we're going to compute this genie index which is oneMinus the sum of it is no accep. 
So the the genie index is where is the formula ohright?It is oneIt is one- the average times the average where the average is proportion right one - the average of all of the class valuesTimes that average and that gives us the gini index and so what we want to do is it comes out to some single valuesome scalar valueAnd so basically we use we start from we start from Data point zero and we go up to Data Point NWhere n is the number of Data points and we compute the index for each of these data points for specific feature?Right so let's say the first data point is 10,000 we'll compute a gini index for that data point for that value forThat amount of income and so what happens isbasically it goes on a scale so aGenius core of Zero is the worst case a genies core zero means that based on that index for all the other data pointsThey're going to be they're going to be evenly split between less than that value and greater than that valueBut that's not what we want ideally we want a gini score of 1 that is the ideal gini scoreAnd what that means is that for that given a value that given value for that specific feature?All the classes from 1 all the classes from all the data points from one class will be on one side of thatThreshold value and all the Data point from the other class will be on the other side of that threshold value that will give usa 1Value that is the a one gini value right and so what we do isWe just compute the gini index for every single Data point for every single featureAnd we just do that for every feature right away, so for let's say for you know income, we'll compute the gini indexSo we'll say okayso let's start with10000 we'll compare every other data point to 10000 and see whether it lies on the left or right whether it's greater or less thanAnd that we get we then we compute the gini index from that okay, which is this formula right here?And we do that for every single data point and so what's going to happen is we're going to have a collection of GDIndices we'll have a set of gini indicesAnd then we'll pick the one that that is the highest and the highest one is the one such that the data points are mostof our mostNot evenly split that means the most data points from class a are going to be on the left or right and the most dataPoint from the other class will be on the opposite side you see what I'm saying and by sideI mean greater than or less thansoSo the worst genie the worst case is when the data points are evenly split. We don't want thatWe want them to be all on one side and all on the other side that means that when we get a new data pointIt'll plop down right into itit's a little bucket with all the rest of its related data points right, so that's the genie index and there are different measures ofLoss when it comes toarithmetic based Machine learning models instead of MatrixMatrix operation based machine learning models like we see with Neural networksOkay, so that's a gini score gini index. Whatever you want to call it. So how do we build this decision tree?Well it is there are two parts we firstWe've got to construct the tree so we that's a recursive process that you'll seewhen to construct the tree and once we construct the tree then we prune the tree so that means we identify and remove theIrrelevant branches that might lead to outliers to increaseClassification accuracy, so wait a second you might be asking why are we building a random forest in the first place?Why can't we just build a decision tree alone? Well? 
What happens is if you just build a decision tree. That's not funNo, there's a better answer if you just build a decision tree thenYour decision tree could be over fit that is a big problem when it comes to the decision trees right the decision tree gets overfit to the Data right it's like it's like aThe boy someone might memorize an eye chart. It's not like they can see it properly right with one eye closedThey just memorize with the position of where everything isRight in that same way we don't want to over fit for Data to our dataSo the way to prevent that is to create a bunch of decision trees on random subsamples of the dataSo we'll define subset some set of sub samplesAnd we'll say for each of these sub samples will create a decision treeThen once we have a bunch of decision trees that we've trained right then by trainI mean, we've computed the genie in debt for all of the features and then we've recursively built the treeIt's a binary tree by the way. I didn't mention that decision Trees are binary trees, right?They're either a left node or right node or no node right? It's the leaf. It's the last node and soif you haven't reviewed binary TreesI mean, we're essentially building a binary tree right now, but if you want to learn more about data structures and algorithmsor if you're curiousIf you should know data structures and algorithms for machine learningThe answer is yes for two reasons one just to the logic sake you need to know how data is storedBecause machine learning isn't just about Matrix operations. It's also about storing Data, right?serializing and storing data in the most efficient way possible andRetrieving it and if you want to build algorithms, you got to have your basic data structure and algorithm knowledge intact, okay?I just wanted to say that back to thisthe way Random Forests work areEach of the Decision trees that are generated then we'll just once we have a new data pointWe'll run it through all of those decision trees. They'll all make a prediction, and then we'll make a majority voteSo it will calculate a majority vote so each of the votes for each of the treesWhatever is the majority voteWhich is the class that is the class that we're predicting and what this does is it gives us higher accuracyThan just using a decision tree aloneLet me also say that random forests are one of the most usedMachine learning techniques out there. They can be because they can be used for both classification and regressionand therein lies almost 90 what 90 plus percent of problems right andit also works well for very small data sets which we tend to have a lot ofandSo that's it's so random forests are just used a lotThey're you so much that how can I how can I how can I say this?They're you so much that the guy was in Josh, GordonThe Google dude his name on Twitter is random for it. So they are very useful hi, JoshIf you're watching this ok back to thisThe okay so we're training the we're training it on subsets of there right one subset per tree, and that is our random forestIt's a forest because it consists of trees as youBubbly desk, and so if you create a giant random forest you get lord of the rings Rivendell styleNo, you don't you the bigger the better generally you'll see at the end the more trees. 
We add the better our accuracy score getsRight so yeah and each of our nodes are going to represent a set of feature splits, right?So what's the color green red Green ok what's the size small big big ok that fruit is a watermelon, right?So we just recursively do that areThere other goodExamples of this you might be asking in the answer is yesOf course there are other good example stock price prediction and classificationI've got two great examples here. Definitely check them out the documentation is pretty Sparse, but the code itselfIt's not using any library so definitely check it out alright. So now let'sGo into the order of functions that we're going to follow. So we're not going to have time to write every single functionWe're not using any libraries, but we will write the two most important functions split and get splitAnd that's going to really take take on the majority of ourLogic but that's we're going to do and that going to be 40 lines of code but for the rest of itThis is the order of functions that I'm going to follow the chain of functionsSo let's just get right down into what this chain of functions will look like soI'm going to first of all look at our dependencies hereSo I'm going to import seed from Random and see it is going to generate pSeudo random numbersThis is useful for debugging you want to do it anytime you have some random numbersAnd you want to debug your code in production, or otherwise?it's always great to have some seed so that the random numbers that are generated start from the same point every time andSo that's just great for reproducibility of resultsI'm also going to import R and range so it's going to return a randomly selected element from a range of numbersCSV because our data set is by the way let's see our data set our dataSet is a CSV file is our data set is a CSV file. So let's open our data set and see what it looks likeit is theNumeric Data right here right so all of it is numeric at the end the result is either 2 or 1 right?It's a binary level either 2 or 1 and the rest are like 15 features hereWe're going to use every single one of them no feature selectionWe're going to use every single one of these features okay, and it's using arithmetic so that we're only importing mathWe're not even importing numpy. 
We're only importing the math libraryOkay, so let's let's look at this thingSo we've got some really basic Data Loader functions here Load CSVInitialize the data set as a list open it as a readable fileInitialize the CSV reader and then for every row in the data set appended to this data set MatrixIt's a 2d Matrix return it and so we have an in-memory version of our dataWe know that part that's that that's general to all machine learning really whenever you're reading a CSV fileWhat else we have here we have two more helper methods functions?one to convert a column to an int and want to convert one to convert a string to an int and one to convert aninto a string and that's if we haveString values so in this caseWe don't we have numerical values so we don't need this okaySo let's go into this order that I was talking about the order of algorithm, so the firstSo the first thing we want to look at is this main code here, right?So we started off with the seed so that we always start with the with the same random numbersWe loaded up our data set right that's CSV file, and then we converted our strings to integerswe don't actually need to do that butThen we said okaySo how many folds do we want to have and bold means subsamples of data so we want 5 sub samples?What is the max depth and the depth means how many nodes?What is the depth of the tree right how many levels of that tree?Do we want to create so we're going to say max 10 levels and these are our hyper parameters. We canTune them we can make them more or less, and we'll have different results. They're kind of like noteThe number of Neurons in A neural network, right?And so we say what's the minimum size? What's the minimum size for each of those nodes?How many features do we have we'll count all of those as well, and then what we're going to do is we're going to createThree different random forests all right we're going to create one random forest with just one treeSo it's actually a decision tree, and then one with five trees and then one with ten treesAnd then we'll assess and then we'll assess howGood each of these random ports are by measuring the number of trees the accuracy score for each of them, okay?That's what we're going to do. So notice that here is the big boy right here this evaluate algorithmIs that main function that we're going to use to train our model? So we're going to give evaluate algorithm our data setwe're going to give it theThe Random Forest model that we built the number of folds or the sub samples of the data?How big we want to tree to be the mid size?The sample size the number of Trees and the number of features that we've counted so let's look at what this evaluatesAlgorithm function looks like because that's really the big boy right we want to see what is going on in this main this big functionRight here, so what I'll do is. I will go right here, okay, soWhat it's doing is it's going to say this let's look at okay ready. Let's look at thisWhat it's going to do is it's going to say okay, so for a given data set and for a given algorithmWhich is the random forest algorithm that we're going to feed itLet's say the folds are the sub samples that are used to train and validate our modelsso we'll split our data into a training and a validation set I'llBy the number of folds, so what do I mean by that as well okay? 
Let's look at that cross-validation split methodWhere is that?It's right hereSo so we basically want to split the data into k fold the original sample is randomly partitioned into Kequal sized sub samples and then of those k sub samples a single sub sample is retained as theValidation Data and the rests are going to be used for training dataIt's splitting the data to that k minus 1 sub samples are used for training and then there's one sub sample that's left for validationThat's it okay. So back to this we talked about that functionSo once we split our data into those folds then we're going to say okay. Let's score each of them right because we're evaluating ourRandom Forest algorithms all say for each of the sub samples that we have of our dataLet's create a copy of the data and then remove the given sub sampleWhat initialize a test set because this algorithm really does two things it?It trains our model on the training data and then it tests it out on the testing datathat is it makes predictions on the testing data ignoring the labels, so we'll say okay for each, soWe'll add each row in a given subsample to the test set so that we have test samples as wellAnd then we'll get the predicted label right so for so we'll use a random forest algorithmThis is a random forest algorithm that that's the next thing we're going to look atbutIt's going to get all the predicted labels from them right from our training and our testing set and then it will get the actualLabels right so and once it has the predicted labelsAnd it has the actual labels then it can compare the twoVia this accuracy metric and the accuracy metric is a scalar valueAnd it is how we assess the validity of every random Force that we've built and soThe accuracy metric to go into this is really simple for each label if the actual label matches a predicted label addOne to the correct iteratorand then we just calculate the percentage of predictions that were correct which is the number of them divided by theNumber of correct divided by the actual one times 100 really simple like I said, it's all arithmeticIt's all plus minus plus minusMultiply and divide that's all this is there's there's no linear Algebra here. It's all AlgebraSo but despite how simple this model is it is quite powerful. Which is why it's awesomeOkay, so that's our evaluate algorithm function. So let's keep going down right we're going down the chain go into the moonMoonride Moonride MoonrideMoving my crazy in as well so back to thisIf you got that reference to pool if you didn't get that references from SpongeBob back to this you're so cool so back to thisSo we're evaluating our algorithm here. Let's evaluate this thing so if we're evaluating our algorithmSo what does this algorithm function even do what? 
What is this, right?Let's let's see what this algorithm function is what this algorithm function is is it is our?Random Forest that is what it isWe say for the range of trees that we have let's compute a subsample of those trees and then for that sub sampleWe'll build a specific decision tree for that specific sub sample, and then once we built that treeWe'll add it to our subtrees, and then we're going to make predictions based on all of those treesAnd we'll return the list of predictions right seems simple enough, right?So what is this bagging predict now notice that we just keep going down the chain, right?We just keep going down the chain, so this is the list of trees responsible for making a prediction for with each decision treeSo it combines the predictions from each decision treeand weselect the most common prediction the one that comes up the most that that label that the some of that label is greater than theRest of the some of the others right but and there are only two labels because this is binary classificationYou can also do multi-class classification, but we're not going to talk about that right now okay, so then we talked about bagging predictSo what's next subsample right? What is this the sub simple function? So how are we subsampling?How are we choosing how to split this data right? That's the question. How are we choosing how to do that? Well the answer isWe are we are creating a random sub sample, and this is where our randomness comes in right?We are creating a random sub sample in this random range for the number of samples in the dataAnd we'll add that list of sub-samples to this sample arrayAnd then return that so the sample array or listContains all of those samples from our from our data set right we split themAnd so that's four sub sampling and so now where were we so we talked about subsampling the building preparLet's let's see how the tree itself is built so if we give it some sub samplethe depth and size of the TreesIt's too high per parameters as well as a number of features we expect this function to build this treeSo let's look at how this how this function works. How is it building the tree itself well inside of this functionWe said we notice that it's it's first using this method get split which we're going to codeAnd which is where the meat of the code goes?But we're going to build a tree and that involves creating the root nodeand that's going to get that split and that this kind of this is going to output the root node right that first node andThen we'll call the split function that's going to call itself recursively to build out build out the whole treeSo once we've got that root node then we're going to call split recursivelyAnd so it's just going to continuously build that tree recursively by calling itself in case you haven't heard of recurredIt's when a function calls itself. It's like inceptionExcept it's recursion. Yeah, wow I never actually made that reference until I just said that rightinception isrecursion a dream inside a dreamWhoa, okay back to this um right right? So where were we?So we were at split and get split, so let's let's write out get splitRight so get split is the first one that we want to write outThis is going to select the best split point in a data set right that that key questionHow do we know when to split our data in this?Decision tree of the many decisions that we make in our random forest the answer is we'll have to compute itThis way this is an exhaustive and greedy algorithm. 
That means it is just going to go throughEvery single iteration that it can write there are no heuristics here. There's no educated guesses. There are no educated guessesIt's going to go through every single Data point to compute that split, so let's look at what this looks like wellwe've got to give it first of all our data sets of course andWe want to give it the number of features right because for each of those features, we're going to compute that splitSo given the data set we've got to check every value on each attribute as a candidate splitSo we're going to what we're going to do is we want to sayLet's get all of those class values right and that the set that set of class values is going to be a listThat set of class saw is going to be a list and it's going to be a list for every single Data pointAnd so we'll say all of the rows from our data set are our data points right? We have that we know thatwe want to calculate the index the value the score and thegroups and soWe'll initialize all of these has really big numbers, and they will be updated with timebut we'll initialize them as very big numbers as well, okay, so check this out, sothe gini Index essentially gives us two things it gives us an index, and it gives us a value the index of the feature ofThe feature of the so it gives us it gives us the indexsome featureWhose value is the Optimal value to split the data on for that feature you?See what I'm saying it gives us the index of say for income let's say that30,000 is the best feature it is going to be that decision node from then we can put everything based on30,000 that is where the the classes are most split?It's going to give us the index of 30,000 in the in the dataset as well as the value whatever it is30,000 right so that's the pair that the gini index gives us theIndex and the value andIt will also give us a score and the groups and the groups are the sub samples and the score isThe is it's how good it is right. It's a measure of how good it is soWe all want to initialize our features here as a list and then we're going to say okayso while thenumber features is less thanthe number of Features whereas there's going to be 0 and we're going to increase it every time as we iterate through each of thefeatures well let's this decide some random range right some some random index in our data set tothenTo append to our features list so say if the index is not in the featuresWhich it won't beThen a pent at firstbut eventually it will well a pendant will append the index wherever we are to the list of features that we initialize is empty andthenOnce we've done that we'll sayfor every index in the data setFor each of those indices let's let's go through every single row in the data set so we're computing groups here, right?So we're computing groups to split our data into so we're saying the same what we want todecide the the test values that we want to split as well as thethe gini Index and so the gini Index isWhat we're computing right hereThis is the point that we're computing our gini index for the current group of data that we're in rightSo for 4 we picked a feature and we're going to get through everywherefor that feature we're going to compute the gini index for all those values andWe're going to pick the gini index that is d. 
That is thelargest rightAnd that is what we're going for and that once we picked the gini index at its largestthat will give us the index and value to then build out that that note of thedecision tree, okay, so thenWe computed that and now we're going to say okay if the gini index is less than the optimal scorethen we want to say we're going toUpdate these values to the new values to the score the value of the index and the group'sSo we're going to use a dictionary to store the node in the decision treeBy its name, so we'll say return or were we so we'll say return the indexas well as the value the value andthe Group'sbe groupsRight so with the index the value in the group's right because we've computed all those rightSo this that function gives us gives us the root node right and so once we have the root node then we can actuallyPerform the splitting write it down the best splitting point and so now we want to recursively compute the splitting itselfSo that's where our split function comes in given that root nodeHow do we build a tree such that it is split along the ideal lines, okay?So let's let's let's write out this split function, okay?So it's going to so basically so this is the binary tree part if you've if you've created a binary tree before it is exactlythe same so given someRoot node right. We're going to say some root nodeWill compute a left and right leaf for that node, and then we'll delete the original node so today, okayso now we can delete that andthenOnce we've done thatWe can check if either the left or right group of note is emptyso we're checking if either the left note, or the right note is empty, and if so then we create a terminal node using therecords that we do have here right soin a terminal node by the wayLet's look at what a terminal note is we select a class value for a group of rows and then return the mostcommon output value in that list of RowsWhat is the most common output value in that list of rows and that is the most common class so that's what we're doingWe're select the most common classOkay, so that's the first part and so we want to check if we've reached our maximum depth right so that that depth is ourHyper parameter is a threshold for how large we want our trait to beSo we'll check if we've if we've reached that point so we'll sayIf the depth is greater than or equal to the max depth thenSo if we reached our max depth then we create a terminal nodeThat's what that's what that's saying okay? So then we've got two more parts here. All right, so the next part is to sayokay, so first the two groups of data that are split by the node we retrieved them when we store them in the left andright variables hereAnd then we delete that nodethen we check if either the left or the right group of rows is empty and if they are we create aTerminal node using the records that we already have right here and so the terminal node by the way isWhere we just select the class value? That's the most-used right?That is the that is the output class the output class the terminal node. Is that what is that?What is the what is the prediction itself right? 
That's the end point and so then we check itSo then we check it either the left or right group of Rows is empty and if so we create that terminal RoadSo then we check if we've reached our maximum depths and if so, we create a terminal nodeAnd so that's what this part is and then and so lastly if the group of rows is too small will create a terminal nodeelseWe'll add the left node in a depth first fashion until the bottom of the tree is reached on this branch will do thisSo we'll do the same for the right child as wellAnd so the right side is then processed in the same wayAnd then as we ride back up to construct a tree all the way back up to the root, okay?So get splits notice how gets split is being called here over and over againtwo more functions and then and then we're good with this the two more functionsI had where the gini index and so the gini index is likeI said it is it was that formula right up here, okay?This is the gini index or gini score. Whatever you want to call itso the gini Index splits the data setInvolving one input feature and one value for that feature write what the gini index gives us is remember that pair that that the value?And the index of some feature some features for some data points right and that's that's the line that. We then split. Well. That's theboundary from which we can split data based on that feature in the future and soThe way we compute that is it starts off at zero?It's some scalar value and we're computing it for all of the data pointsSo for each class value that we have for all of our classes, and we only have twofraudulent or not fraudulentAnd we only have two credit worthy or not credit worthy for each of those classesWe'll select a random subset of that class we'll compute the average value for that featureAnd then we'll compute p times 1 minus pWhere p is the average and that is our gini scalar okay?And we'll add them all up together because we have all of those because it's the sum of all of those valuesAnd that's where the sigma notation comes in and we'll return that as a gini score, okay?we compute that for all of the subsets of our data andSo the last function to show you is the predict function and the predict function is right here, right?So whenever we're actually making predictions. This is how it works it navigates down the treeThis is it's asking is this person employed or not with this person go to school?What is this person's social security number?What is this person's you know just a bunch of random questions based on the features each of the features that we haveSo predict is recursive so whereas the node is always changing for a given rowThe node could be the left node or the right nodeSo whether or not the value for some Data point is the than work greater than some nodes?Threshold value that we've computed using the gini indexit will then update the node and then use that as a new parameter to then run predict again andEventually once it's reach the terminal node the last node the labelIt will return the label and that and we and thenBecause and that's for one decision tree and because we have a random forest. 
It's computing that for every single decision treeWe sum up the values and we use the one that is the majority vote and that is our predictionso then if we test our code will notice that theWe've got ouraccuracy scores hereAnd so the accuracy is getting is improving every time so we've tried it for three different random forests we we triedWe tried it for one with one decision treeWe tried it for one with five decision treesand we try to it for one with ten decision Trees andEvery time the accuracy accuracy score improved and so what this means is if we give it a one hundredOne hundred Tree Random forest or a thousand Tree forest. It's going to do really really well okay, soAnd then we'll be able to predict whether or not someone'sSomeone is worthy of getting their credit assessed or not and if you made it to the end of thisI'm very happy so thank you, and that's all please subscribe for more programming videos and for nowI've got to do something random. So thanks for watching\n"