Decision Tree Pruning Using Cost Complexity [Student Query]

Cost Complexity Pruning of Decision Trees
=========================================

Cost complexity pruning is a technique for reducing overfitting in decision trees: a fully grown tree is simplified, typically bottom-up, by repeatedly removing the subtree whose removal costs the least increase in error per leaf removed. In this article, we explore the pruning criterion and how to decide which subtree to prune.

Error Metric
------------

The error metric used for cost complexity pruning can be any suitable measure, such as mean squared error for regression trees or log loss for classification trees; the choice depends on the problem being solved and the characteristics of the data. In practice, cost complexity pruning is most commonly applied to regression trees.
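
As a concrete illustration, here is a minimal sketch of computing err(T, S) as mean squared error for a small regression tree. The dict-based tree representation and the helper names (`predict`, `err`) are assumptions made for this example, not a standard API.

```python
# Minimal sketch of err(T, S): mean squared error of a regression tree T
# on a data set S. The dict-based tree format is an illustrative assumption.

def predict(tree, x):
    """Walk from the root to a leaf and return the leaf's stored value."""
    while "value" not in tree:  # internal nodes carry a feature/threshold split
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]

def err(tree, S):
    """Mean squared error of `tree` on data set S = [(x, y), ...]."""
    return sum((predict(tree, x) - y) ** 2 for x, y in S) / len(S)

# A tiny regression tree: split on feature 0 at threshold 2.0.
T = {
    "feature": 0, "threshold": 2.0,
    "left":  {"value": 1.0},
    "right": {"value": 3.0},
}
S = [([1.0], 1.0), ([3.0], 3.0), ([5.0], 4.0)]
print(err(T, S))  # one squared residual of 1.0 averaged over three points
```

Swapping `err` for a log-loss implementation would adapt the same skeleton to classification trees.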

Tree T and Data Set S
---------------------

Given a tree T and a data set S, err(T, S) denotes the error of tree T on data set S, measured with the chosen error metric. It quantifies how well T's predictions match the labels in S.

Leaves of T and Pruning
-----------------------

A leaf node is a terminal node of the tree at which a prediction is made, and leaves(T) denotes the set of all leaf nodes in a tree T. Pruning a subtree removes nodes from the tree to reduce its size. For example, collapsing an internal node whose two children are both leaves replaces those two leaves with a single leaf, so |leaves(T)| decreases by one.
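
Under the same assumed dict representation (leaves carry a "value" key, internal nodes carry "left"/"right" children), leaves(T) can be sketched as:

```python
def leaves(tree):
    """Return a list of all leaf nodes in `tree` (leaves store a 'value')."""
    if "value" in tree:
        return [tree]
    return leaves(tree["left"]) + leaves(tree["right"])

# A tree with three leaves: collapsing the right internal node (whose
# children are both leaves) would replace two leaves with one.
T = {
    "feature": 0, "threshold": 2.0,
    "left": {"value": 1.0},
    "right": {
        "feature": 1, "threshold": 0.5,
        "left": {"value": 2.0},
        "right": {"value": 3.0},
    },
}
print(len(leaves(T)))  # 3
```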

prune(T, t) and the Pruned Tree
-------------------------------

The operation prune(T, t) takes the full tree T and a subtree t and returns the tree that results from removing t from T, collapsing the root of t into a single leaf. For example, if t1 is a subtree of T whose children are two leaves, prune(T, t1) is the same tree as T but with those two leaves removed and their parent turned into a leaf.
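
A sketch of prune(T, t), again using the assumed dict representation. Having the new leaf predict the mean of the removed subtree's leaf values is an illustrative choice; in practice the collapsed leaf predicts the mean of the training targets that reach that node.

```python
def leaves(tree):
    """All leaf nodes of `tree` (leaves store a 'value')."""
    if "value" in tree:
        return [tree]
    return leaves(tree["left"]) + leaves(tree["right"])

def prune(tree, target):
    """Return a copy of `tree` with subtree `target` collapsed to one leaf.

    The new leaf's value is the mean of the removed subtree's leaf values
    (an illustrative stand-in for the mean training target at that node).
    """
    if tree is target:
        vals = [leaf["value"] for leaf in leaves(target)]
        return {"value": sum(vals) / len(vals)}
    if "value" in tree:
        return tree  # a leaf outside the pruned subtree is kept as-is
    return {"feature": tree["feature"], "threshold": tree["threshold"],
            "left": prune(tree["left"], target),
            "right": prune(tree["right"], target)}

t1 = {"feature": 1, "threshold": 0.5,
      "left": {"value": 2.0}, "right": {"value": 4.0}}
T = {"feature": 0, "threshold": 2.0,
     "left": {"value": 1.0}, "right": t1}

pruned = prune(T, t1)
print(len(leaves(T)), len(leaves(pruned)))  # 3 2
```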

The Pruning Criterion
---------------------

When pruning bottom-up, we often have to choose between candidate subtrees, say t1 and t2. Cost complexity pruning selects the subtree t that minimizes the expression:

    (err(prune(T, t), S) - err(T, S)) / (|leaves(T)| - |leaves(prune(T, t))|)

The numerator is the increase in error caused by pruning t; on the training set it is always nonnegative, because the fully grown tree fits (and typically overfits) S at least as well as any pruned version of itself. The denominator is the number of leaves removed, which is always positive.

Choosing Between Subtrees
-------------------------

To choose between t1 and t2, plug each candidate into the expression (first t = t1, then t = t2) and pick the subtree that yields the smaller value.

Minimizing the Expression
-------------------------

Minimizing the numerator keeps the pruned tree's error as close as possible to that of the original fully grown tree, while the denominator favors candidates whose removal deletes more leaves. The winning candidate is therefore the one that removes the most leaves for the smallest increase in error.

Pruning Process
---------------

To prune a tree, iterate through the candidate subtrees bottom-up, evaluate the expression for each, and prune the candidate with the smallest value. Repeating this as the complexity parameter alpha increases removes more and more subtrees, producing a sequence of progressively smaller trees.
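
The selection step can be sketched end to end. Everything here (the dict tree format, the mean-of-leaf-values collapse, the helper names) is an assumption made for illustration; the criterion itself is the one described above.

```python
def predict(tree, x):
    """Walk from the root to a leaf and return its stored value."""
    while "value" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]

def err(tree, S):
    """Mean squared error of `tree` on S = [(x, y), ...]."""
    return sum((predict(tree, x) - y) ** 2 for x, y in S) / len(S)

def leaves(tree):
    """All leaf nodes of `tree`."""
    if "value" in tree:
        return [tree]
    return leaves(tree["left"]) + leaves(tree["right"])

def prune(tree, target):
    """Copy of `tree` with subtree `target` collapsed to one leaf."""
    if tree is target:
        vals = [leaf["value"] for leaf in leaves(target)]
        return {"value": sum(vals) / len(vals)}
    if "value" in tree:
        return tree
    return {"feature": tree["feature"], "threshold": tree["threshold"],
            "left": prune(tree["left"], target),
            "right": prune(tree["right"], target)}

def best_subtree_to_prune(T, candidates, S):
    """Pick the candidate t minimizing
    (err(prune(T,t), S) - err(T, S)) / (|leaves(T)| - |leaves(prune(T,t))|)."""
    def ratio(t):
        pruned = prune(T, t)
        return (err(pruned, S) - err(T, S)) / (len(leaves(T)) - len(leaves(pruned)))
    return min(candidates, key=ratio)

# t2's leaves predict nearly identical values, so collapsing it barely
# hurts the fit; the criterion should prefer pruning t2.
t1 = {"feature": 0, "threshold": 2.0,
      "left": {"value": 1.0}, "right": {"value": 2.0}}
t2 = {"feature": 0, "threshold": 8.0,
      "left": {"value": 10.0}, "right": {"value": 10.5}}
T = {"feature": 0, "threshold": 5.0, "left": t1, "right": t2}
S = [([1.0], 1.0), ([3.0], 2.0), ([6.0], 10.0), ([9.0], 10.5)]

best = best_subtree_to_prune(T, [t1, t2], S)
print("prune t2 first" if best is t2 else "prune t1 first")  # prune t2 first
```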

Example
-------

Suppose we must choose between two candidate subtrees, t1 and t2. We plug each into the expression

    (err(prune(T, t), S) - err(T, S)) / (|leaves(T)| - |leaves(prune(T, t))|)

first with t = t1 and then with t = t2, and prune the subtree that yields the smaller value.
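
With hypothetical numbers (assumed purely for illustration), the comparison looks like this:

```python
# Hypothetical numbers: the fully grown tree T has 6 leaves and training
# error 0.10. Pruning t1 removes 1 leaf and raises the error to 0.12;
# pruning t2 removes 2 leaves and raises it to 0.13.
err_full, n_leaves_full = 0.10, 6
candidates = {
    "t1": (0.12, 5),  # (error after pruning, leaves after pruning)
    "t2": (0.13, 4),
}

ratios = {
    name: (e - err_full) / (n_leaves_full - n)
    for name, (e, n) in candidates.items()
}
best = min(ratios, key=ratios.get)
print(best)  # t2: it trades two leaves for only slightly more error
```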

Mathematical Representation
---------------------------

A formal statement of this criterion appears in the Wikipedia article on decision tree pruning: among the candidate subtrees, find the t that minimizes

    (err(prune(T, t), S) - err(T, S)) / (|leaves(T)| - |leaves(prune(T, t))|)

This formulation makes the error-versus-size trade-off explicit and is straightforward to implement in practice.
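
In LaTeX notation, the criterion can be written as:

```latex
t^{*} \;=\; \operatorname*{arg\,min}_{t}\;
\frac{\operatorname{err}\bigl(\operatorname{prune}(T,t),\,S\bigr)
      \;-\; \operatorname{err}(T,\,S)}
     {\bigl|\operatorname{leaves}(T)\bigr|
      \;-\; \bigl|\operatorname{leaves}\bigl(\operatorname{prune}(T,t)\bigr)\bigr|}
```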

Conclusion
----------

Cost complexity pruning is an effective technique for improving decision trees. By trading a small increase in training error for a substantial reduction in the number of leaves, it reduces overfitting and helps the resulting tree generalize to new data. The choice of error metric and pruning strategy can significantly affect the result, and understanding the criterion above makes the method straightforward to apply in a variety of applications.
