Welcome to Machine Learning with Tree-Based Models
Supervised learning is the subfield of machine learning in which you train a model using input data and corresponding labels. In contrast, unsupervised learning involves learning from input data without any corresponding labels. Supervised learning is further divided into two types: classification and regression.
In supervised learning, each example is a pair consisting of input data and an output value. This output value represents a category or label in the case of classification, or a numeric value in the case of regression. A supervised learning algorithm analyzes the training data and produces an inferred function, or model, that can be used to map new examples to predicted labels or values.
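As a minimal sketch of how the two kinds of output values look in R (the spam and house-price targets here are hypothetical, purely for illustration): a classification target is a categorical variable (a factor), while a regression target is numeric.

```r
# Classification: the output value is a category or label (a factor in R).
class_labels <- factor(c("spam", "not spam", "spam"))

# Regression: the output value is numeric (e.g. hypothetical house prices).
reg_targets <- c(203000, 185500, 240000)

is.factor(class_labels)   # a factor target implies a classification task
is.numeric(reg_targets)   # a numeric target implies a regression task
```

Many R modeling functions use exactly this distinction to decide whether to fit a classification or a regression model.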
The concept of supervised learning is analogous to a student learning a subject by studying a set of questions and their corresponding answers. After mastering the mapping between the questions and the answers, the student can then answer new, never-before-seen questions on the same topic. In this course, we will delve into decision-tree-based models, including tree-based ensemble models such as random forests and gradient boosting machines (GBMs).
Tree-based models stand out from other types of machine learning models due to their unique combination of interpretability, ease of use, and, when used in ensembles, excellent accuracy. Tree-based methods are simple and useful for model interpretation: anyone who is comfortable reading a flow chart has the skills to understand a decision tree. Decision trees are widely used not only by data scientists but also by non-technical stakeholders, such as managers and decision makers, to help make decisions.
In this course, we will learn the principles of tree-based machine learning models and how to use them effectively. You will learn how to interpret and explain decisions made by a tree-based model, explore different use cases, build and evaluate classification and regression models, and tune model parameters for optimal performance. We will cover several tree-based models: classification and regression trees, bagged trees, random forests, and finally boosted trees, in particular the gradient boosting machine (GBM), one of the most widely used and powerful algorithms available today.
A decision tree is a hierarchical structure with nodes and directed edges. The node at the top is called the root node, while the nodes at the bottom are referred to as leaf nodes or terminal nodes. Nodes that are neither the root node nor leaf nodes are called internal nodes. The root and internal nodes have binary test conditions associated with them, and each leaf node has an associated class label.
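This structure maps directly onto the flow-chart reading mentioned above: a decision tree is equivalent to nested if/else tests, where the root and internal nodes hold the binary test conditions and the leaves hold the class labels. The animal-classification rule below is a hypothetical example, not from the course:

```r
# A decision tree written as nested if/else tests.
classify_animal <- function(has_feathers, can_fly) {
  if (has_feathers) {            # root node: binary test condition
    if (can_fly) {               # internal node: another binary test
      "bird"                     # leaf node: class label
    } else {
      "flightless bird"          # leaf node: class label
    }
  } else {
    "not a bird"                 # leaf node: class label
  }
}

classify_animal(TRUE, FALSE)     # "flightless bird"
```

Training a decision tree amounts to learning these test conditions from data rather than writing them by hand.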
One of the most popular packages for decision trees in R is the rpart package, which will be covered in the first two chapters of this course. You will learn how to use this package to train both classification and regression trees. The name rpart is short for recursive partitioning, the process that underlies the training of a decision tree model. To get familiar with what is inside the rpart package, you can take a look at its help page, and you can use the `rpart()` function from the package to train a decision tree model.
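A minimal sketch of the basic `rpart()` syntax, here fitted on R's built-in iris dataset (the choice of dataset is an assumption for illustration, not from the course):

```r
# Load the rpart package (install.packages("rpart") if needed).
library(rpart)

# Train a classification tree: predict Species from all other columns.
# method = "class" requests a classification tree; "anova" gives regression.
fit <- rpart(Species ~ ., data = iris, method = "class")

# Print the fitted tree: each row shows a node's test condition,
# the number of observations it covers, and its predicted class.
print(fit)

# Map new examples to predicted labels.
predict(fit, newdata = head(iris), type = "class")
```

The formula interface (`response ~ predictors`) is the same one used throughout R's modeling functions, which is part of what makes rpart easy to pick up.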
We will go into more detail about the meaning of each argument later in the course. This course is very hands-on, so let's start right away with an example.
"WEBVTTKind: captionsLanguage: enwelcome to machine learning with tree based models in our I'm Erin Liddell and I'm a machine learning scientists and co-author of several our packages including the h2o package for machine learning i am gabriela de carros and i'm and data scientists and the founder of our ladies a worldwide organization for promoting diversity in the our community so provides learning is the subfield of machine learning in which you train a model using input data and corresponding labels the converse is called unsupervised learning where you learn from the input data along in supervised learning each example is a pair consisting of the input data and an output value which represents a category or label in the case of classification or numeric value in the case of regression a supervised learning algorithm analyzes the training data and produces an inferred function or a model which can be used for mapping new examples to predict labels or values as an analogy you can compare supervised learning to a student learning a subject by studying a set of questions and their corresponding answers after mastering the mapping between the questions and the answers the student can then provide answers to new never-before-seen questions on the same topic in this course we'll talk about decision three based models including three based and sample models such as random forests and gradient boosting machines or gbms three based models stand out from other types of machine learning models due to their unique combination of model interpretability ease of use and when used in assembles excellent accuracy tree based methods are simple and useful for model interpretation they're used to make decisions explore data and make predictions decision trees and naturally is to interpret anyone who is comfortable reading a flow chart a read has the skill sets to understand a decision tree trees I used not only by data scientists but also used by managers and decision makers for 
example to help them make decisions in this course we'll learn the principles of tree based machine learning models and how to use them you'll learn how to interpret and explain decisions made from a tree base model explore different use cases build and evaluate classification and regression models tune model parameters for optimal performance we'll cover several tree based models we'll talk about and explain classification and regression trees bagged trees random forests and lastly you'll learn about boosted trees in particular the gradient boosting machine or GBM one of the most widely used and powerful algorithms that's available today a decision tree is a hierarchical structure with nodes and directed edges that node at the top is called the root node the notes at the bottom are called the leaf nodes or terminal nodes notes that are neither the root nodes or the relief nodes are called internal nodes the root and the internal nodes have binary test conditions associated with them and each leaf node has an Associated class label one of the most popular packages for decision trees in R is the R part packaged in the first two chapters of this course you'll learn how to use this package for training both classification and regression trees our part is short for recursive partitioning which is a process used in the training of a decision tree model if you want to get familiarized with what is inside the art parts package you can take a look at the help page in you can use the function are part from the are part package to train a decision tree model we'll go into more detail about what each of these arguments means later in the course but for now you can see here the basic syntax this course is very handsome so let's start right away with an examplewelcome to machine learning with tree based models in our I'm Erin Liddell and I'm a machine learning scientists and co-author of several our packages including the h2o package for machine learning i am gabriela de carros 
and i'm and data scientists and the founder of our ladies a worldwide organization for promoting diversity in the our community so provides learning is the subfield of machine learning in which you train a model using input data and corresponding labels the converse is called unsupervised learning where you learn from the input data along in supervised learning each example is a pair consisting of the input data and an output value which represents a category or label in the case of classification or numeric value in the case of regression a supervised learning algorithm analyzes the training data and produces an inferred function or a model which can be used for mapping new examples to predict labels or values as an analogy you can compare supervised learning to a student learning a subject by studying a set of questions and their corresponding answers after mastering the mapping between the questions and the answers the student can then provide answers to new never-before-seen questions on the same topic in this course we'll talk about decision three based models including three based and sample models such as random forests and gradient boosting machines or gbms three based models stand out from other types of machine learning models due to their unique combination of model interpretability ease of use and when used in assembles excellent accuracy tree based methods are simple and useful for model interpretation they're used to make decisions explore data and make predictions decision trees and naturally is to interpret anyone who is comfortable reading a flow chart a read has the skill sets to understand a decision tree trees I used not only by data scientists but also used by managers and decision makers for example to help them make decisions in this course we'll learn the principles of tree based machine learning models and how to use them you'll learn how to interpret and explain decisions made from a tree base model explore different use cases build and 
evaluate classification and regression models tune model parameters for optimal performance we'll cover several tree based models we'll talk about and explain classification and regression trees bagged trees random forests and lastly you'll learn about boosted trees in particular the gradient boosting machine or GBM one of the most widely used and powerful algorithms that's available today a decision tree is a hierarchical structure with nodes and directed edges that node at the top is called the root node the notes at the bottom are called the leaf nodes or terminal nodes notes that are neither the root nodes or the relief nodes are called internal nodes the root and the internal nodes have binary test conditions associated with them and each leaf node has an Associated class label one of the most popular packages for decision trees in R is the R part packaged in the first two chapters of this course you'll learn how to use this package for training both classification and regression trees our part is short for recursive partitioning which is a process used in the training of a decision tree model if you want to get familiarized with what is inside the art parts package you can take a look at the help page in you can use the function are part from the are part package to train a decision tree model we'll go into more detail about what each of these arguments means later in the course but for now you can see here the basic syntax this course is very handsome so let's start right away with an example\n"