R Tutorial - Fitting and interpreting a choice model

Fitting Choice Models: A Similar Process to Regression Models

Now that we've inspected the data, we're ready to fit a choice model. The process is very similar to fitting a regression model, so let's start with a quick refresher on that. To fit a linear regression model, we use the function lm(). When we type this command, we're telling R to fit a model that predicts y as a function of x1, x2, and x3 using the data in the mydata data frame. If mydata doesn't include columns named y, x1, x2, and x3, you'll get an error. We usually assign the output of lm() to a model object that we can use later; here, we assign it to mymodel. Once we have the mymodel object, we can see a summary of the model by typing summary(mymodel).
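As a quick sketch of this refresher (the data frame mydata and its columns are simulated here just to make the call runnable):

```r
# Simulate a small data frame with the columns lm() expects
set.seed(1)
mydata <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
mydata$y <- 1 + 2 * mydata$x1 - 0.5 * mydata$x2 + 0.1 * mydata$x3 + rnorm(100)

# Fit the regression and store the result in a model object
mymodel <- lm(y ~ x1 + x2 + x3, data = mydata)

# Inspect the fitted model, including the coefficient table
summary(mymodel)
```

If mydata were missing any of the columns named in the formula, the lm() call would stop with an error.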

The process for fitting a choice model is very similar to fitting a linear regression model, except that we use a different function called mlogit(). Multinomial logit models are somewhat specialized and cannot be estimated with lm() or even with the glm() function that you may have used before. Instead, we use the mlogit() function from the mlogit package. Just like lm(), mlogit() takes two key inputs: a formula and the name of the data frame where the data is stored. The data input is pretty straightforward, but the data has to be choice data, meaning it has to have a column that indicates which choice question each alternative belongs to; here, that is the ques column. It also has to have a column of zeros and ones indicating which option was chosen; here, that column is labeled choice.
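To make that required structure concrete, here is a hypothetical fragment of choice data (the column names ques, alt, and choice follow the description above; the feature columns are invented for illustration):

```r
# Each choice question (ques) has several alternatives (alt);
# exactly one row per question has choice == 1
chocdata <- data.frame(
  ques     = c(1, 1, 1, 2, 2, 2),
  alt      = c(1, 2, 3, 1, 2, 3),
  feature1 = c(0, 1, 0, 1, 0, 1),
  feature3 = c("low", "high", "high", "low", "low", "high"),
  choice   = c(0, 1, 0, 0, 0, 1)
)

# Sanity check: exactly one chosen option per question
tapply(chocdata$choice, chocdata$ques, sum)
```

If any question had zero or more than one row with choice == 1, mlogit() would not be able to treat the data as valid choice data.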

The formula that we use should always begin with the name of the column that indicates the choice, because that is what we want to predict. Then we type a tilde, and after the tilde we list the names of the product features we want to use to predict the choice, just like with lm(). We also indicate which data frame we want to use to fit the model. Under the hood, the model that we fit with mlogit() is different from the linear model that we fit with lm(). For now, we're going to skip over the details of how they differ, but we'll come back to that in chapter 3.
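Putting the pieces together, the call looks like this. This is a runnable sketch, not output from the course data: it assumes the mlogit package is installed, and the small data frame, feature names, and randomly generated choices are invented purely to show the shape of the call.

```r
library(mlogit)  # assumed installed

# Invented choice data: 20 questions, 3 alternatives each
set.seed(2)
d <- data.frame(
  ques     = rep(1:20, each = 3),
  alt      = rep(1:3, times = 20),
  feature1 = rbinom(60, 1, 0.5),
  feature2 = rbinom(60, 1, 0.5)
)
# Mark one chosen alternative per question (at random, for the sketch)
d$choice <- as.vector(sapply(1:20, function(q) {
  z <- rep(0, 3); z[sample(3, 1)] <- 1; z
}))

# Convert to the indexed format mlogit expects: choice names the 0/1
# column, alt.var the alternative, chid.var the choice question
d_m <- mlogit.data(d, choice = "choice", shape = "long",
                   alt.var = "alt", chid.var = "ques")

# Formula: the choice column, a tilde, then the features
m <- mlogit(choice ~ feature1 + feature2, data = d_m)
summary(m)
```

With the invented random choices the estimates are meaningless; the point is only the sequence of calls: prepare the choice data, then pass a formula and the data frame to mlogit().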

When we ask for a summary of the mlogit model object, we get output that looks a lot like what you would get from a regression. The most important part of the summary output is the table of coefficients. We'll go into more detail on all of the output later, but for now let's focus on the column labeled Estimate. The numbers in this column represent the relative value that customers place on each feature. For example, the coefficient for the low level of feature 3 is -1.29, which means that people prefer the high level of feature 3 to the low level. Just like with linear regression, the stars at the right-hand side of the table indicate which features have a statistically significant effect on choice. We'll go into more detail on how to interpret these parameters in chapter 3, but for now, keep in mind that parameters greater than 1 or less than -1 indicate a very strong preference for a feature.

The closer the coefficient is to 0, the weaker the preference. Next, let's find out how people value features of sports cars by fitting a choice model to the sports car data.
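One back-of-the-envelope way to see why coefficients beyond plus or minus 1 count as "strong" (this calculation is ours, not part of the model output above): in a logit model, a coefficient is a difference in utility, and base R's plogis() converts a utility difference into a choice probability.

```r
# Suppose two otherwise-identical options differ only in feature 3,
# and the low level carries a coefficient of -1.29 (high level = 0,
# the value taken from the example above)
u_high <- 0
u_low  <- -1.29

# Probability the high-level option wins a head-to-head choice
p_high <- plogis(u_high - u_low)
round(p_high, 2)
```

A utility gap of 1.29 translates into the high-level option being chosen roughly four times out of five, while a coefficient near 0 would leave the two options close to a coin flip.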

Fitting Choice Models to Sports Car Data

Let's fit a choice model to the sports car data to see how people value different features. We'll use the mlogit() function from the mlogit package to estimate preferences. As before, the data must be choice data: it needs a column indicating which choice question each alternative belongs to and a 0/1 column indicating which option was chosen.
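A sketch of the call for this exercise. It assumes a data frame named sportscar is already loaded with ques, alt, and choice columns as described above, and it guesses feature column names such as seat, trans, convert, and price; check names(sportscar) and substitute the actual columns.

```r
library(mlogit)  # assumed installed

# Convert the sports car data to mlogit's indexed format
# (column names here are assumptions -- verify with names(sportscar))
sportscar_m <- mlogit.data(sportscar, choice = "choice", shape = "long",
                           alt.var = "alt", chid.var = "ques")

# Predict choice from the product features
m_sportscar <- mlogit(choice ~ seat + trans + convert + price,
                      data = sportscar_m)

# The Estimate column of this summary is what we interpret below
summary(m_sportscar)
```

The same reading applies as before: signs show which level is preferred, magnitudes show how strongly, and coefficients beyond plus or minus 1 signal a very strong preference.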

