Mastering the Skill of Joining Multiple Tables Together in Data Analysis
In data analysis, the information needed is often not confined to one table. The Lego dataset used in this course is a good example. It contains a wealth of information about the sets, parts, themes, and colors that make up Lego history, but that information is spread across multiple tables, so we need to learn how to join tables together.
One of the most important tools for joining tables is the inner_join verb in dplyr. In this chapter, we will focus on mastering inner_join, starting with the sets table in the Lego dataset. The sets table contains one row for each of the 4,977 LEGO sets, starting with sets such as Medium Gift Set from 1949. Notably, its theme_id column is not useful on its own, because the useful information, the theme name, is held in a separate table called themes.
The theme_id variable in the sets table links to the id variable in the themes table, so for any individual set we can find the theme that matches it. To see which theme each set is associated with, we join the two tables with inner_join, passing the argument by = c("theme_id" = "id"). This tells inner_join to match theme_id in the first table (sets) to id in the second table (themes).
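The join described above can be sketched as follows. This is a minimal illustration, not the course's own code: the two small tables are invented stand-ins for sets and themes (only the column names name, theme_id, and id and the 1949 Medium Gift Set come from the text), and the theme names are placeholders.

```r
library(dplyr)

# Toy stand-ins for the course tables; all values except "Medium Gift Set"
# and 1949 are invented for illustration.
sets <- tibble(
  name     = c("Medium Gift Set", "Toy Set B", "Toy Set C"),
  year     = c(1949, 1984, 1999),
  theme_id = c(1, 2, 2)
)
themes <- tibble(
  id   = c(1, 2),
  name = c("Toy Theme A", "Toy Theme B")
)

# Match theme_id in the first table (sets) to id in the second table (themes)
joined <- sets %>%
  inner_join(themes, by = c("theme_id" = "id"))

# Both tables have a column called name, so dplyr disambiguates them
names(joined)
```

In this sketch the key column keeps the name it has in the first table (theme_id), and the two name columns come back as name.x and name.y.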
The result of the inner join is a combined table in which each set is paired with its corresponding theme. However, both tables have a variable called name, and a table cannot have two variables with the same name, so the output ends up with name.x for the set's name and name.y for the theme's name. This is not ideal, but inner_join lets us customize it with another argument, suffix = c("_set", "_theme"). This appends _set or _theme to the shared columns, giving the much more readable name_set and name_theme.
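As a rough sketch of the suffix argument, again using invented toy tables rather than the real Lego data (the values below are placeholders, and only the column names follow the text):

```r
library(dplyr)

# Toy stand-ins for sets and themes; values are invented for illustration.
sets <- tibble(
  name     = c("Medium Gift Set", "Toy Set B"),
  theme_id = c(1, 2)
)
themes <- tibble(
  id   = c(1, 2),
  name = c("Toy Theme A", "Toy Theme B")
)

# suffix replaces the default .x / .y endings on the shared column name
joined <- sets %>%
  inner_join(themes, by = c("theme_id" = "id"), suffix = c("_set", "_theme"))

names(joined)
```

The first element of suffix applies to columns from the first table and the second element to columns from the second table, so the set's name becomes name_set and the theme's name becomes name_theme.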
This pattern of taking two tables, finding a link between them, and joining them together is very common in data analysis. By mastering this skill, we can make many interesting discoveries throughout this course. In the exercises that follow, we will learn about two new tables from the Lego dataset: parts and part_categories. A part is a shape like a gear, a 2x4 brick, or a figurine. We will practice joining these tables together to gain a deeper understanding of the data.
One of the key takeaways from this exercise is how important it is to be aware of duplicate columns when working with inner joins. By using the suffix argument, we can ensure that our output is clean and easy to read. Additionally, this exercise highlights the importance of joining tables together to gain a deeper understanding of the data. By combining sets with their corresponding themes, we can answer interesting questions about the data, such as finding out what the most common themes are in Lego history.
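One way to answer the "most common themes" question is to pipe the joined table into count() with sort = TRUE. The sketch below assumes the same join as described earlier and uses invented toy tables, so the counts it produces say nothing about the real data.

```r
library(dplyr)

# Toy stand-ins for sets and themes; values are invented for illustration.
sets <- tibble(
  name     = c("Toy Set A", "Toy Set B", "Toy Set C"),
  theme_id = c(1, 2, 2)
)
themes <- tibble(
  id   = c(1, 2),
  name = c("Toy Theme A", "Toy Theme B")
)

# Join, then count how many sets fall under each theme, most common first
theme_counts <- sets %>%
  inner_join(themes, by = c("theme_id" = "id"), suffix = c("_set", "_theme")) %>%
  count(name_theme, sort = TRUE)

theme_counts
```

count() adds a column n holding the number of rows per theme, and sort = TRUE orders the result from the largest n to the smallest.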
In conclusion, mastering the skill of joining multiple tables together is an essential part of data analysis. Through practice and patience, we can unlock many interesting discoveries and gain a deeper understanding of our data. In this chapter, we have covered the basics of inner join and how to customize it for optimal results. We will continue to explore more advanced topics in future chapters, but for now, let's focus on mastering the skill of joining tables together.
In other DataCamp courses, you may have learned to use the powerful dplyr package to explore and transform a dataset. But the information you'll need for an analysis isn't always confined to one table. In this course, you'll master the important skill of joining multiple tables together so that they can be analyzed in combination. You'll work with a fun dataset about the construction toys known as Legos. The data comes from the Rebrickable website and has tons of fun information about the sets, parts, themes, and colors that make up Lego history. The dataset is fascinating, but it's spread across many tables. You'll run into this kind of data a lot if you work in data science.
In this chapter, you'll be focusing on one dplyr verb, inner_join, and you'll start by working with the sets table in the Lego data. This table contains one row for each of the 4,977 LEGO sets, starting with sets like Medium Gift Set back in 1949. Notice that there is a column that's not useful on its own, theme_id. That's because the useful information, the theme name, is in a separate table called themes. The theme_id variable in the sets table links to the id variable in the themes table. For any individual set, we could find a theme that matches it.
To see the theme that each set is associated with, we'll need to join the two tables. To do this, you use inner_join. This joins the first table, sets, to the second table, themes. Notice the argument by = c("theme_id" = "id"). That tells inner_join how to match the tables, linking theme_id in the first table to id in the second table. Notice that in the output you've combined the two tables, combining each set with its theme. But because both tables had a variable called name, you end up with name.x with the set's name and name.y with the theme's name, because you cannot have two variables with the same name. inner_join lets you customize this to be more readable: add another argument, suffix = c("_set", "_theme"). This appends _set or _theme to the shared columns, which gets the much more readable name_set and name_theme.
Now we can answer interesting questions about the data. For instance, we could find out what the most common themes are in Lego history by piping again to count(name_theme, sort = TRUE).
This pattern of taking two tables, finding a link between them, and joining them together is very common and will enable you to make a lot of interesting discoveries throughout this course. For starters, in the exercises you'll learn about two new tables from the Lego dataset, parts and part_categories, and then practice joining them together. A part is a shape like a gear, a 2x4 brick, or a figurine, and will come up a lot in this course. So let's press