Using tidycensus Functions to Access and Analyze American Community Survey (ACS) and Decennial Census Variables
To make effective use of the data available from the American Community Survey (ACS) and the decennial censuses, users must first understand how to find and use census variable IDs. With thousands of variables available, it can be hard to determine which ones are needed. Fortunately, web resources, including search tools such as Census Reporter, can help with this task.
Aside from web resources, tidycensus also includes built-in functionality to search for variables. The load_variables function is particularly useful: it downloads a dataset of variables from the Census Bureau website so that users can browse it. The function has three parameters: year, the year or end year of the dataset; dataset, which identifies the dataset in question; and an optional cache parameter that stores the variables dataset on the user's computer for future browsing.
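As a minimal sketch of the call described above, the following R code loads a variables dataset. The year 2019 and the dataset code "acs5" (the 5-year ACS) are illustrative choices, not values taken from the lesson. The call downloads data from the Census Bureau, so it needs an internet connection.

```r
library(tidycensus)

# Download the variables dataset for the 2015-2019 5-year ACS.
# year = 2019 and dataset = "acs5" are assumed values for illustration.
# cache = TRUE stores a local copy so later calls can reuse it.
acs_vars <- load_variables(year = 2019, dataset = "acs5", cache = TRUE)

# Look at the first few rows of the variables dataset.
head(acs_vars)
```

Caching is worth enabling when the same variables dataset will be browsed repeatedly, since the download is fairly large.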
Once a user has a census or ACS variables dataset, they can explore it using tidyverse tools. The datasets returned by load_variables have three columns: the census ID code (stored in the name column), label, and concept, which refers to the general group to which the variable corresponds. In the example shown in the video, the downloaded dataset is filtered for variables within census table B19001, which covers household income.
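The filtering step can be sketched without a download by using a small hand-written stand-in for the variables dataset. The rows below only mimic the name, label, and concept structure; the labels and concepts are abbreviated for illustration and are not a copy of the Census Bureau's files. With a real dataset, the same filter would be applied to the object returned by load_variables.

```r
library(dplyr)
library(stringr)

# Hand-written stand-in for a few rows of a load_variables() result.
# Labels and concepts are abbreviated and illustrative only.
vars <- tibble(
  name = c("B01001_001", "B19001_001", "B19001_002", "B19001_003"),
  label = c(
    "Estimate!!Total:",
    "Estimate!!Total:",
    "Estimate!!Total:!!Less than $10,000",
    "Estimate!!Total:!!$10,000 to $14,999"
  ),
  concept = c(
    "SEX BY AGE",
    rep("HOUSEHOLD INCOME IN THE PAST 12 MONTHS", 3)
  )
)

# Keep only the variables whose ID starts with the table ID B19001.
b19001 <- vars %>%
  filter(str_detect(name, "^B19001_"))

print(b19001)
```

Anchoring the pattern with "^" and including the underscore keeps the match on the table ID at the start of the name rather than anywhere in the string.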
Variable ID codes from the ACS can be confusing, so it helps to break them down. For instance, the variable B19001_002E refers to the number of households with an income in the past twelve months of less than ten thousand dollars. The "B" prefix indicates that this variable comes from a base table, which provides the most detailed information available in the ACS. Other available table types include collapsed tables, denoted by C; data profiles, denoted by DP; and subject tables, denoted by S.
In this example, the component 19001 refers to the table ID, indicating that the variable belongs to a table of related variables that cover different household income bands. The "002" suffix refers to the specific variable ID within that table, while the "E" suffix indicates an estimate and is not required by tidycensus functions.
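To make the breakdown concrete, here is a small base R sketch that splits a variable ID such as B19001_002E into its parts. The helper parse_acs_id is hypothetical and is not part of tidycensus. It assumes the ID follows the base-table or collapsed-table pattern (B or C prefix, five-digit table number, three-digit variable number, optional E or M suffix) and does not handle data profile or subject table IDs.

```r
# Hypothetical helper (not a tidycensus function): split an ACS
# base-table or collapsed-table variable ID into its components.
parse_acs_id <- function(id) {
  # prefix (B or C), table number, variable number, optional E/M suffix
  pattern <- "^([BC])([0-9]{5}[A-Z]*)_([0-9]{3})([EM]?)$"
  m <- regmatches(id, regexec(pattern, id))[[1]]
  if (length(m) == 0) {
    stop("Not a base-table or collapsed-table variable ID: ", id)
  }
  list(
    prefix   = m[2],                # table type, e.g. "B" for base table
    table    = paste0(m[2], m[3]),  # full table ID, e.g. "B19001"
    variable = m[4],                # variable ID within the table
    suffix   = m[5]                 # "E", "M", or "" when omitted
  )
}

parts <- parse_acs_id("B19001_002E")
str(parts)
```

Because the suffix group is optional, the same helper accepts "B19001_002", the form that can be passed to tidycensus functions.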
Almost every variable in the ACS is characterized by a margin of error, which tidycensus is designed to return by default. Users will generally see the suffixes only in data returned in wide format, where "E" marks an estimate column and "M" marks a margin-of-error column. By using tidycensus functions and understanding how to identify and use census variable IDs, users can effectively access and analyze the data available from the ACS and decennial censuses.
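The estimate and margin-of-error suffixes in wide format can be illustrated with a short get_acs request. This is a sketch under several assumptions: the geography ("state") and year (2019) are arbitrary choices, and a Census API key is assumed to be stored in the CENSUS_API_KEY environment variable. The request is skipped when no key is found.

```r
library(tidycensus)

# Assumes a Census API key is available in the CENSUS_API_KEY
# environment variable; the request is skipped otherwise.
if (nzchar(Sys.getenv("CENSUS_API_KEY"))) {
  low_income_wide <- get_acs(
    geography = "state",       # assumed geography for illustration
    variables = "B19001_002",  # the "E" suffix is omitted on purpose
    year = 2019,               # assumed end year for illustration
    output = "wide"
  )

  # In wide format, each requested variable should appear as a pair of
  # columns: one ending in "E" (estimate) and one ending in "M" (margin
  # of error).
  print(names(low_income_wide))
} else {
  message("Set CENSUS_API_KEY to run this example.")
}
```

Without output = "wide", the same request returns the default tidy layout, in which the estimate and its margin of error appear in separate columns next to a column holding the variable ID.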
Practicing with tidycensus Functions
To use tidycensus functions, users must supply a vector of census variable IDs. This lesson discusses how to find and use these variable IDs and how they are formatted. With thousands of variables available, it can be hard to determine which ones are needed.