Today the Census Bureau released new statistics about our nation’s communities, providing population counts and sex-by-age statistics for approximately 1,500 detailed race and ethnicity groups and American Indian and Alaska Native tribes and villages in the 2020 Census Detailed Demographic and Housing Characteristics File A (Detailed DHC-A).
In a blog last week, we explained how the level of detail available for each group depends on its population size and the geographic level. This approach also sets targets for the level of uncertainty related to disclosure avoidance embedded in the statistics and illustrates how we are maximizing the utility and relevance of the data we produce while meeting our obligation to protect responses. In fact, the Detailed DHC-A provides a substantial increase in the number of population counts being published relative to the 2010 Census, where groups had to have a population of at least 100 nationally to be included in tabulations.
In this blog, we explain how well the published statistics fit within those targets, giving us confidence in the accuracy and usefulness of the data. We’ll also explain some situations where we suppressed data for groups, and we’ll provide guidance for calculating numbers (like percentages) that are not available in the published tables and for making comparisons.
The Census Bureau has long used various statistical safeguards to help us avoid disclosing individuals’ information when we release statistics. As we’ve described in other materials, including the handbook Disclosure Avoidance for the 2020 Census: An Introduction, we are using a stronger system to protect your 2020 Census information against today’s cyberthreats.
The Detailed DHC-A used an algorithm called SafeTab-P to protect respondent confidentiality. This algorithm infuses noise – small, random additions or subtractions – into the total population counts and the sex-by-age statistics for detailed race and ethnicity groups and American Indian and Alaska Native tribes and villages.
In the previous blog, we explained how the level of detail available for a group in the Detailed DHC-A is based on population size (using the noise-infused counts) and geographic level. The blog also explained that we set accuracy targets – in the form of margins of error (MOE) – for how much disclosure avoidance-related uncertainty would be in the published statistics. Table 1 recaps the thresholds and MOEs for the detailed and regional groups.
Level of Detail | Detailed groups: Nation and State (MOE = ±3) | Detailed groups: Substate and AIANNH (MOE = ±11) | Regional groups: Nation and State (MOE = ±50) | Regional groups: Substate and AIANNH (MOE = ±50) |
Total count only | 0–499 | 22–999 | 0–4,999 | 94–4,999 |
Sex-by-age table – 4 age categories | 500–999 | 1,000–4,999 | 5,000–19,999 | 5,000–19,999 |
Sex-by-age table – 9 age categories | 1,000–6,999 | 5,000–19,999 | 20,000–149,999 | 20,000–149,999 |
Sex-by-age table – 23 age categories | 7,000+ | 20,000+ | 150,000+ | 150,000+ |
Notes:
Source: U.S. Census Bureau.
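The thresholds in Table 1 can be read as a simple lookup. The sketch below is only an illustration of that reading, not the Census Bureau's implementation: the function name, the dictionary keys and the "suppressed" label for counts below the lowest threshold are all assumptions made for this example.

```python
# Levels of detail, from least to most granular (rows of Table 1).
LEVELS = [
    "total count only",
    "sex by 4 age categories",
    "sex by 9 age categories",
    "sex by 23 age categories",
]

# Minimum noise-infused count for each level, transcribed from Table 1.
# The keys are illustrative labels, not Census Bureau identifiers.
MINIMUMS = {
    ("detailed", "nation_state"): [0, 500, 1_000, 7_000],
    ("detailed", "substate"): [22, 1_000, 5_000, 20_000],
    ("regional", "nation_state"): [0, 5_000, 20_000, 150_000],
    ("regional", "substate"): [94, 5_000, 20_000, 150_000],
}


def level_of_detail(group_type, geography, noisy_count):
    """Return the most granular level a noise-infused count qualifies for."""
    # Assumption: a count below the lowest threshold receives no data.
    level = "suppressed"
    for minimum, name in zip(MINIMUMS[(group_type, geography)], LEVELS):
        if noisy_count >= minimum:
            level = name
    return level


print(level_of_detail("detailed", "substate", 1_200))  # sex by 4 age categories
```

Here "substate" stands for the substate and AIANNH columns of Table 1.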
The MOE is how close we can expect the noise-infused counts to be to the enumerated counts approximately 95% of the time. This means that for a noise-infused count of 20 with an MOE of three, we can be 95% confident that the enumerated or “true” count is somewhere between 17 and 23.
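The arithmetic in that example can be written as a short helper. This is a minimal sketch: the function name and dictionary keys are hypothetical, and flooring the lower bound at zero is an assumption added here because a population count cannot be negative.

```python
# Target MOEs from Table 1; the keys are illustrative labels.
TARGET_MOE = {
    ("detailed", "nation_state"): 3,
    ("detailed", "substate"): 11,
    ("regional", "nation_state"): 50,
    ("regional", "substate"): 50,
}


def plausible_range(noisy_count, moe):
    """Approximate 95% range for the enumerated count behind a noisy count."""
    # Assumption: the lower bound is floored at 0.
    return max(0, noisy_count - moe), noisy_count + moe


moe = TARGET_MOE[("detailed", "nation_state")]
print(plausible_range(20, moe))  # (17, 23)
```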
Our internal metrics that compare final noise-infused counts to the original, enumerated counts show that these expected MOEs are consistently met. Fewer than 5% of counts are outside of the target MOE.
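To make these two kinds of summary concrete, the sketch below computes the share of counts that fall outside a target MOE and the average absolute difference between noise-infused and enumerated counts. It is an illustration only. Enumerated counts are not public, so the numbers here are made up, and treating "how close, on average" as a mean absolute difference is an assumption rather than the Census Bureau's stated metric.

```python
def accuracy_summary(enumerated, noisy, moe):
    """Return (share of counts outside the MOE, mean absolute difference)."""
    diffs = [abs(n - e) for e, n in zip(enumerated, noisy)]
    share_outside = sum(d > moe for d in diffs) / len(diffs)
    mean_abs_diff = sum(diffs) / len(diffs)
    return share_outside, mean_abs_diff


# Made-up counts for illustration only; these are not published figures.
enumerated = [20, 45, 130, 667, 8]
noisy = [22, 44, 134, 665, 8]

share, mean_diff = accuracy_summary(enumerated, noisy, moe=3)
print(share, mean_diff)  # 0.2 1.8
```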
What does this look like for the typical detailed group?
Table 2 shows how close, on average, the noise-infused detailed counts are to the enumerated counts. For example, the cell in the top-right corner shows that the average noise-infused national sex-by-age count is within about one person of the enumerated count. For counties, the average noise-infused sex-by-age count is within about four people of the enumerated count – so, for example, the noise-infused count of 45- to 64-year-old women who are Hungarian alone in Cook County, Illinois, is 667, and the enumerated count is expected to be within about four or five people of that.
Geography | Total Population Counts | Sex-by-Age Counts |
Nation | ±3.4 | ±1.2 |
State | ±1.7 | ±1.2 |
County | ±6.5 | ±4.4 |
AIANNH | ±5.4 | ±4.4 |
Tract | ±4.9 | ±4.4 |
Place | ±5.3 | ±4.4 |
Source: U.S. Census Bureau, 2020 Census Detailed Demographic and Housing Characteristics File A.
The sex-by-age counts have similar expected levels of noise because, for a given level of geography, each count has the same MOE – no matter whether the population receives four, nine or 23 age categories.
The total population counts are a mix of counts that receive only a total count and those that are aggregations of sex by four, nine or 23 age categories. (When groups receive sex-by-age data, we add up the sex-by-age counts to produce the total population count for the group.) These aggregations have slightly higher MOEs than the total-only counts due to the way that noise adds together. (For details, refer to the Detailed DHC-A technical documentation.)
Even with this additional noise due to aggregation, the average difference between the noise-infused counts and the enumerated counts remains reasonable:
Table 3 repeats this analysis for the regional groups’ counts. The noise-infused sex-by-age counts are within 19 or 20 people of the enumerated counts, on average, while the noise-infused total population counts are slightly farther off because of the effects of aggregation, but they are well within the expected average based on the amount of noise infused.
Geography | Total Population Counts | Sex-by-Age Counts |
Nation | ±115.4 | ±19.9 |
State | ±55.3 | ±19.9 |
County | ±34.2 | ±19.9 |
AIANNH | N/A | N/A |
Tract | ±20.0 | ±19.9 |
Place | ±23.6 | ±19.9 |
Note: Regional groups are not available for AIANNH areas except when postprocessing was applied.
Source: U.S. Census Bureau, 2020 Census Detailed Demographic and Housing Characteristics File A.
As we mentioned earlier, the Detailed DHC-A provides a substantial increase in the number of population counts being published relative to the 2010 Census. However, we have implemented some data suppression, processed in the following order, to ensure demographic reasonableness.
Suppression is specific to a particular geographic area and does not apply to an entire geographic level; a group may be below the minimum population thresholds in a county (and be suppressed) but may exceed the threshold in other counties and receive totals or sex-by-age counts for those counties.
Table 4 shows the proportion of detailed counts for each geographic level that are suppressed. At first glance, many of these numbers appear quite high. For example, 95.5% of county total population counts are suppressed. This is likely due to a combination of geographic concentration – many groups live in only a few counties nationwide – and small county total population sizes throughout much of the country. Despite these high proportions, nearly all these suppressions occur for groups in areas where there were no enumerated members of that group. Counts are still produced, but in a targeted way, for groups in areas where there is sufficient population.
Geography | Total Population Counts | Sex-by-Age Counts |
Nation | 1.2% | 0.2% |
State | 23.4% | 0.3% |
County | 95.5% | 0.8% |
AIANNH | 99.2% | 1.0% |
Tract | 98.5% | 3.6% |
Place | 98.7% | 1.3% |
Source: U.S. Census Bureau, 2020 Census Detailed Demographic and Housing Characteristics File A.
Similarly, Table 5 shows the proportions of counts for regional groups that are suppressed. Because the regional groups are typically larger than the detailed groups, there is not as much suppression of the total population counts across the different geographic types.
Regional groups that get sex-by-age counts are larger than regional groups that get only a total population count, which is why fewer sex-by-age counts are suppressed. Suppression can still happen for sex-by-age counts if a group has a large enough total population to qualify for the breakdown and meets one of the following criteria:
Suppression happens more frequently at the tract level, as shown in Table 5, because many tracts have unique sex or age concentrations – that is, they are made up of primarily college students, for instance, or older people or single-sex group quarters residents. The thresholds for these tables mean that a tract may qualify for more granular sex-by-age categories based on its total population but, because of its age composition, many of those sex-by-age counts have few or no people in them and are suppressed.
Geography | Total Population Counts | Sex-by-Age Counts |
Nation | 0.0% | 0.0% |
State | 0.9% | 0.2% |
County | 69.6% | 1.6% |
AIANNH | N/A | N/A |
Tract | 84.8% | 7.3% |
Place | 88.4% | 2.1% |
Source: U.S. Census Bureau, 2020 Census Detailed Demographic and Housing Characteristics File A.
In previously released 2020 Census data products (i.e., the Redistricting Data Summary File, the Demographic and Housing Characteristics File and Demographic Profile), data generally become more accurate as you aggregate them. However, this isn’t the case when aggregating data in the Detailed DHC-A because of the way noise was infused in the data.
Data users should exercise caution when creating custom aggregations, adding or subtracting as little as possible to minimize the accumulation of noise across counts. This is because the expected amount of noise in a sum (or a difference) gets larger with each additional count being added or subtracted.
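One way to see why each added count matters is the standard rule for combining independent errors, where MOEs add in quadrature. The sketch below assumes the noise in each count is independent. That is an assumption made for illustration, and the technical documentation remains the authority on the formulas that apply to the Detailed DHC-A. The function name is hypothetical.

```python
import math


def moe_of_sum(moes):
    """Approximate MOE of a sum or difference of counts with independent noise."""
    # Root sum of squares: each extra count adds its squared MOE to the total.
    return math.sqrt(sum(m * m for m in moes))


# Adding up eight counts that each carry an MOE of 3.
print(round(moe_of_sum([3] * 8), 1))  # 8.5

# Adding up only two such counts keeps the combined MOE smaller.
print(round(moe_of_sum([3, 3]), 1))  # 4.2
```

Under this assumption the combined MOE grows with the square root of the number of counts, so a custom aggregation built from fewer published counts carries less accumulated noise.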
As another resource for understanding this guidance, we also discussed these examples during the pre-release webinar on September 13.
Data users who want to calculate a variety of percentages as they work with the data may find that not all denominators necessary to calculate the percentages are included in the Detailed DHC-A. Table 6 shows which data source to use when calculating a percentage.
Characteristic | To calculate the percentage of… | Use…as the denominator | Source of denominator |
Race (White, Black or African American, etc.) | The Asian alone population that is Korean alone at the national level. | The total Asian alone population in the United States (19,886,049). | 2020 Census Redistricting Data (P.L. 94-171) Summary File. |
Hispanic or Latino origin | People of Hispanic or Latino origin who are Salvadoran in Washington, D.C. | The total Hispanic or Latino origin population in Washington, D.C. (77,652). | 2020 Census Redistricting Data (P.L. 94-171) Summary File. |
Geography (California; Harris County, TX; etc.) | The population of Honolulu County, HI, that is Native Hawaiian alone or in any combination. | The total population of Honolulu County, HI (1,016,508). | 2020 Census Redistricting Data (P.L. 94-171) Summary File. |
Regional group (European, Caribbean, etc.) | The Sub-Saharan African alone or in any combination population that is Beninese alone. | The Sub-Saharan African alone or in any combination population. | 2020 Census Detailed DHC-A. |
Source: U.S. Census Bureau.
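As a worked illustration of the first row of Table 6, the sketch below divides a Detailed DHC-A count by a denominator taken from the Redistricting Data Summary File. The denominator is the figure given in Table 6. The numerator is a round placeholder, not a published count, and the function name is hypothetical.

```python
def percentage(numerator, denominator):
    """Return numerator as a percentage of denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100 * numerator / denominator


# Denominator from Table 6: total Asian alone population in the United States.
ASIAN_ALONE_US = 19_886_049

# Placeholder numerator for illustration only; look up the actual
# Korean alone count in the Detailed DHC-A before using this.
korean_alone_us = 1_000_000

print(f"{percentage(korean_alone_us, ASIAN_ALONE_US):.1f}%")  # 5.0%
```

Because the numerator is a noise-infused count, the resulting percentage inherits some of that uncertainty.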
More guidance on how to work with the data in the Detailed DHC-A is available in the technical documentation.
We made several enhancements to our 2020 Census Hispanic Origin and Race Code List based on feedback from stakeholders, advisors and tribal leaders. These enhancements allowed us to collect, process and tabulate data more accurately. Take these improvements into account if comparing 2020 Census and 2010 Census detailed race and ethnicity data.
For example:
Data users can reference the Detailed Race and Ethnicity Crosswalk: 2010 to 2020 to learn how specific groups were tabulated in both censuses. The blog Improvements to the 2020 Census Race and Hispanic Origin Question Designs, Data Processing, and Coding Procedures describes the improvements we made this decade.
Use caution when comparing the 2020 Census detailed race data to American Community Survey (ACS) estimates. The 2020 Census provides the official counts of the population (including Hispanic origin and race), while the ACS provides estimates of additional characteristics, adding rich context for understanding the nation’s population.
We implemented these same improvements in the ACS starting in 2020. As a result, use caution when comparing 2020 Census detailed Asian, NHPI and AIAN race data with 2019 or earlier ACS data.
The detailed groups classified as White, Black or African American, and Some Other Race available in the 2020 Census Detailed DHC-A have historically been tabulated from the ancestry question in the ACS. Use caution when comparing ancestry and Detailed DHC-A data because of the different methodology used for collecting, coding and processing data on the two topics. We provide more details on this in the technical documentation.
However, the 2020 Census detailed Hispanic origin data from the ethnicity question are comparable with the 2010 Census and with the ACS, regardless of data year.
The just-released Detailed DHC-A provides a wealth of information about detailed races, ethnicities and American Indian and Alaska Native tribes and villages across the United States. It features substantial improvements to data collection and an updated disclosure avoidance process that enables publishing accurate statistics on approximately 1,500 disaggregated groups while protecting the confidentiality of respondent information.
Thanks to the input of data users, tribal leaders and other stakeholders leading up to the 2020 Census, the Detailed DHC-A offers a rich source of data about the nation’s race and ethnicity groups and tribal nations. We look forward to continued engagement in the years to come as we continuously improve how we measure race and ethnicity.