Data Quality for Detailed Race and Ethnicity Groups and American Indian and Alaska Native Tribes and Villages in the 2020 Census

Today the Census Bureau released new statistics about our nation’s communities, providing population counts and sex-by-age statistics for approximately 1,500 detailed race and ethnicity groups and American Indian and Alaska Native tribes and villages in the 2020 Census Detailed Demographic and Housing Characteristics File A (Detailed DHC-A).

In a blog last week, we explained how the level of detail available for each group depends on its population size and the geographic level. This approach also sets targets for the level of uncertainty related to disclosure avoidance embedded in the statistics and illustrates how we are maximizing the utility and relevance of the data we produce while meeting our obligation to protect responses. In fact, the Detailed DHC-A provides a substantial increase in the number of population counts being published relative to the 2010 Census, where groups had to have a population of at least 100 nationally to be included in tabulations.

In this blog, we explain how well the published statistics fit within those targets, giving us confidence in the accuracy and usefulness of the data. We’ll also explain some situations where we suppressed data for groups, and we’ll provide guidance for calculating numbers (like percentages) that are not available in the published tables and for making comparisons.

Accuracy and the Detailed DHC-A

The Census Bureau has long used various statistical safeguards to help us avoid disclosing individuals’ information when we release statistics. As we’ve described in other materials, including the handbook Disclosure Avoidance for the 2020 Census: An Introduction, we are using a stronger system to protect your 2020 Census information against today’s cyberthreats.

The Detailed DHC-A used an algorithm called SafeTab-P to protect respondent confidentiality. This algorithm infuses noise – small, random additions or subtractions – in the total population counts and the sex-by-age statistics for detailed race and ethnicity groups and American Indian and Alaska Native tribes and villages.

In the previous blog, we explained how the level of detail available for a group in the Detailed DHC-A is based on population size (using the noise-infused counts) and geographic level. The blog also explained that we set accuracy targets – in the form of margins of error (MOE) – for how much disclosure avoidance-related uncertainty would be in the published statistics. Table 1 recaps the thresholds and MOEs for the detailed and regional groups.

Table 1. Detailed DHC-A Population Thresholds and Margins of Error (MOE) for Each Level of Geography

Level of Detail | Detailed groups: Nation and State (MOE = ±3) | Detailed groups: Substate and AIANNH (MOE = ±11) | Regional groups: Nation and State (MOE = ±50) | Regional groups: Substate and AIANNH (MOE = ±50)
Total count only | 0–499 | 22–999 | 0–4,999 | 94–4,999
Sex-by-age table – 4 age categories | 500–999 | 1,000–4,999 | 5,000–19,999 | 5,000–19,999
Sex-by-age table – 9 age categories | 1,000–6,999 | 5,000–19,999 | 20,000–149,999 | 20,000–149,999
Sex-by-age table – 23 age categories | 7,000+ | 20,000+ | 150,000+ | 150,000+

Notes:

  • AIANNH refers to American Indian, Alaska Native and Native Hawaiian areas.
  • The age categories are defined in the Table Structure section of the technical documentation and in this graphic.
  • The MOE measures disclosure avoidance-related uncertainty at the 95% confidence level.
  • Groups are assigned the sex-by-age table that corresponds to their total population count 99.9% of the time.

Source: U.S. Census Bureau. 
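Table 1 can be read as a simple lookup. The sketch below is illustrative only: the function name `level_of_detail`, the dictionary layout and the geography labels are assumptions made for this example, not part of any Census Bureau tool. It takes a noise-infused total count and returns the level of detail that Table 1 assigns.

```python
# Thresholds transcribed from Table 1. Each entry is
# (minimum noise-infused count, level of detail), largest minimum first.
# The keys and labels are hypothetical names chosen for this sketch.
THRESHOLDS = {
    ("detailed", "nation_state"): [
        (7_000, "sex by 23 age categories"),
        (1_000, "sex by 9 age categories"),
        (500, "sex by 4 age categories"),
        (0, "total count only"),
    ],
    ("detailed", "substate_aiannh"): [
        (20_000, "sex by 23 age categories"),
        (5_000, "sex by 9 age categories"),
        (1_000, "sex by 4 age categories"),
        (22, "total count only"),
    ],
    ("regional", "nation_state"): [
        (150_000, "sex by 23 age categories"),
        (20_000, "sex by 9 age categories"),
        (5_000, "sex by 4 age categories"),
        (0, "total count only"),
    ],
    ("regional", "substate_aiannh"): [
        (150_000, "sex by 23 age categories"),
        (20_000, "sex by 9 age categories"),
        (5_000, "sex by 4 age categories"),
        (94, "total count only"),
    ],
}


def level_of_detail(group_type: str, geography: str, noisy_count: int) -> str:
    """Return the Table 1 level of detail for a noise-infused total count."""
    for minimum, level in THRESHOLDS[(group_type, geography)]:
        if noisy_count >= minimum:
            return level
    # Below the lowest threshold in Table 1 (or negative): nothing is published.
    return "suppressed"


print(level_of_detail("detailed", "substate_aiannh", 667))
```

A count that falls below the lowest range in its column returns "suppressed" here, which mirrors the minimum population thresholds described later in this blog.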

The MOE is how close we can expect the noise-infused counts to be to the enumerated counts approximately 95% of the time. This means that for a noise-infused count of 20 with an MOE of three, we can be 95% confident that the enumerated or “true” count is somewhere between 17 and 23.
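The same arithmetic can be written as a short helper. This is a minimal sketch; the function name `moe_interval` is hypothetical, and flooring the lower bound at zero is a choice made for this example (an enumerated count cannot be negative), not something stated in the documentation.

```python
def moe_interval(noisy_count: int, moe: int) -> tuple[int, int]:
    """Return the (lower, upper) range implied by a noise-infused count and its MOE."""
    lower = max(0, noisy_count - moe)  # assumption: clamp at zero
    upper = noisy_count + moe
    return lower, upper


# The example from the text: a count of 20 with an MOE of 3.
print(moe_interval(20, 3))  # (17, 23)
```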

Our internal metrics that compare final noise-infused counts to the original, enumerated counts show that these expected MOEs are consistently met. Fewer than 5% of counts are outside of the target MOE.

What does this look like for the typical detailed group?

Table 2 shows how close, on average, the noise-infused detailed counts are to the enumerated counts. For example, the cell in the top-right corner shows that the average noise-infused national sex-by-age count is within about one person of the enumerated count. For counties, the average noise-infused sex-by-age count is within about four people of the enumerated count – so, for example, the noise-infused count of 45- to 64-year-old women who are Hungarian alone in Cook County, Illinois, is 667, and the enumerated count is expected to be within about four or five people of that. 

Table 2. How Close, On Average, Are the Detailed Counts?

Geography Total Population Counts Sex-by-Age Counts
Nation ±3.4 ±1.2
State ±1.7 ±1.2
County ±6.5 ±4.4
AIANNH ±5.4 ±4.4
Tract ±4.9 ±4.4
Place ±5.3 ±4.4

Source: U.S. Census Bureau, 2020 Census Detailed Demographic and Housing Characteristics File A. 

The sex-by-age counts have similar expected levels of noise because, for a given level of geography, each count has the same MOE – no matter whether the population receives four, nine or 23 age categories.

  • National and state counts have MOEs of ±3 and, as shown in the top two rows of the sex-by-age counts column, the noise-infused counts are within about one person of their enumerated counts.
  • Sub-state and American Indian, Alaska Native and Native Hawaiian (AIANNH) area counts have MOEs of ±11, and their sex-by-age counts are within about four or five people on average.
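One rough way to see why an MOE of ±3 goes with an average difference of about one person is sketched below. It assumes the noise is approximately normal with mean zero, which is an assumption made for this example rather than something this blog states. Under that assumption, the standard deviation is the MOE divided by 1.96, and the expected absolute difference is the standard deviation times the square root of 2/π. The function name `expected_abs_error` is hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% critical value of the standard normal


def expected_abs_error(moe: float) -> float:
    """Approximate mean absolute noise for a given 95% MOE (normal assumption)."""
    sigma = moe / Z_95
    return sigma * math.sqrt(2 / math.pi)


for moe in (3, 11, 50):
    print(moe, round(expected_abs_error(moe), 1))
```

Under this assumption, the results come out at roughly 1.2, 4.5 and 20.4, which is close to the sex-by-age columns of Tables 2 and 3 (±1.2, ±4.4 and ±19.9).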

The total population counts are a mix of counts that receive only a total count and those that are aggregations of sex by four, nine or 23 age categories. (When groups receive sex-by-age data, we add up the sex-by-age counts to produce the total population count for the group.) These aggregations have slightly higher MOEs than the total-only counts due to the way that noise adds together. (For details, refer to the Detailed DHC-A technical documentation.)
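As a hedged illustration of how noise adds together, the sketch below combines per-count MOEs as the square root of the sum of their squares. That rule holds for independent, zero-mean noise and is an assumption made for this example; the technical documentation is the authority on the actual MOEs. The function names and the two-sexes-by-age-category cell layout are likewise assumptions.

```python
import math


def aggregated_moe(moes: list[float]) -> float:
    """Approximate MOE of a sum of counts with independent noise."""
    return math.sqrt(sum(m * m for m in moes))


def total_moe_from_cells(cell_moe: float, age_categories: int, sexes: int = 2) -> float:
    """Approximate MOE of a total built by adding up every sex-by-age cell."""
    cells = age_categories * sexes
    return aggregated_moe([cell_moe] * cells)


# Nation/state detailed groups have a per-count MOE of 3 (Table 1).
for categories in (4, 9, 23):
    print(categories, round(total_moe_from_cells(3, categories), 1))
```

The point of the sketch is the direction of the effect: the more cells that are added together, the larger the uncertainty in the resulting total.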

Even with this additional noise due to aggregation, the average difference between the noise-infused counts and the enumerated counts remains reasonable:

  • On average, national total population counts are within three or four people of the enumerated count.
  • State total population counts are within one or two people.
  • Sub-state and AIANNH total population counts are within five to seven people.

Table 3 repeats this analysis for the regional groups’ counts. The noise-infused sex-by-age counts are within 19 or 20 people of the enumerated counts, on average, while the noise-infused total population counts are slightly farther off because of the effects of aggregation, but they are well within the expected average based on the amount of noise infused.

Table 3. How Close, On Average, Are the Counts of Regional Groups?

Geography Total Population Counts Sex-by-Age Counts
Nation ±115.4 ±19.9
State ±55.3 ±19.9
County ±34.2 ±19.9
AIANNH N/A N/A
Tract ±20.0 ±19.9
Place ±23.6 ±19.9

Note: Regional groups are not available for AIANNH areas except when postprocessing was applied.

Source: U.S. Census Bureau, 2020 Census Detailed Demographic and Housing Characteristics File A. 

Data Availability and Suppression

As we mentioned earlier, the Detailed DHC-A provides a substantial increase in the number of population counts being published relative to the 2010 Census. However, we have implemented some data suppression, processed in the following order, to ensure demographic reasonableness.

  • We implemented a minimum population threshold for sub-state and AIANNH geographies (Table 1) to prevent the infused noise from disclosure avoidance making it look like populations exist in areas where they do not.
  • We suppressed cases where a noise-infused count fell below zero.
  • We suppressed the alone count in areas where the group’s noise-infused alone population count was greater than its noise-infused alone or in any combination population count for that area.

Suppression is specific to a particular geographic area and does not apply to an entire geographic level; a group may be below the minimum population thresholds in a county (and be suppressed) but may exceed the threshold in other counties and receive totals or sex-by-age counts for those counties.
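The three suppression rules, applied in the order listed, can be sketched as a single check per geographic area. This is a simplified illustration, not the production logic: the function name `suppression_reason` and its parameters are hypothetical, it inspects only the alone count, and the minimum thresholds (22 for detailed groups and 94 for regional groups in sub-state and AIANNH areas) are read from Table 1.

```python
from typing import Optional


def suppression_reason(
    noisy_alone: int,
    noisy_alone_or_combo: int,
    minimum: Optional[int] = None,
) -> Optional[str]:
    """Return why an alone count would be suppressed, or None if it is kept.

    minimum is the Table 1 minimum population threshold for sub-state and
    AIANNH geographies; pass None for nation and state, where none applies.
    """
    # Rule 1: minimum population threshold (sub-state and AIANNH only).
    if minimum is not None and noisy_alone < minimum:
        return "below minimum population threshold"
    # Rule 2: the noise-infused count fell below zero.
    if noisy_alone < 0:
        return "negative noise-infused count"
    # Rule 3: the alone count is larger than the alone-or-in-any-combination count.
    if noisy_alone > noisy_alone_or_combo:
        return "alone count exceeds alone-or-in-combination count"
    return None


# The same group can be suppressed in one county and published in another.
print(suppression_reason(10, 40, minimum=22))
print(suppression_reason(30, 45, minimum=22))
```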

Table 4 shows the proportion of detailed counts for each geographic level that are suppressed. At first glance, many of these numbers appear quite high. For example, 95.5% of county total population counts are suppressed. This is likely due to a combination of geographic concentration – many groups live in only a few counties nationwide – and small county total population sizes throughout much of the country. Despite these high proportions, nearly all these suppressions occur for groups in areas where there were no enumerated members of that group. Counts are still produced, but in a targeted way, for groups in areas where there is sufficient population.

Table 4: Proportions of the Detailed Counts Suppressed, by Geography: 2020

Geography Total Population Counts Sex-by-Age Counts
Nation 1.2% 0.2%
State 23.4% 0.3%
County 95.5% 0.8%
AIANNH 99.2% 1.0%
Tract 98.5% 3.6%
Place 98.7% 1.3%

Source: U.S. Census Bureau, 2020 Census Detailed Demographic and Housing Characteristics File A. 

Similarly, Table 5 shows the proportions of counts for regional groups that are suppressed. Because the regional groups are typically larger than the detailed groups, there is not as much suppression of the total population counts across the different geographic types.

Regional groups that get sex-by-age counts are larger than regional groups that get only a total population count, which is why fewer sex-by-age counts are suppressed. Suppression can still happen for sex-by-age counts if a group has a large enough total population to qualify for the breakdown and meets one of the following criteria:

  • The noisy count for an individual sex-by-age category fell below zero.
  • The noisy alone count for that specific sex-by-age category was greater than its corresponding count for the alone or in any combination population.

Suppression happens more frequently at the tract level, as shown in Table 5, because many tracts have unique sex or age concentrations – that is, they are made up of primarily college students, for instance, or older people or single-sex group quarters residents. The thresholds for these tables mean that a tract may qualify for more granular sex-by-age categories based on its total population but, because of its age composition, many of those sex-by-age counts have few or no people in them and are suppressed.

Table 5: Proportions of the Counts for Regional Groups Suppressed, by Geography: 2020

Geography Total Population Counts Sex-by-Age Counts
Nation 0.0% 0.0%
State 0.9% 0.2%
County 69.6% 1.6%
AIANNH N/A N/A
Tract 84.8% 7.3%
Place 88.4% 2.1%

Source: U.S. Census Bureau, 2020 Census Detailed Demographic and Housing Characteristics File A.

Creating Custom Aggregations

In previously released 2020 Census data products (i.e., the Redistricting Data Summary File, the Demographic and Housing Characteristics File and Demographic Profile), data generally become more accurate as you aggregate them. However, this isn’t the case when aggregating data in the Detailed DHC-A because of the way noise was infused in the data.

Data users should exercise caution when creating custom aggregations, adding or subtracting as little as possible to minimize the accumulation of noise across counts. This is because the expected amount of noise in a sum (or a difference) gets larger with each additional count being added or subtracted.

  • To create counts for a custom geography, remove or add as few geographies as possible. For example:
    • To create a count of the Samoan alone population for the Pacific West states, add together the Samoan alone counts for Alaska, California, Hawaii, Oregon and Washington, rather than subtracting the Samoan alone counts of the 46 other states and state equivalents from the national Samoan alone count.
    • To create a count of the Mexican population for the 10 Arizona counties that are majority urban, remove Apache, Graham, Greenlee, Navajo and Santa Cruz counties from the Arizona state total Mexican population rather than combining the Mexican population for the 10 majority urban counties.
  • To create custom counts using age data, collapse as few categories as possible. For example:
    • To compare groups that have nine and 23 age categories, collapse the 23 age categories into the nine rather than into four age categories or other custom categories when possible.
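The two geography examples above can be compared numerically. The sketch below is an approximation under stated assumptions: it treats the noise in each count as independent and combines MOEs as the square root of the sum of squares, and it uses the nominal Table 1 MOEs (±3 for nation and state, ±11 for counties) even though totals built from sex-by-age cells carry somewhat larger MOEs. The function name `combined_moe` is hypothetical.

```python
import math


def combined_moe(moes: list[float]) -> float:
    """Approximate MOE of a sum or difference of independently noised counts."""
    return math.sqrt(sum(m * m for m in moes))


NATION_STATE_MOE = 3   # Table 1, detailed groups
SUBSTATE_MOE = 11      # Table 1, detailed groups

# Samoan alone, Pacific West: add 5 states, or subtract 46 from the nation.
add_states = combined_moe([NATION_STATE_MOE] * 5)
subtract_states = combined_moe([NATION_STATE_MOE] * (1 + 46))

# Mexican, 10 majority-urban Arizona counties: add 10 counties,
# or subtract 5 counties from the state total.
add_counties = combined_moe([SUBSTATE_MOE] * 10)
subtract_counties = combined_moe([NATION_STATE_MOE] + [SUBSTATE_MOE] * 5)

print(round(add_states, 1), round(subtract_states, 1))
print(round(add_counties, 1), round(subtract_counties, 1))
```

In both cases, the route that touches fewer counts has the smaller approximate MOE, which is the reasoning behind the guidance.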

As another resource for understanding this guidance, we also discussed these examples during the pre-release webinar on September 13.

Calculating Percentages

Data users who want to calculate a variety of percentages as they work with the data may find that not all denominators necessary to calculate the percentages are included in the Detailed DHC-A. Table 6 shows which data source to use when calculating a percentage.

Table 6. Calculating Percentages in the Detailed DHC-A

Characteristic | To calculate the percentage of… | Use…as the denominator | Source of denominator
Race (White, Black or African American, etc.) | The Asian alone population that is Korean alone at the national level. | The total Asian alone population in the United States (19,886,049). | 2020 Census Redistricting Data (P.L. 94-171) Summary File.
Hispanic or Latino origin | People of Hispanic or Latino origin who are Salvadoran in Washington, DC. | The total Hispanic or Latino origin population in Washington, D.C. (77,652). | 2020 Census Redistricting Data (P.L. 94-171) Summary File.
Geography (California; Harris County, TX; etc.) | The population of Honolulu County, HI, that is Native Hawaiian alone or in any combination. | The total population of Honolulu County, HI (1,016,508). | 2020 Census Redistricting Data (P.L. 94-171) Summary File.
Regional group (European, Caribbean, etc.) | The Sub-Saharan African alone or in any combination population that is Beninese alone. | The Sub-Saharan African alone or in any combination population. | 2020 Census Detailed DHC-A.

Source: U.S. Census Bureau.
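A percentage from Table 6 is the Detailed DHC-A count divided by the denominator from the listed source. In the minimal sketch below, the denominator (77,652) comes from Table 6, but the numerator is a made-up placeholder used only to show the arithmetic; it is not a published count. The function name `percentage` is hypothetical.

```python
def percentage(numerator: float, denominator: float) -> float:
    """Return numerator as a percentage of denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100 * numerator / denominator


# Denominator from Table 6: total Hispanic or Latino population in Washington, D.C.
DC_HISPANIC_TOTAL = 77_652

# PLACEHOLDER numerator for illustration only, NOT a published figure.
# In practice, use the group's count from the Detailed DHC-A.
placeholder_group_count = 1_000

print(round(percentage(placeholder_group_count, DC_HISPANIC_TOTAL), 2))
```

Because the numerator carries disclosure avoidance noise, the same calculation can be repeated with the numerator plus and minus its MOE to see how much the percentage could move.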

More guidance on how to work with the data in the Detailed DHC-A is available in the technical documentation.

Making Comparisons

We made several enhancements to our 2020 Census Hispanic Origin and Race Code List based on feedback from stakeholders, advisors and tribal leaders. These enhancements allowed us to collect, process and tabulate data more accurately. Take these improvements into account if comparing 2020 Census and 2010 Census detailed race and ethnicity data.

For example:

  • In 2010, “Sikh” was tabulated as “Asian Indian.” In 2020, it was tabulated as a unique detailed Asian group.
  • In the 2010 Census, there was a “Guamanian or Chamorro” checkbox. In 2020, the checkbox was updated to “Chamorro.” Detailed write-in responses of “Guamanian” are not included in the “Chamorro” total and are tabulated as part of the “Guamanian” population.
  • In 2020, we also added unique codes for:
  • Based on consultation with tribal leaders over the past decade, we removed tribal groupings and instead include data for individual American Indian and Alaska Native (AIAN) tribes in the code list and the Detailed DHC-A.

Data users can reference the Detailed Race and Ethnicity Crosswalk: 2010 to 2020 to learn how specific groups were tabulated in both censuses. The blog Improvements to the 2020 Census Race and Hispanic Origin Question Designs, Data Processing, and Coding Procedures describes the improvements we made this decade.

Use caution when comparing the 2020 Census detailed race data to American Community Survey (ACS) estimates. The 2020 Census provides the official counts (including Hispanic origin and race) of the population and the ACS provides estimates of additional characteristics, adding rich context for understanding the nation’s population.

We implemented these same improvements in the ACS starting in 2020. As a result, use caution when comparing 2020 Census detailed Asian, NHPI and AIAN race data with 2019 or earlier ACS data.

The detailed groups classified as White, Black or African American, and Some Other Race available in the 2020 Census Detailed DHC-A have historically been tabulated from the ancestry question in the ACS. Use caution when comparing ancestry and Detailed DHC-A data because of the different methodology used for collecting, coding and processing data on the two topics. We provide more details on this in the technical documentation.

However, the 2020 Census detailed Hispanic origin data from the ethnicity question are comparable with the 2010 Census and with the ACS, regardless of data year.

Conclusion

The just-released Detailed DHC-A provides a wealth of information about detailed races, ethnicities and American Indian and Alaska Native tribes and villages across the United States. It features substantial improvements to data collection and an updated disclosure avoidance process that enables publishing accurate statistics on approximately 1,500 disaggregated groups while protecting the confidentiality of respondent information.

Thanks to the input of data users, tribal leaders and other stakeholders leading up to the 2020 Census, the Detailed DHC-A offers a rich source of data about the nation's race and ethnicity groups and tribal nations. We look forward to continued engagement in the years to come as we continuously improve how we measure race and ethnicity.

Page Last Revised - October 24, 2023