A decennial count of all people residing in the United States is required by Article I, Section 2 of the Constitution. The Decennial Census has been gathered every 10 years since 1790 and, unlike more detailed surveys such as the American Community Survey (ACS), collects only a handful of basic characteristics, including sex, age, and race, as well as the number of occupied and vacant housing units and the number of people living in “group quarters” (nursing homes, prisons, military barracks, and college dorms).
The resulting data is used for several purposes:
Apportioning seats in the House of Representatives among the states (see this PDF for the results of reapportionment based on the 2020 Census).
Providing information to state officials for the redrawing of congressional and state legislative districts.
Informing how hundreds of billions of dollars in federal funding are allocated.
These processes are vital for ensuring proper democratic representation and adequate funding for public services for every community in the United States. These uses of Census data are why obtaining an accurate count is so important and why some states spend millions of dollars on outreach and mobilization.
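To make the apportionment step concrete, here is a minimal sketch of the "method of equal proportions" (also called Huntington-Hill), the formula that has been used to divide House seats since the 1940s. The method is not described in the text above, and the three "states" and their populations are made up purely for illustration; this is a toy version, not the Census Bureau's own code.

```python
import heapq
from math import sqrt


def apportion(populations, seats=435):
    """Divide `seats` among states using the method of equal proportions."""
    if seats < len(populations):
        raise ValueError("need at least one seat per state")

    # Every state is guaranteed one seat before the formula is applied.
    result = {state: 1 for state in populations}

    # A state's claim to its next seat is population / sqrt(n * (n + 1)),
    # where n is the number of seats it already holds. heapq is a min-heap,
    # so priorities are stored negated to pop the largest claim first.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)

    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        result[state] += 1
        n = result[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))

    return result


# Hypothetical populations, not real Census figures.
toy_states = {"A": 9_000_000, "B": 5_000_000, "C": 1_000_000}
toy_result = apportion(toy_states, seats=10)
print(toy_result)
```

Because each additional seat is handed out one at a time based on population, even a small undercount in one state can change which state receives the last few seats.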
For an interesting discussion of the potential political ramifications of these state initiatives, check out this episode of the FiveThirtyEight Politics Podcast. For a discussion of the ramifications of an undercount, check out this episode of the Count on Your Census podcast. And for a more in-depth overview of the Census, including how it functions, what purposes it serves, and how it protects participants’ privacy, check out this episode of the Civics 101 Podcast.
Data Quality and Challenges for the 2020 Census
The Decennial Census is a massive undertaking, with planning starting even before the previous Census is complete. Despite all of this preparation, it is incredibly difficult to get an exact count. The national self-response rate for the 2020 Census was just 67% of addresses, meaning that the remaining addresses had to be resolved and enumerated as part of “non-response follow-up.” This step involves either sending a census taker to the address to speak with a member of the household or a proxy (e.g. a neighbor, landlord, or building manager) or utilizing administrative records (e.g. birth certificates and death records). Proxies were used to resolve 24.1% of non-responding housing units, and administrative records were used for 5.6% of addresses nationwide.
Although the self-response rate might seem low and these alternative measures might seem heavily used, these rates are similar to those seen in the 2010 Census. They show that, despite challenges like an ongoing pandemic, several notable natural disasters, and a shortened timeline to complete the count, the 2020 Census was able to maintain collection metrics similar to those of the previous Census. (For more information, check out the Census Bureau’s page on Data Quality, and for a deep dive into the effects we might see as data quality measurements are updated, check out this panel from ICPSR.)
Although these quality metrics paint the picture of a relatively normal Census, it is important to acknowledge that communities of color have been historically undercounted in the Decennial Census. The Constitution and its requirement for a decennial Census lay bare the white supremacist ideologies of our early democracy through the exclusion of Native populations and the diminishment of Black slaves to ⅗ of a person. Prior to 1960, respondents could not select their own race; it was selected for them by a census taker based largely on appearance.
The racial categories employed by the Census have frequently shifted to reflect the reigning racial hierarchy of American society, often to the exclusion or erasure of large swaths of the population (for more information on these shifting categories, see this article from Pew Research). Furthermore, communities that have been historically subject to government surveillance may be hesitant to participate in the Census for fear of government backlash. This issue came to national attention with former President Trump’s push to add a citizenship question to the 2020 Census. Although this question was ultimately not included, the fear it stoked in both documented and undocumented immigrant populations remains real and is not without historical precedent (see this article from NPR on how Census data was used as a tool of Japanese internment during World War II). For more on how Census anxiety affects historically marginalized communities, check out this episode of the Code Switch podcast.
Confidentiality vs. Accuracy: Differential Privacy and the 2020 Census
The Census Bureau is required by law to take steps to protect the privacy and confidentiality of Census respondents. Throughout its history the Census has used a variety of methods to ensure that any published statistics cannot be used to trace back to an individual respondent (see this page for a neat diagram of safeguards in previous Censuses). In some cases this requires the suppression of certain statistics, particularly for geographies with smaller populations.
For the last three Censuses, the primary technique used to accomplish this goal was “swapping,” in which responses to certain questions are traded between respondents that already have similar responses to other questions. This is usually done within a specific geography to ensure that population-level statistics are not affected.
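The swapping idea can be sketched in a few lines. The example below is a toy illustration only: the Bureau's actual matching rules and swap rates are confidential, so the record layout (`block`, `size`, `race`), the choice of household size as the matching key, and the `rate` parameter are all assumptions made for this sketch.

```python
import random
from collections import Counter


def swap_households(households, rate=0.5, seed=0):
    """Trade block assignments between households that match on size."""
    rng = random.Random(seed)
    swapped = [dict(h) for h in households]  # leave the input untouched

    # Group record positions by the matching key (here: household size).
    by_size = {}
    for i, h in enumerate(swapped):
        by_size.setdefault(h["size"], []).append(i)

    for indexes in by_size.values():
        rng.shuffle(indexes)
        # Pair up records in the shuffled order; swap only across blocks.
        for a, b in zip(indexes[::2], indexes[1::2]):
            different_blocks = swapped[a]["block"] != swapped[b]["block"]
            if different_blocks and rng.random() < rate:
                swapped[a]["block"], swapped[b]["block"] = (
                    swapped[b]["block"],
                    swapped[a]["block"],
                )
    return swapped


def block_population(households):
    """Total number of people per block."""
    totals = Counter()
    for h in households:
        totals[h["block"]] += h["size"]
    return totals


# Hypothetical records, invented for this example.
households = [
    {"id": 1, "block": "A", "size": 2, "race": "White"},
    {"id": 2, "block": "B", "size": 2, "race": "Black"},
    {"id": 3, "block": "A", "size": 3, "race": "Asian"},
    {"id": 4, "block": "B", "size": 3, "race": "White"},
    {"id": 5, "block": "A", "size": 4, "race": "Black"},
    {"id": 6, "block": "B", "size": 1, "race": "White"},
]
swapped = swap_households(households, rate=1.0)
print(block_population(households) == block_population(swapped))
```

Because swapped households always have the same size, each block's total population is unchanged, while the race breakdown within a block may no longer reflect the households that actually live there.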
However, for the 2020 Census a new method has been applied to all data below state-level totals. It is based on the theory of Differential Privacy, which requires “noise,” or random errors, to be added to many published statistics. Although this method does achieve a greater level of confidentiality, many are concerned about the substantial sacrifice in data accuracy needed to make data differentially private. See this webinar from the National Congress of American Indians on how this new method may adversely affect Native communities and other low-population areas.
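To see why added noise matters most for small populations, here is a minimal sketch of the Laplace mechanism, a textbook way of making a count differentially private. It is not the Census Bureau's production algorithm, which is considerably more involved; the `epsilon` value and the two example counts are assumptions chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centered on zero.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count, epsilon, rng):
    """Release a count with Laplace noise calibrated to `epsilon`.

    Adding or removing one person changes a count by at most 1, so the
    noise scale is 1 / epsilon. A smaller epsilon means more privacy and
    more noise.
    """
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(2020)
# Hypothetical geographies and counts, invented for this example.
for name, count in [("small block", 12), ("large county", 120_000)]:
    released = noisy_count(count, epsilon=0.5, rng=rng)
    relative_error = abs(released - count) / count
    print(f"{name}: true={count}, released={released:.1f}, "
          f"relative error={relative_error:.2%}")
```

The noise is drawn from the same distribution regardless of the size of the true count, so an error of a few people is negligible for a large county but can be a large share of a small block or tribal area.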
Differential privacy is intended as a safeguard against database reconstruction – recreating an entire database by using published top-line statistics as a set of solvable equations. However, this paper from Ruggles and Van Riper suggests that the Census Bureau’s own attempts to reconstruct the 2010 Census database achieve similar accuracy to randomly guessing the characteristics of individual respondents. The use of this method remains a point of contention in the social science research community. (Check out more papers and background information on IPUMS’s website.)
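The "solvable equations" idea behind database reconstruction can be illustrated with a brute-force toy. The sketch below assumes a hypothetical block of three residents for which a few summary statistics have been published, and it lists every set of ages consistent with them. The function name, the statistics chosen, and all of the numbers are invented for illustration; real reconstruction attacks work with many more tables and use optimization solvers rather than enumeration.

```python
from itertools import combinations_with_replacement


def consistent_age_sets(count, mean_age, median_age, n_minors=None, max_age=99):
    """List every sorted tuple of ages that matches the published statistics.

    Assumes `count` is odd, so the median is simply the middle value.
    """
    matches = []
    # combinations_with_replacement yields non-decreasing tuples, so each
    # possible household appears exactly once regardless of ordering.
    for ages in combinations_with_replacement(range(max_age + 1), count):
        if sum(ages) != mean_age * count:
            continue
        if ages[count // 2] != median_age:
            continue
        if n_minors is not None and sum(a < 18 for a in ages) != n_minors:
            continue
        matches.append(ages)
    return matches


# Hypothetical published tables for one block of three people.
loose = consistent_age_sets(count=3, mean_age=30, median_age=30)
tight = consistent_age_sets(count=3, mean_age=30, median_age=30, n_minors=1)
print(len(loose), "candidate age sets from count, mean, and median")
print(len(tight), "candidates once the number of minors is also published")
```

In this toy, publishing one more table narrows the candidates from 31 to 18. Each additional statistic acts as another equation, which is why a large enough set of exact tables can pin down individual records.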
What Changed? Disasters, A Shortened Count, and Statistical Noise
Despite some of the shortcomings in the 2020 Census, both with regard to the completeness of the count and the accuracy of published data, the full data set that will be published on September 30th still serves as one of the best datasets for studying demographics and how they have changed over time in each community. At the same time, it is important to remember that data may be erroneous or suppressed in low-population areas, and you may need to look for alternative methods for studying these trends. If you want to download the state summary redistricting data (in “legacy” format, meaning text files), head to the Redistricting Data page of the Census Bureau website. If you want to learn more about the Census and check out recent webinars on some of the topics discussed above, check out the Census Academy website.
About the Author
Aedan Lombardo is Bluebonnet's Data Director. Aedan got his start in political data as a Data Fellow in 2020 working on Aric Putnam's campaign for Minnesota State Senate District 14. Prior to taking the lead on Bluebonnet's internal data operations, Aedan ran Bluebonnet's Tech Fellowship as Program Director.