Drawing Statistical Inferences from International Census Data

Type Report
Title Drawing Statistical Inferences from International Census Data
Author(s)
Publication (Day/Month/Year) 2011
URL http://www.2011.isiproceedings.org/papers/951039.pdf
Abstract
Census microdata are collected by countries around the globe and contain a wealth of information useful to social
science researchers. Although large machine-readable census microdata samples exist for many countries, access to these
data has been limited and the documentation has often been inadequate, making cross-country and across-time comparisons
difficult. The Integrated Public Use Microdata Series-International (IPUMS International) converts census microdata from
multiple countries into a consistent format, supplies comprehensive documentation, and makes the data available through
a web-based data dissemination system. Although census microdata used by social scientists, like the data in the IPUMS,
derive from complex samples, researchers commonly apply methods designed for simple random samples. Using full count
data from four countries, we evaluate the impact of sample design on standard error estimates for microdata samples from the
IPUMS International. We compare standard error estimates from the full count data to estimates from the 10% public use
samples using three methods: subsample replicate, Taylor series linearization, and estimates using simple random sample
assumptions. We conclude by discussing strategies for obtaining unbiased and efficient estimates of statistical significance.
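To illustrate how two of these approaches can diverge, here is a minimal sketch, not the report's own procedure. It computes the standard error of a mean under simple random sample assumptions and with a first-order Taylor series linearization that treats households as clusters. The function names, the single-stratum with-replacement approximation, and the toy data are all assumptions made for this illustration.

```python
import math
from collections import defaultdict


def srs_se(values):
    """Standard error of the mean under simple random sample assumptions."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(var / n)


def linearized_se(values, households, weights=None):
    """Taylor-linearized standard error of a weighted mean.

    Households are treated as clusters in a single stratum, using the
    with-replacement approximation and no finite population correction
    (assumptions of this sketch).
    """
    if weights is None:
        weights = [1.0] * len(values)
    total_w = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total_w

    # Sum the linearized residuals w * (y - mean) / W within each household.
    cluster_totals = defaultdict(float)
    for v, h, w in zip(values, households, weights):
        cluster_totals[h] += w * (v - mean) / total_w

    m = len(cluster_totals)
    # The residuals sum to zero overall, so the mean cluster total is zero.
    var = m / (m - 1) * sum(z ** 2 for z in cluster_totals.values())
    return math.sqrt(var)


# Hypothetical data: 50 households of 4 persons, with a binary trait that is
# perfectly homogeneous within each household.
values, households = [], []
for hh in range(50):
    for _ in range(4):
        values.append(1.0 if hh % 2 == 0 else 0.0)
        households.append(hh)

naive = srs_se(values)
clustered = linearized_se(values, households)
# With complete within-household homogeneity, the clustered standard error
# is roughly twice the simple random sample figure.
print(naive, clustered)
```

When every person forms a separate cluster and weights are equal, the two functions return the same value, so any gap between them reflects the clustering alone.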
Like most census microdata, IPUMS samples contain individual level data, clustered by household, and often
stratified and differentially weighted. Standard error estimates from clustered, stratified, and differentially weighted data
can differ dramatically from those derived from simple random samples of the same size. To the extent that the
characteristics of individuals are homogeneous within households, household clustering yields standard errors that are
greater than would be obtained from a simple random sample of the same size (Graubard and Korn 1996; Hansen,
Hurwitz, and Madow 1953; Kish 1992; Korn and Graubard 1995, 1999). Stratification in census microdata samples has the
opposite effect from clustering and differential weighting: in general, failure to control for the effects of stratification leads
to overestimated standard errors. To the extent that the characteristics of individuals or households are homogeneous within
strata, the variance within each stratum is reduced. Most IPUMS International samples are systematic random samples,
drawn by selecting every tenth household in the source file after designating a random starting point. The data are typically
sorted according to small geographic areas so that records in resulting samples retain geographic proximity. Therefore, the
systematic sample design is equivalent to low-level geographic stratification, even though no explicit stratification may
have been carried out.
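The implicit stratification described above can be sketched as follows. This is an illustration under stated assumptions, not the IPUMS International sampling code: the function name, the three hypothetical areas, and the household counts are invented for the example.

```python
import random
from collections import Counter


def systematic_sample(records, interval=10, rng=None):
    """Select every `interval`-th record after a random starting point."""
    rng = rng if rng is not None else random.Random()
    start = rng.randrange(interval)  # random start in 0 .. interval - 1
    return records[start::interval]


# Hypothetical source file: 100 households in each of three small areas,
# sorted by area so that neighbouring records are geographically close.
source = [(area, hh) for area in ("A", "B", "C") for hh in range(100)]

sample = systematic_sample(source, interval=10, rng=random.Random(0))
counts = Counter(area for area, _ in sample)
print(len(sample), dict(counts))
```

Because the file is sorted by area, every possible starting point yields exactly ten households from each area in this toy example, which is the same allocation a proportionate geographic stratification would give.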
