Type | Report |
Title | Drawing Statistical Inferences from International Census Data |
Author(s) | |
Publication (Day/Month/Year) | 2011 |
URL | http://www.2011.isiproceedings.org/papers/951039.pdf |
Abstract | Census microdata are collected by countries around the globe and contain a wealth of information useful to social science researchers. Although large machine-readable census microdata samples exist for many countries, access to these data has been limited and the documentation has often been inadequate, making cross-country and across-time comparisons difficult. The Integrated Public Use Microdata Series-International (IPUMS International) converts census microdata from multiple countries into a consistent format, supplies comprehensive documentation, and makes the data available through a web-based data dissemination system. Although census microdata used by social scientists, like the data in the IPUMS, derive from complex samples, researchers commonly apply methods designed for simple random samples. Using full count data from four countries, we evaluate the impact of sample design on standard error estimates of microdata samples from the IPUMS International. We compare standard error estimates from the full count data to estimates from the 10% public use samples using three methods: subsample replicate, Taylor series linearization, and estimates using simple random sample assumptions. We conclude by discussing strategies for obtaining unbiased and efficient estimates of statistical significance. Like most census microdata, IPUMS samples contain individual-level data, clustered by household, and often stratified and differentially weighted. Standard error estimates from clustered, stratified, and differentially weighted data can differ dramatically from those derived from simple random samples of the same size. To the extent that the characteristics of individuals are homogeneous within households, household clustering yields standard errors that are greater than would be obtained from a simple random sample of the same size (Graubard and Korn 1996; Hansen, Hurwitz, and Madow 1953; Kish 1992; Korn and Graubard 1995, 1999). 
Stratification in census microdata samples has the opposite effect from clustering and differential weighting: in general, failure to account for stratification leads to overestimated standard errors. To the extent that the characteristics of individuals or households are homogeneous within strata, the within-stratum variance is reduced. Most IPUMS-International samples are systematic random samples, drawn by selecting every tenth household in the source file after designating a random starting point. The source data are typically sorted by small geographic area, so records in the resulting samples retain geographic proximity. The systematic sample design is therefore equivalent to low-level geographic stratification, even though no explicit stratification may have been carried out. |
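
The sampling scheme and the clustering effect described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`systematic_sample`, `srs_standard_error`, `clustered_standard_error`) and the synthetic households are assumptions made for illustration. The cluster-aware estimate uses the standard Taylor-linearized variance of a ratio mean with households as clusters, and it ignores stratification, weights, and the finite population correction.

```python
import math
import random


def systematic_sample(households, interval=10, seed=0):
    """Select every `interval`-th household after a random starting point."""
    rng = random.Random(seed)
    start = rng.randrange(interval)
    return households[start::interval]


def srs_standard_error(values):
    """Standard error of a mean under simple random sample assumptions."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return math.sqrt(variance / n)


def clustered_standard_error(households):
    """Taylor-linearized standard error of the person-level mean,
    treating each household as a cluster (ratio estimator)."""
    m = len(households)
    total_persons = sum(len(h) for h in households)
    mean = sum(sum(h) for h in households) / total_persons
    # Linearized residual total for each household.
    residuals = [sum(h) - mean * len(h) for h in households]
    variance = (m / (m - 1)) * sum(r ** 2 for r in residuals) / total_persons ** 2
    return math.sqrt(variance)


# Synthetic "source file" (hypothetical data): 1,000 households of 4 persons,
# with every member of a household sharing the same 0/1 characteristic,
# i.e. perfect within-household homogeneity.
households = [[1 if i % 3 == 0 else 0] * 4 for i in range(1000)]

sample = systematic_sample(households, interval=10, seed=42)
persons = [value for household in sample for value in household]

se_srs = srs_standard_error(persons)
se_cluster = clustered_standard_error(sample)
print(f"SRS-assumption SE: {se_srs:.4f}")
print(f"Cluster-aware SE:  {se_cluster:.4f}")
```

In this extreme case the cluster-aware standard error is roughly twice the one computed under simple random sample assumptions, which matches the familiar design-effect approximation 1 + (b - 1)ρ with household size b = 4 and intraclass correlation ρ = 1. Real census characteristics are less homogeneous within households, so the gap would typically be smaller.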