An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data

Type Working Paper
Title An unconstrained statistical matching algorithm for combining individual and household level geo-specific census and survey data
Author(s)
Publication (Day/Month/Year) 2016
URL http://ro.uow.edu.au/cgi/viewcontent.cgi?article=1035&context=niasrawp
Abstract
In most countries, the Population Census is an important source of statistical information, capable of
producing reliable estimates of population characteristics for small geographic areas. One limitation of a
census is that many population characteristics cannot be collected because of respondent burden or
cost. Statistical agencies must therefore conduct population-based surveys to provide the social,
economic and demographic characteristics of a target population that a large-scale census does not
capture. These surveys can usually produce direct estimates at the national level and for high-level
regions, but often cannot produce reliable estimates for smaller areas. Because of the increasing demand for
comprehensive statistical information not only at the national level but also for sub-national domains, there is
wide discussion in the literature about statistical techniques that combine survey data with census data
to provide more detailed, finer-level estimates.
Where censuses and sample surveys are based on the same reporting units, statistical matching techniques can
be employed to link survey and census records when exact matching of reporting units is
impossible because of confidentiality restrictions. These techniques can then provide the detailed social,
economic and demographic information required for small areas.
An approach is developed in this paper in which a close-to-reality synthetic population of individuals and
households is generated from available census tables using an iterative proportional updating (IPU) method.
Statistical matching using a nearest neighbour method is then used to impute survey data to the individuals
and households in the synthetic population. To evaluate this approach, 2011 Bangladesh census data is used to
generate a district-specific synthetic population of individuals and households. Matching is then performed by
imputing the nearest records from the 2011 Bangladesh Demographic and Health Survey to
estimate the wealth index for each household within the synthetic population. The results show that the
method presented in this paper yields more representative estimates than direct
survey estimates, particularly for areas with small sample sizes where not all population units with different
socio-demographic characteristics are included.
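
The matching step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the matching variables, the squared Euclidean distance and the toy values are all assumptions made for illustration. It shows unconstrained nearest neighbour matching in the usual sense, where a survey (donor) record may be reused for any number of synthetic (recipient) households.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbour_match(recipients, donors, donor_values):
    """Impute a donor value to every recipient from its closest donor.

    Unconstrained matching: donors are never removed from the pool,
    so the same donor can be matched to several recipients.
    Returns (imputed values, index of the chosen donor per recipient).
    """
    donor_index = []
    for rec in recipients:
        # Pick the donor with the smallest distance on the matching variables.
        best = min(range(len(donors)),
                   key=lambda j: squared_distance(rec, donors[j]))
        donor_index.append(best)
    imputed = [donor_values[j] for j in donor_index]
    return imputed, donor_index


# Hypothetical toy data (not from the paper). Matching variables are
# assumed to be: household size, age of head in decades, urban flag.
synthetic_households = [[4, 3.5, 1], [6, 5.0, 0], [2, 2.5, 1]]
survey_households = [[5, 3.0, 1], [6, 5.5, 0]]
survey_wealth_index = [0.8, -0.4]

imputed_wealth, donor_index = nearest_neighbour_match(
    synthetic_households, survey_households, survey_wealth_index
)
print(donor_index)     # [0, 1, 0]  (donor 0 is used twice)
print(imputed_wealth)  # [0.8, -0.4, 0.8]
```

In practice the matching variables would likely need to be put on comparable scales before distances are computed, and the donor pool could be restricted to survey records from the same area as the recipient; both choices are assumptions here, not details taken from the abstract.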
