Abstract |
Despite the increasing availability of current national censuses, these datasets are limited by their lack of small area demographic depth. At the same time, spatial microdata that include detailed demographic information are only available for limited geographies, thus limiting the complex analysis of population subgroups within and between small areas. Techniques such as Iterative Proportional Fitting have been previously suggested as a means to generate new data with the demographic granularity of individual surveys and the spatial granularity of small area tabulations of censuses and surveys. This article explores internal and external validation approaches for synthetic, small area, household- and individual-level microdata using a case study for Bangladesh. Using data from the Bangladesh Census 2011 and the Demographic and Health Survey, we produce estimates of infant mortality rate and other household attributes for small areas using a variation of an iterative proportional fitting method called P-MEDM. We conduct an internal validation to determine: whether the model accurately recreates the spatial variation of the input data, how each of the variables performed overall, and how the estimates compare to the published population totals. We conduct an external validation by comparing the estimates with indicators from the 2009 Multiple Indicator Cluster Survey (MICS) for Bangladesh to benchmark how well the estimates compared to a known dataset which was not used in the original model. The results indicate that the estimation process is viable for regions that are better represented in the microdata sample, but also revealed the possibility of strong overfitting in sparsely sampled sub-populations. |