Abstract
Understanding the possible pitfalls of survey data is critical for empirical research. Among other things, poor data quality can lead to biased regression estimates, potentially resulting in incorrect interpretations that mislead researchers and policymakers alike. Common data problems include difficulties in tracking respondents and high survey attrition, enumerator error and bias, and respondent reporting error. This paper describes and analyzes these issues in Round 1 of the Kenyan Life Panel Survey (KLPS-1), collected in 2003-2005. The KLPS-1 is an innovative longitudinal dataset documenting a wide range of outcomes for Kenyan youths who had originally attended schools participating in a deworming treatment program starting in 1998. The careful design of this survey allows for examination of an array of data quality issues. First, we explore the existence and implications of sample attrition bias. Basic residential, educational, and mortality information was obtained for 88% of target respondents, and personal contact was made with 84%, an exceptionally high follow-up rate for a young adult population in a less developed country. Moreover, rates of sample attrition are nearly identical for respondents who were randomly assigned deworming treatment and for those who were not, a key factor in the validity of subsequent statistical analysis. One vital component of this success is the tracking of respondents both nationally and across international borders (in our case, into Uganda); we therefore discuss in detail the costs and benefits of tracking movers. Finally, we study KLPS-1 data quality more broadly by examining enumerator error and bias, as well as survey response consistency. We conclude that the extent of enumerator error is low, with an average of less than one recording error per survey.
Errors decrease over time as enumerator experience with the survey instrument increases, but increase over the course of multiple interviews within a single day, presumably due to fatigue. We do find some evidence that the enumerator-respondent match in terms of gender, ethnicity, and religion correlates with responses regarding trust of others and religious activities, suggesting some field officer bias on sensitive questions. Reporting reliability is analyzed using respondent re-surveys. These checks show high levels of consistency across survey/re-survey rounds for the respondent's own characteristics and personal history, with lower reliability rates on questions asked about others' characteristics. The steps taken in the design of KLPS-1 to avoid common errors in survey data collection greatly improved the quality of this panel dataset, and they provide valuable lessons for future field data collection projects.