ETH_2020_HFPSR_v01_M
Monitoring COVID-19 Impact on Refugees in Ethiopia: High-Frequency Phone Survey of Refugees 2020
Name | Country code |
---|---|
Ethiopia | ETH |
Socio-Economic/Monitoring Survey [hh/sems]
The high-frequency phone survey of refugees monitors the economic and social impacts of the COVID-19 pandemic, and the responses to it, among refugees and nationals by calling a sample of households every four weeks. The main objective is to inform timely and adequate policy and program responses. Since the outbreak of the COVID-19 pandemic in Ethiopia, two rounds of refugee data collection were completed between September and November 2020. The first round of the joint national and refugee HFPS was implemented between 24 September and 17 October 2020, and the second round between 20 October and 20 November 2020.
Sample survey data [ssd]
Household
The production date is the last date of the second round of the survey.
The scope of the survey includes the following:
Name |
---|
World Bank-UNHCR Joint Data Center on Forced Displacement (JDC) |
The sample was drawn using a simple random sample without replacement. Expecting a high non-response rate based on experience from the HFPS-HH, we drew a stratified sample of 3,300 refugee households for the first round. More details on sampling methodology are provided in the Survey Methodology Document available for download as Related Materials.
To obtain unbiased estimates from the sample, the information reported by households needs to be adjusted by a sampling weight (or raising factor), w_h. To construct the sampling weights, we follow the steps outlined in Himelein, K. (2014), though we do not have information for all of the steps:
1. Begin with base weights; these are, for all practical purposes, equal to 1.
2. Derive attrition-adjusted weights for all individuals by running a logistic response propensity model based on characteristics of the household head (i.e. education, labor force status, demographic characteristics), characteristics of the household (consumption, assets, financial characteristics), and characteristics of the dwelling (house ownership, overcrowding). While the proGRES database is limited in the number of socio-economic variables, we have characteristics of the household head and household.
3. Trim weights by replacing the top two percent of observations with the 98th percentile cut-off point; and
4. Post-stratify weights to known population totals to correct for the imbalances across the sample. In doing so, we ensure that the distribution in the survey matches the distribution in the proGRES database.
Additional technical details and explanations on each of the steps briefly outlined above can be found in Himelein, K. (2014).
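The four steps above can be expressed as a short numerical sketch. In the code below, `adjust_weights` and all of its argument names are hypothetical; the response propensities are assumed to come from a logistic model of the kind described in step 2, which is not reproduced here.

```python
import numpy as np

def adjust_weights(base_w, response_prob, strata, pop_totals):
    """Sketch of the four weighting steps (hypothetical helper).

    base_w        : base weights (all 1.0, per step 1)
    response_prob : predicted response propensity per household
                    (step 2; in practice fitted from household-head
                    and household characteristics in proGRES)
    strata        : post-stratification cell label per household
    pop_totals    : dict mapping cell label -> known population total
    """
    # Step 2: attrition adjustment -- divide by the response propensity.
    w = base_w / response_prob
    # Step 3: trim the top 2% of weights to the 98th-percentile cut-off.
    cap = np.percentile(w, 98)
    w = np.minimum(w, cap)
    # Step 4: post-stratify so each cell sums to its known total
    # (here, totals taken from the proGRES database).
    for cell, total in pop_totals.items():
        mask = strata == cell
        w[mask] *= total / w[mask].sum()
    return w
```

After post-stratification, the weights in each cell sum exactly to that cell's known population total, which is what guarantees the survey distribution matches the proGRES distribution.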
The Ethiopia COVID-19 High Frequency Phone Survey of Refugee questionnaire consists of the following sections:
A more detailed description of the questionnaire is provided in Table 1 of the Survey Methodology Document available as Related Materials. The Round 1 and Round 2 questionnaires are also available for download.
Start | End | Cycle |
---|---|---|
2020-09-24 | 2020-10-17 | 1 |
2020-10-20 | 2020-11-20 | 2 |
Name |
---|
Laterite BV |
Senior Field Supervisors served as the first line of data quality assurance. They reviewed surveys with enumerators twice daily via one-on-one calls and were always available to address any concerns that arose during an interview. In parallel, a Research Analyst checked the uploaded data daily to correct errors and to prevent them in future surveys.
The Ethiopia COVID-19 High Frequency Refugee Phone Survey of Households (RHFPS) was conducted using Computer Assisted Telephone Interview (CATI) techniques. The household questionnaire was implemented using the CATI software SurveyCTO, and each enumerator was given a tablet on which to conduct the interviews. DATA COMMUNICATION SYSTEM: SurveyCTO's built-in data monitoring functions were used. Each enumerator was provided with a data bundle for their own mobile phone device, allowing for internet connectivity and daily synchronization of their tablet; data were sent to the server daily.
During the roughly three-week period of data collection, each sampled household was called up to three times a day at different hours, with a minimum of three hours between calls, for a minimum of three consecutive days and a total of nine attempts. Calls were made Monday to Saturday, 8:30am to 6:00pm. The respondent was one member of the household, typically the household head; only when the household head could not be reached despite numerous call-backs was another knowledgeable household member selected as the respondent.
When possible, an interview was rescheduled outside of working hours or days if the respondent preferred. If the call reached a non-household member (usually the contact person on record for the intended household) or a household member unable to answer the questions, the contact details of the household head or another knowledgeable household member were requested to complete the survey, or a follow-up plan was sought if they were unavailable. Once all these contact strategies were exhausted without a completed interview, the household was treated as a non-response and replaced with a household from the replacement sample.
In each survey domain, additional refugee households are sampled to serve as replacement households in case of non-response.
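The call-attempt rule described above can be sketched as a simple scheduling check. The function name `next_attempt_allowed` and its thresholds are hypothetical illustrations of the stated protocol (up to three calls a day, at least three hours apart, nine attempts in total, within the 8:30am to 6:00pm window), not the actual field software.

```python
from datetime import datetime, timedelta

def next_attempt_allowed(attempts, now):
    """attempts: datetimes of prior calls to this household."""
    if len(attempts) >= 9:
        return False                      # total attempt budget exhausted
    today = [t for t in attempts if t.date() == now.date()]
    if len(today) >= 3:
        return False                      # daily cap of three calls reached
    if today and now - max(today) < timedelta(hours=3):
        return False                      # less than three hours since last call
    # Calls were placed between 8:30am and 6:00pm.
    return (now.time() >= datetime.strptime("08:30", "%H:%M").time()
            and now.time() <= datetime.strptime("18:00", "%H:%M").time())
```

A household for which this check never again returns `True` would, under the protocol above, be passed to the replacement sample.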
DATA CLEANING: At the end of data collection, the raw dataset was cleaned by the research team. This included formatting and correcting results based on monitoring issues, enumerator feedback, and survey changes. The data cleaning carried out is detailed below.
Variable naming and labeling:
• Variable names were changed to reflect the lowercase question name in the paper survey copy, and a word or two related to the question.
• Variables were labeled with longer descriptions of their contents and the full question text was stored in Notes for each variable.
• “Other, specify” variables were named similarly to their related question, with “_other” appended to the name.
• Value labels were assigned where relevant, with options shown in English for all variables, unless preloaded from the roster in Amharic.
Variable formatting:
• Variables were formatted as their object type (string, integer, decimal, time, date, or datetime).
• Multi-select variables were saved both as a single space-separated variable and as multiple binary variables showing the yes/no value of each possible response.
• Time and date variables were stored as POSIX timestamp values and formatted to show Gregorian dates.
• Location information was left in separate ID and Name variables, following the format of the incoming roster. IDs were formatted to include only the digits for that level, not the higher-level prefixes (2-3 digits only).
• Only consented surveys were kept in the dataset, and all personal information and internal survey variables were dropped from the clean dataset.
• Roster data is separated from the main dataset and kept in long form, but can be merged on the key variable (the key can also be used to merge with the raw data).
• The variables were arranged in the same order as the paper instrument, with observations arranged according to their submission time.
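The multi-select convention in the list above can be illustrated with a small pandas sketch. The variable name `assist_type` and the example values are hypothetical; the step mirrors the cleaning described, expanding a space-separated answer string into one binary variable per response option.

```python
import pandas as pd

# Hypothetical multi-select question stored as a space-separated string,
# e.g. a household selecting response options 1 and 3.
df = pd.DataFrame({"assist_type": ["1 3", "2", "1 4", ""]})

# Expand into one binary yes/no variable per option, keeping the
# original space-separated variable alongside the new indicators.
dummies = (df["assist_type"].str.get_dummies(sep=" ")
           .add_prefix("assist_type_"))
clean = pd.concat([df, dummies], axis=1)
```

An empty string (no options selected) yields zeros in every indicator, so the binary variables remain consistent with the space-separated original.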
Backcheck data review:
Results of the back-check survey were compared against the originally captured survey results using the bcstats command in Stata, which compares variables between the two datasets and identifies discrepancies. Any discrepancies identified were then examined individually to determine whether they were within reason.
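The comparison that bcstats performs in Stata can be sketched in Python. The helper `compare_backchecks` and the column names below are hypothetical; the sketch simply merges the original and back-check records on a household ID and lists every value that disagrees.

```python
import pandas as pd

def compare_backchecks(original, backcheck, id_col, check_vars):
    """Minimal analogue of Stata's bcstats (hypothetical helper):
    merge original and back-check surveys on the household ID and
    return (id, variable, original value, back-check value) for
    every disagreement found."""
    merged = original.merge(backcheck, on=id_col,
                            suffixes=("_orig", "_bc"))
    discrepancies = []
    for var in check_vars:
        mism = merged[merged[f"{var}_orig"] != merged[f"{var}_bc"]]
        for _, row in mism.iterrows():
            discrepancies.append((row[id_col], var,
                                  row[f"{var}_orig"], row[f"{var}_bc"]))
    return discrepancies
```

Each returned tuple corresponds to one discrepancy to be examined individually, as described above.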
The following data quality checks were completed:
• Daily SurveyCTO monitoring: This included outlier checks, skipped questions, a review of “Other, specify”, other text responses, and enumerator comments. Enumerator comments were used to suggest new response options or to highlight situations where existing options should be used instead. Monitoring also included a review of variable relationship logic checks and checks of the logic of answers. Finally, outliers in phone variables such as survey duration or the percentage of time audio was at a conversational level were monitored. A survey duration of close to 15 minutes and a conversation-level audio percentage of around 40% were considered normal.
• Dashboard review: This included monitoring individual enumerator performance, such as the number of calls logged, duration of calls, percentage of calls responded to and percentage of non-consents. Non-consent reason rates and attempts per household were monitored as well. Duration analysis using R was used to monitor each module's duration and estimate the time required for subsequent rounds. The dashboard was also used to track overall survey completion and preview the results of key questions.
• Daily Data Team reporting: The Field Supervisors and the Data Manager reported daily feedback on call progress, enumerator feedback on the survey, and any suggestions to improve the instrument, such as adding options to multiple choice questions or adjusting translations.
• Audio audits: Audio recordings were captured during the consent portion of the interview for all completed interviews, for the enumerators' side of the conversation only. The recordings were reviewed for any surveys flagged by enumerators as having data quality concerns and for an additional random sample of 2% of respondents. A range of lengths were selected to observe edge cases. Most consent readings took around one minute, with some longer recordings due to questions on the survey or holding for the respondent. All reviewed audio recordings were completed satisfactorily.
• Back-check survey: Field Supervisors made back-check calls to a random sample of 5% of the households that completed a survey in Round 1.
Field Supervisors called these households and administered a short survey, including
(i) identifying the same respondent;
(ii) determining the respondent's position within the household;
(iii) confirming that a member of the data collection team had completed the interview; and
(iv) a few questions from the original survey.
Is signing of a confidentiality declaration required? |
---|
yes |
Use of the dataset must be acknowledged using a citation which would include:
• the identification of the Primary Investigator
• the title of the survey (including country, acronym, and year of implementation)
• the survey reference number
• the source and date of download
Example:
World Bank-UNHCR Joint Data Center on Forced Displacement (JDC). Monitoring COVID-19 Impact on Refugees in Ethiopia: High-Frequency Phone Survey of Refugees 2020. Dataset downloaded from www.microdata.worldbank.org on [date].
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
Name | Affiliation | |
---|---|---|
Mugera, Harriet | The World Bank | hmugera@worldbank.org |
Wieser, Christina | The World Bank | cwieser@worldbank.org |
Tsegay, Asmelash | The World Bank | atsegay@worldbank.org |
DDI_ETH_2020_HFPSR_v01_M_WB
Name | Affiliation | Role |
---|---|---|
Development Data Group | World Bank | Documentation of the Study |
2022-06-27