Issues with the data
The endline caregiver survey (see the “Tool 9_Caregiver Endline (C)” sheet in “EV_Data_Endline Visit_April 1 2021.xlsx”) collected data on only four siblings rather than six (Q1).
For question “Q20_b_If no, what evidence makes you think this?” (column EY) in the “House Visit 3 with caregivers” sheet in “V3_Data_House Visit 3 Raw Data.xlsx”, most of the responses are in Cambodian (Khmer).
“End_Time” is missing from the EGRA baseline and endline test data. Chanmony confirmed that these values were lost during data cleaning.
“Start_Time” is recorded in the Bangkok time zone (confirmed by Chanmony), but its origin is unknown.
In many cases, the columns for the same question were named differently across instruments because of typos.
In many cases, the response categories for the same question also differed across instruments because of typos or spelling mistakes. These had to be identified and corrected so that each question uses identical categories in every instrument.
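The kind of harmonization described above can be sketched as a pair of lookup tables applied to every record. This is only an illustration in plain Python, not the cleaning script that was actually used. The column name “Q3_Caregiver_Gender”, the misspellings, and the helper harmonize_row are hypothetical and do not come from the data files.

```python
# Hypothetical mappings: the misspellings below are invented for illustration,
# not taken from the actual data files.
COLUMN_FIXES = {
    "Q3_Caregiver_Gendre": "Q3_Caregiver_Gender",
}
CATEGORY_FIXES = {
    "femal": "Female",
    "female": "Female",
    "male": "Male",
}


def harmonize_row(row):
    """Return a copy of one survey record with canonical column names and categories."""
    cleaned = {}
    for column, value in row.items():
        # Strip stray whitespace, then map known misspelled column names.
        column = COLUMN_FIXES.get(column.strip(), column.strip())
        if isinstance(value, str):
            # Compare case-insensitively; unknown values pass through unchanged.
            key = value.strip().lower()
            value = CATEGORY_FIXES.get(key, value.strip())
        cleaned[column] = value
    return cleaned


# The same question as it might appear in two instruments.
baseline_row = {"Q3_Caregiver_Gender": "Female"}
endline_row = {"Q3_Caregiver_Gendre ": " femal"}

harmonized = [harmonize_row(r) for r in (baseline_row, endline_row)]
```

Keeping the corrections in explicit tables, rather than fixing values by hand, leaves a record of every change and lets the same fixes be re-applied if the raw files are exported again.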
All string variables were converted to categorical (factor) variables, with the categories ordered as in the questionnaires. The only exceptions were the “Other, specify” variables, where respondents give a free-text response.
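To show why the category order matters, the sketch below uses plain Python as a stand-in for an ordered factor (in pandas, the closest equivalent would be pandas.Categorical with ordered=True). The response scale and the helper to_ordered_codes are hypothetical; the real order has to be taken from each questionnaire.

```python
# Hypothetical response scale; the real order comes from the questionnaire.
AGREEMENT_ORDER = ["Never", "Sometimes", "Often", "Always"]


def to_ordered_codes(values, categories):
    """Map each response to its position in `categories` (None stays None)."""
    position = {label: index for index, label in enumerate(categories)}
    codes = []
    for value in values:
        if value is None:
            codes.append(None)  # keep missing responses as missing
        elif value in position:
            codes.append(position[value])
        else:
            # An unknown label usually means a typo that was not harmonized.
            raise ValueError(f"Unexpected category: {value!r}")
    return codes


responses = ["Often", "Never", None, "Always", "Sometimes"]
codes = to_ordered_codes(responses, AGREEMENT_ORDER)

# Sorting by position follows the questionnaire order, not alphabetical order.
observed = [r for r in responses if r is not None]
sorted_by_scale = sorted(observed, key=AGREEMENT_ORDER.index)
```

Raising an error on unexpected labels is a deliberate choice in this sketch: it surfaces any remaining spelling variants instead of silently turning them into missing values. Free-text “Other, specify” variables would simply be left as strings.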