IHSN Survey Catalog

Integrated Household Living Conditions Survey 2010-2011 ; Subset for Machine Learning Comparative Assessment Project

Malawi, 2010 - 2011
Reference ID
MWI_2010_IHS-III_v01_M_v01_A_ML
Producer(s)
National Statistical Office (NSO)
Created on
Sep 19, 2018
Last modified
Sep 19, 2018
    Identification

    Survey ID number

    MWI_2010_IHS-III_v01_M_v01_A_ML

    Title

    Integrated Household Living Conditions Survey 2010-2011 ; Subset for Machine Learning Comparative Assessment Project

    Country
    Name: Malawi
    Country code: MWI
    Study type

    Living Standards Measurement Study [hh/lsms]

    Series Information

    The First Integrated Household Survey (IHS1) was designed by the NSO with technical assistance from the International Food Policy Research Institute (IFPRI) and the World Bank (WB) to provide a complete and integrated data set to better understand target groups of households affected by poverty. The IHS1 was conducted in Malawi from November 1997 through October 1998 and supported a broad set of applications on policy issues regarding households' behavior and welfare, distribution of income, employment, health and education. In 2003, the Government of Malawi decided to conduct the Second Integrated Household Survey (IHS2) in order to compare the current situation with the situation in 1997-98, and to collect more detailed information in specific areas. The IHS2 was implemented from March 2004 through March 2005. The Third Integrated Household Survey (IHS3) was conducted by the National Statistical Office (NSO) from March 2010 through March 2011.

    Abstract

    This dataset contains a set of data files used as input for a World Bank research project (an empirical comparative assessment of machine learning algorithms applied to poverty prediction). The objective of the project was to compare the performance of a series of classification algorithms. The dataset contains variables at the household, individual, and community levels. The variables selected to serve as potential predictors in the machine learning models are all qualitative variables (except for the household size). Information on household consumption is included, but in the form of dummy variables (indicating whether or not the household consumed each specific product or service listed in the survey questionnaire). The household-level data file contains the variable "Poor / Non poor", which served as the predicted variable ("label") in the models.
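    As a rough illustration of the layout described above, the following sketch turns a few made-up household records into consumption dummies plus a binary label. The field names (hhid, hhsize, items_consumed, poor) and the item list are hypothetical; they are not the variable names used in the data files.

```python
# Hypothetical records; field names and values are illustrative only.
households = [
    {"hhid": "A1", "hhsize": 5, "items_consumed": {"maize", "salt"}, "poor": "Poor"},
    {"hhid": "B2", "hhsize": 2, "items_consumed": {"maize", "sugar", "bread"}, "poor": "Non poor"},
]

# Items assumed to be listed in the questionnaire (made up for this example).
ITEMS = ["bread", "maize", "salt", "sugar"]


def to_features(record):
    """Return (features, label): one 0/1 dummy per listed item, plus household size."""
    features = {"hhsize": record["hhsize"]}
    for item in ITEMS:
        features[f"consumed_{item}"] = int(item in record["items_consumed"])
    # Encode the "Poor / Non poor" variable as the binary label.
    label = 1 if record["poor"] == "Poor" else 0
    return features, label


rows = [to_features(r) for r in households]
for features, label in rows:
    print(features, label)
```

    Each household then carries one indicator per item rather than an amount, which matches the statement that consumption enters the models only as dummy variables.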

    One of the data files included in the dataset contains data on household consumption (amounts) by main categories of products and services. This data file was not used in the prediction models. It is used only to analyze the models' misclassifications (in particular, to identify how far the misclassified households are from the national poverty line).
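    The misclassification analysis mentioned above could look roughly like the sketch below. The poverty line value, the consumption figures, and the field names are all made up for illustration; the actual poverty line and variable names come from the survey documentation and the project notebooks, not from this example.

```python
POVERTY_LINE = 100.0  # hypothetical value, arbitrary units per person per year

# Hypothetical households: true status, model prediction, consumption per capita.
results = [
    {"hhid": "A1", "actual_poor": True, "predicted_poor": False, "consumption": 95.0},
    {"hhid": "B2", "actual_poor": False, "predicted_poor": False, "consumption": 180.0},
    {"hhid": "C3", "actual_poor": False, "predicted_poor": True, "consumption": 110.0},
]


def relative_gap(consumption, line=POVERTY_LINE):
    """Distance from the poverty line as a share of the line (negative = below)."""
    return (consumption - line) / line


# Keep only households whose predicted status differs from the actual one.
misclassified = [
    (r["hhid"], relative_gap(r["consumption"]))
    for r in results
    if r["actual_poor"] != r["predicted_poor"]
]

for hhid, gap in misclassified:
    print(f"{hhid}: {gap:+.0%} relative to the poverty line")
```

    Small gaps for most misclassified households would suggest that the errors are concentrated near the line.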

    These datasets are provided to allow interested users to replicate the analysis done for the project using Python 3 (a collection of Jupyter Notebooks containing the documented scripts is openly available on GitHub).

    Kind of Data

    Sample survey data [ssd]

    Unit of Analysis
    • Households
    • Individuals
    • Communities

    Scope

    Notes

    HOUSEHOLD: household conditions and amenities, list of consumed items, ownership of household assets, farm equipment, non-agricultural business information, income sources, etc.
    INDIVIDUAL: basic demographic information, education attainment, health, and employment information.

    Coverage

    Geographic Coverage

    National

    Producers and sponsors

    Primary investigators
    Name: National Statistical Office (NSO)
    Affiliation: Ministry of Economic Planning and Development (MoEPD)
    Producers
    Name: Development Economics Data Group
    Affiliation: The World Bank Group
    Role: Generated the datasets that were used in the Machine Learning Comparative Assessment Project.

    Sampling

    Sampling Procedure

    The IHS3 sampling frame is based on the listing information and cartography from the 2008 Malawi Population and Housing Census (PHC). It includes the three major regions of Malawi (North, Center and South) and is stratified into rural and urban strata. The urban strata include the four major urban areas: Lilongwe City, Blantyre City, Mzuzu City, and the Municipality of Zomba. All other areas are considered rural, and each of the 27 districts was considered a separate sub-stratum of the main rural stratum. The island district of Likoma was excluded from the IHS3 sampling frame, since it represents only about 0.1% of the population of Malawi and the corresponding cost of enumeration would be relatively high. The sampling frame further excludes the population living in institutions, such as hospitals, prisons and military barracks. Hence, the IHS3 strata are composed of 31 districts in Malawi.

    A stratified two-stage sample design was used for the IHS3.

    Weighting

    To produce estimates that are representative of the population, analyses must be weighted using the household sampling weights provided in each file as hhwght. The IHS3 data are representative at the national, urban/rural, regional and district levels.
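    A minimal sketch of applying the weights, assuming a hypothetical 0/1 field named poor next to hhwght; the records are invented and the estimate is a share of households, not an official poverty figure.

```python
# Hypothetical household records; only the name hhwght comes from the dataset.
households = [
    {"poor": 1, "hhwght": 250.0},
    {"poor": 0, "hhwght": 100.0},
    {"poor": 0, "hhwght": 150.0},
]


def weighted_share(records, flag="poor", weight="hhwght"):
    """Weighted share of households for which the 0/1 flag equals 1."""
    total_weight = sum(r[weight] for r in records)
    return sum(r[flag] * r[weight] for r in records) / total_weight


unweighted = sum(r["poor"] for r in households) / len(households)
weighted = weighted_share(households)

print(f"unweighted share: {unweighted:.3f}")
print(f"weighted share:   {weighted:.3f}")
```

    The two figures differ whenever households have unequal weights, which is why unweighted tabulations of the sample should not be read as population estimates.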

    The basic weight for each sample household is equal to the inverse of its probability of selection (calculated by multiplying the probabilities at each sampling stage). As indicated in the previous section, the IHS3 sample enumeration areas (EAs) were selected within each district with probability proportional to size (PPS) from the 2008 PHC frame. At the second stage, 16 sample households were selected with equal probability from the listing for each sample EA.
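    The two-stage calculation can be sketched as follows. This assumes the textbook form of the PPS first-stage probability (number of sample EAs times EA size, divided by district size); the function name, its parameters, and all counts are hypothetical, and the actual IHS3 weights may include adjustments described in the Basic Information Document.

```python
def basic_weight(n_sample_eas, ea_size, district_size, n_listed, n_selected=16):
    """Inverse of the overall selection probability for one household.

    Stage 1 (assumed standard PPS): the EA is selected with probability
        n_sample_eas * ea_size / district_size.
    Stage 2: n_selected households are drawn with equal probability
        from the n_listed households in the EA listing.
    """
    p_stage1 = n_sample_eas * ea_size / district_size
    p_stage2 = n_selected / n_listed
    return 1.0 / (p_stage1 * p_stage2)


# Made-up example: 24 sample EAs in a district of 48,000 census households,
# an EA with 200 census households and 192 households found at listing.
w = basic_weight(n_sample_eas=24, ea_size=200, district_size=48000, n_listed=192)
print(round(w, 2))
```

    In this made-up case each sample household stands for roughly 120 households in the frame.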

    Note: Detailed weighting information is presented in the "Third Integrated Household Survey 2010-2011, Basic Information Document".

    Survey instrument

    Questionnaires

    The survey data were collected using four questionnaires:

    1. Household Questionnaire
    2. Agriculture Questionnaire
    3. Fishery Questionnaire
    4. Community Questionnaire

    Data collection

    Dates of Data Collection
    Start: 2010-03
    End: 2011-03
    Data Collectors
    Name: National Statistical Office
    Affiliation: Ministry of Economic Planning and Development (MoEPD)

    Data Access

    Citation requirements

    Use of the dataset must be acknowledged with a citation that includes:

    • the identification of the Primary Investigator
    • the title of the survey (including country, acronym and year of implementation)
    • the survey reference number
    • the source and date of download

    Disclaimer and copyrights

    Disclaimer

    The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.


    © IHSN Survey Catalog, All Rights Reserved.