The Pakistan Integrated Household Survey (PIHS) was conducted jointly by the Federal Bureau of Statistics (FBS), Government of Pakistan, and the World Bank. The survey was part of the Living Standards Measurement Study (LSMS) household surveys that have been conducted in a number of developing countries with the assistance of the World Bank. The purpose of these surveys is to provide policy makers and researchers with individual, household, and community level data needed to analyze the impact of policy initiatives on living standards of households.
The Pakistan Integrated Household Survey was carried out in 1991. This nationwide survey gathered individual and household level data using a multi-purpose household questionnaire. Topics covered included housing conditions, education, health, employment characteristics, self-employment activities, consumption, migration, fertility, credit and savings, and household energy consumption. Community level and price data were also collected during the course of the survey.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
Producers and sponsors
Authoring entity/Primary investigators
Federal Bureau of Statistics (FBS)
The World Bank
The sample for the PIHS was drawn using a multi-stage stratified sampling procedure from the Master Sample Frame developed by FBS based on the 1981 Population Census.
This sample frame covers all four provinces (Punjab, Sindh, NWFP, and Balochistan) and both urban and rural areas. Excluded, however, are the Federally Administered Tribal Areas, military restricted areas, the districts of Kohistan, Chitral and Malakand and protected areas of NWFP. According to the FBS, the population of the excluded areas amounts to about 4 percent of the total population of Pakistan. Also excluded are households which depend entirely on charity for their living.
The sample frame consists of three main domains: (a) the self-representing cities; (b) other urban areas; and (c) rural areas. These domains are further split up into a number of smaller strata based on the system used by the Government to divide the country into administrative units. The four provinces of Pakistan mentioned above are divided into 20 divisions altogether; each of these divisions in turn is then further split into several districts. The system used to divide the sample frame into the three domains and the various strata is as follows:
(a) Self-representing cities: All cities with a population of 500,000 or more are classified as self-representing cities. These include Karachi, Lahore, Gujranwala, Faisalabad, Rawalpindi, Multan, Hyderabad and Peshawar. Islamabad and Quetta are also included in this group because they are the national capital and a provincial capital, respectively. Each self-representing city is treated as a separate stratum and is further sub-stratified into low, medium, and high income groups on the basis of information collected when the urban area sample frame was demarcated or updated.
(b) Other urban areas: All settlements with a population of 5,000 or more at the time of the 1981 Population Census are included in this group (excluding the self-representing cities mentioned above). Urban areas in each division of the four provinces are considered to be separate strata.
(c) Rural areas: Villages and communities with population less than 5,000 (at the time of the Census) are classified as rural areas. Settlements within each district of the country are considered to be separate strata with the exception of Balochistan province where, as a result of the relatively sparse population of the districts, each division instead is taken to be a stratum.
As the above table shows, the sample frame consists of 88 strata altogether. Households in each stratum of the sample frame are divided exclusively and exhaustively into PSUs. In urban areas, each city or town is divided into a number of enumeration blocks with well-defined boundaries and maps. Each enumeration block consists of about 200-250 households and is taken to be a separate PSU. The list of enumeration blocks is updated every five years or so; the list used for the PIHS had been updated on the basis of the Census of Establishments conducted in 1988.
In rural areas, PSUs were demarcated on the basis of the list of villages/mouzas/dehs published by the Population Census Organization from the 1981 Census. Each of these villages/mouzas/dehs is taken to be a separate PSU.
Altogether, the sample frame consists of approximately 18,000 urban and 43,000 rural PSUs.
The PIHS sample comprised 4,800 households drawn from 300 PSUs throughout the country. Sample PSUs were divided equally between urban and rural areas, with at least two PSUs selected from each of the strata. Selection of PSUs from within each stratum was carried out using the probability proportional to estimated size method. In urban areas, estimates of the size of PSUs were based on the household count as found during the 1988 Census of Establishments. In rural areas, these estimates were based on the population count during the 1981 Census.
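The documentation says only that PSUs were drawn "with probability proportional to estimated size"; it does not say which PPS algorithm FBS used. The sketch below assumes systematic PPS, a common variant: PSUs are laid end to end on a scale of cumulated estimated size (household counts in urban strata, population counts in rural strata), and points are taken at a fixed interval from a random start. The function name `pps_systematic` and the example sizes are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, rng=None):
    """Return indices of n PSUs drawn with probability proportional to size.

    Assumed variant: systematic PPS. `sizes` holds the estimated size of
    every PSU in one stratum (hypothetical input).
    """
    if rng is None:
        rng = random.Random()
    cum = list(accumulate(sizes))        # upper boundary of each PSU on the size scale
    interval = cum[-1] / n               # sampling interval
    start = rng.random() * interval      # random start in [0, interval)
    selected, j = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the PSU whose cumulative range contains this point.
        while j < len(cum) - 1 and cum[j] <= point:
            j += 1
        selected.append(j)
    return selected


# Hypothetical stratum of six PSUs; at least two PSUs are drawn per stratum.
sizes = [180, 240, 210, 300, 150, 220]
print(pps_systematic(sizes, 2, random.Random(1991)))
```

With this variant, a PSU whose size exceeds the sampling interval can be hit more than once, which is one reason some surveys treat very large units as certainty selections.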
Once sample PSUs had been identified, a listing of all households residing in the PSU was made in all those PSUs where such a listing exercise had not been undertaken recently. Using systematic sampling with a random start, a short-list of 24 households was prepared for each PSU. Sixteen households from this list were selected to be interviewed from the PSU; every third household on the list was designated as a replacement household to be interviewed only if it was not possible to interview either of the two households immediately preceding it on the list.
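The within-PSU procedure can be sketched as follows: a systematic sample of 24 households with a random start, positions 3, 6, ..., 24 designated as replacements, and each replacement used only when one of the two households before it could not be interviewed. This is an illustrative sketch; the function names and the `failed` set are hypothetical, and the documentation does not say what was done when both households in a triple were lost (here only the single replacement is used).

```python
import random


def draw_short_list(listed_households, list_size=24, rng=None):
    """Systematic sample with a random start from a PSU's household listing."""
    if rng is None:
        rng = random.Random()
    n = len(listed_households)
    if n < list_size:
        raise ValueError("listing has fewer households than the short-list size")
    interval = n / list_size                 # may be fractional
    start = rng.random() * interval          # random start in [0, interval)
    picks = [min(int(start + k * interval), n - 1) for k in range(list_size)]
    return [listed_households[i] for i in picks]


def split_main_and_replacements(short_list):
    """Positions 3, 6, ..., 24 on the short-list are replacement households."""
    main = [hh for pos, hh in enumerate(short_list, start=1) if pos % 3 != 0]
    replacements = [hh for pos, hh in enumerate(short_list, start=1) if pos % 3 == 0]
    return main, replacements


def households_to_interview(short_list, failed):
    """Walk the short-list in triples (main, main, replacement).

    The replacement is used only if at least one of the two households just
    before it could not be interviewed. `failed` is a hypothetical set of
    households that could not be interviewed.
    """
    interviewed = []
    for i in range(0, len(short_list), 3):
        first, second, replacement = short_list[i:i + 3]
        done = [hh for hh in (first, second) if hh not in failed]
        if len(done) < 2:
            done.append(replacement)
        interviewed.extend(done)
    return interviewed


# Hypothetical PSU with 230 listed households, identified by serial number.
listing = list(range(1, 231))
short_list = draw_short_list(listing, rng=random.Random(1991))
main, replacements = split_main_and_replacements(short_list)
plan = households_to_interview(short_list, failed={short_list[0]})
print(len(main), len(replacements), len(plan))
```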
As a result of replacing households that could not be interviewed because of non-responses, temporary absence, and other such reasons, the actual number of households interviewed during the survey - 4,794 - was very close to the planned sample size of 4,800 households. Moreover, following a pre-determined procedure for replacing households had the added advantage of minimizing any biases that may otherwise have arisen had field teams been allowed more discretion in choosing substitute households.
SAMPLE DESIGN EFFECTS:
The three-stage stratified sampling procedure outlined above has several advantages from the point of view of survey organization and implementation. Using this procedure ensures that all regions or strata deemed important are represented in the sample drawn for the survey. Picking clusters of households or PSUs in the various strata, rather than drawing households randomly from throughout the country, greatly reduces travel time and cost. Finally, selecting a fixed number of households in each PSU makes it easier to distribute the workload evenly amongst field teams. However, when using this procedure to select the sample, two important matters need to be considered: (a) sampling weights or raising factors have to be calculated first to obtain national estimates from the survey data; and (b) the standard errors for estimates obtained from the data need to be adjusted to take account of the use of this procedure.
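Point (b) can be illustrated with Kish's design-effect approximation for equal-sized clusters, deff = 1 + (m - 1)ρ, where m is the number of households per cluster and ρ is the intra-cluster correlation. This formula is a standard textbook approximation, not a procedure stated in the PIHS documentation, and it ignores stratification and unequal weights; the ρ and standard-error values below are hypothetical.

```python
import math


def kish_design_effect(cluster_size, icc):
    """Kish's approximation for equal-sized clusters: deff = 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def cluster_adjusted_se(srs_se, cluster_size, icc):
    """Inflate a standard error computed as if the sample were simple random."""
    return srs_se * math.sqrt(kish_design_effect(cluster_size, icc))


# 16 households per PSU as in the PIHS; the intra-cluster correlation of 0.05
# and the naive standard error of 0.010 are hypothetical values.
deff = kish_design_effect(16, 0.05)
print(deff, cluster_adjusted_se(0.010, 16, 0.05))
```

Even a modest within-PSU correlation inflates variances noticeably at 16 households per cluster. A full analysis would normally use survey software that accounts for the strata, PSUs and weights together.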
If a simple random sampling procedure had been used to draw the sample for the survey, the data collected could have been used directly to obtain national as well as regional estimates without the need for sampling weights or raising factors. However, when using data from a sample drawn by the procedure outlined above, allowance needs to be made for the fact that this procedure does not give all households in the country an equal chance of being selected. If no sampling weights are used, the resulting estimates are likely to be biased, because different types of households may not be represented in the sample in the same proportion as they exist in the population as a whole. In simple terms, sampling weights attempt to correct for the fact that different households in the country have different chances of being included in the survey sample. To adjust for the over-sampling of certain strata in the PIHS sample, sampling weights have been calculated and incorporated into the PIHS data sets that are distributed. These raising factors should be used to weight the data in order to obtain nationally representative statistics. The way these sampling weights have been calculated is briefly outlined below.
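A toy example shows why ignoring the raising factors biases an estimate when one domain is over-sampled. The data and weights are hypothetical, and the name of the weight variable in the distributed PIHS files is not given here.

```python
def weighted_mean(values, weights):
    """Population mean estimated with household sampling weights (raising factors)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def unweighted_mean(values):
    return sum(values) / len(values)


# Hypothetical data: 1 = household has electricity, 0 = it does not.
# Two urban households (weight 1) and two rural households (weight 3):
# urban areas are over-represented in the sample relative to the population.
values = [1, 1, 1, 0]
weights = [1, 1, 3, 3]
print(unweighted_mean(values))          # 0.75
print(weighted_mean(values, weights))   # 0.625
```

The unweighted figure overstates access because the better-served urban households make up half the sample but a smaller share of the population.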
The first aspect of the sampling strategy adopted for the PIHS that needs to be taken into consideration when calculating sampling weights is the stratification of the sample frame. Instead of picking PSUs at random from the country as a whole, PSUs for the PIHS were selected so as to ensure that at least 2 were picked from each stratum of the Master Sample Frame. Half the sample was picked from strata in urban areas, even though these areas accounted for less than 32 percent of the country's estimated population in 1991. To correct for such over-sampling, the weight for households drawn from each stratum needs to include a component that is inversely proportional to the probability of selection of PSUs in that stratum. In other words, the greater the assigned probability of selecting PSUs in a particular stratum, the lower the weight given to households picked from that stratum.
The second step of sample selection for the PIHS, i.e. the selection of PSUs within each stratum, was carried out using the probability proportional to estimated size (PPS) procedure. In this method, a large PSU is assigned a higher probability of selection than a smaller PSU by a factor that is directly proportional to their relative size. If an equal number of households is interviewed in each selected PSU, this method in principle results in a self-weighted sample within each stratum: all households within the stratum have an equal chance of selection and should therefore be allotted the same weight. In practice, however, allowance almost always needs to be made for the fact that the actual size of the PSU found during the household listing exercise differs from the estimated size on which the selection of the PSU from the sample frame was based. The weight assigned to households in different PSUs thus includes a second component that is directly proportional to the ratio of the PSU's actual size to its estimated size. Households in a PSU where the listing count shows the population to be 50 percent higher than previously supposed are thus given a weight 50 percent higher than households in a PSU where the two counts coincide.
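Both weight components can be traced to a household's overall inclusion probability. The short derivation below assumes that PSU $j$ is selected with probability exactly $a_i S_j / T_i$ and that $m$ households are then drawn from the $N_j$ listed; the symbols $a_i$ (number of PSUs selected in stratum $i$), $T_i$ (total estimated size of stratum $i$) and $m$ (16 in the PIHS) are introduced here for illustration and are not PIHS notation.

$$
\pi_{ij} = \frac{a_i S_j}{T_i} \times \frac{m}{N_j},
\qquad
\frac{1}{\pi_{ij}} = \frac{T_i}{a_i m} \times \frac{N_j}{S_j}
$$

If $N_j = S_j$, then $\pi_{ij} = a_i m / T_i$ for every household in stratum $i$, which is the self-weighting property. The factor $T_i/(a_i m)$ is the same for all PSUs in the stratum and is the inverse of the stratum-level selection probability (the first component), while $N_j/S_j$ is the size correction (the second component); a PSU listed 50 percent larger than estimated has $N_j/S_j = 1.5$.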
Finally, the third step of sample selection - i.e. that of selecting households within each PSU - does not have any effect on sampling weights; therefore, all households within a particular PSU are assigned the same weight. This is because the “systematic sampling with a random start” procedure used to select households gives all households in the PSU an equal chance of selection. Even the use of replacement households in the case of the PIHS does not affect the assignment of weights within the PSU, as the process of selection of replacement households was the same as that used to select the other 16 households to be interviewed from the PSU.
The formula used to calculate the weight assigned to the various PSUs is as follows:
$$W_{ij} = k \times \frac{1}{P_{ij}} \times \frac{N_j}{S_j}$$
where $W_{ij}$ is the weight assigned to households in PSU $j$ of stratum $i$, $k$ is a constant, $P_{ij}$ is the assigned probability of selection of PSU $j$ of stratum $i$ (i.e. the higher the given probability of selection, the lower the weight given to the PSU), $N_j$ is the number of households in PSU $j$ as found during the listing exercise, and $S_j$ is the number of households in PSU $j$ on which the PPS selection was based.
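The weight formula translates directly into code. This is a minimal sketch; the function name and the example probabilities and counts are hypothetical, and the check reproduces the 50 percent example given in the text.

```python
def psu_weight(p_select, listed_households, estimated_size, k=1.0):
    """Household weight for PSU j of stratum i: W_ij = k * (1/P_ij) * (N_j/S_j)."""
    if not 0 < p_select <= 1:
        raise ValueError("P_ij must be a probability in (0, 1]")
    return k * (1.0 / p_select) * (listed_households / estimated_size)


# Two hypothetical PSUs from the same stratum, with the same P_ij.
w_a = psu_weight(0.02, listed_households=200, estimated_size=200)  # listing matches estimate
w_b = psu_weight(0.02, listed_households=300, estimated_size=200)  # listing 50% higher
print(w_a, w_b, w_b / w_a)
```

The constant $k$ rescales every weight by the same factor, so it leaves weighted means and proportions unchanged; it matters only when the weights are used to estimate population totals.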
Dates of Data Collection (YYYY/MM/DD)
Mode of data collection
Type of Research Instrument
The PIHS used three questionnaires: a household questionnaire, a community questionnaire, and a price questionnaire.
The household questionnaire comprised 17 sections, each of which covered a separate aspect of household activity. The sections of the household questionnaire were as follows:
1. HOUSEHOLD INFORMATION
2. HOUSING
3. EDUCATION
5. WAGE EMPLOYMENT
6. FAMILY LABOR
9. FARMING AND LIVESTOCK
10. NON-FARM ENTERPRISE ACTIVITIES
11. NON-FOOD EXPENDITURES AND INVENTORY OF DURABLE GOODS
12. FOOD EXPENSES AND HOME PRODUCTION
13. MARRIAGE AND MATERNITY HISTORY
15. CREDIT AND SAVINGS
16. TRANSFERS AND REMITTANCES
17. OTHER INCOME
The household questionnaire was designed to be administered in two visits to each sample household. Apart from avoiding the problem of interviewing household members in one long stretch, scheduling two visits also allowed the teams to improve the quality of the data collected.
During the first visit to the household (Round 1), the enumerators covered sections 1 to 8, and fixed a date with the designated respondents of the household for the second visit. During the second visit (Round 2), which was normally held two weeks after the first visit, the enumerators covered the remaining portion of the questionnaire and resolved any omissions or inconsistencies that were detected during data entry of information from the first part of the survey.
Since many of the sections of the questionnaire pertained specifically to female members of the household, female interviewers were included in the field teams. The household questionnaire was split into two parts (Male and Female). Sections such as SECTION 3: EDUCATION, which solicited information on all individual members of the household (male as well as female), were included in both parts of the questionnaire. Other sections such as SECTION 2: HOUSING and SECTION 12: FOOD EXPENSES AND HOME PRODUCTION, which collected data at the aggregate household level, were included in either the male questionnaire or the female questionnaire, depending on which member of the household was likely to be better informed about that particular area of household activity. Male and female interviewers were instructed to switch questionnaires where necessary in order to obtain information from the best-informed individual in the household.
Information for all male members aged 10 years or more was collected using the male questionnaire. Information on other household members (i.e. all female household members as well as children aged less than 10 years) was collected using the female questionnaire. Individuals covered in the male questionnaire were assigned sequential ID codes beginning with code "01", and those covered in the female questionnaire were assigned ID codes starting with code "51".
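The ID convention can be sketched as a small function, which is also handy when recombining the two parts of the questionnaire. The roster format, the names, and the assumption that IDs follow roster order within each part are hypothetical; `questionnaire_for` relies on the scheme's implication that no household has more than 50 members covered by the male questionnaire.

```python
def assign_member_ids(roster):
    """Return {name: ID code} following the PIHS male/female ID convention.

    `roster` is a hypothetical list of (name, sex, age) tuples in roster order;
    sex is "M" or "F".
    """
    ids = {}
    next_male, next_female = 1, 51
    for name, sex, age in roster:
        if sex == "M" and age >= 10:       # covered by the male questionnaire
            ids[name] = f"{next_male:02d}"
            next_male += 1
        else:                              # women and all children under 10
            ids[name] = f"{next_female:02d}"
            next_female += 1
    return ids


def questionnaire_for(id_code):
    """Infer which part of the questionnaire a member ID came from."""
    return "male" if int(id_code) < 51 else "female"


roster = [("Ahmed", "M", 45), ("Fatima", "F", 40), ("Bilal", "M", 12),
          ("Sara", "F", 8), ("Omar", "M", 6)]
print(assign_member_ids(roster))
# {'Ahmed': '01', 'Fatima': '51', 'Bilal': '02', 'Sara': '52', 'Omar': '53'}
```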
It is important to note, however, that the division of the questionnaire into the male and female portions was undertaken solely to facilitate gathering of data in the field. Male and female enumerators could interview respondents of different sexes separately when visiting each household, and thus obtain information pertaining to household members of both sexes directly from the individuals concerned. This was particularly important in the case of sections such as SECTION 13: MARRIAGE AND MATERNITY HISTORY, where assigning female enumerators to directly interview the women concerned was crucial. While information for male and female members was collected in separate questionnaires, these data were combined during data entry so that the household data files contain information on all members of the household. Each section of the household questionnaire was further divided into subsections A, B, C, etc.
COMMUNITY AND PRICE QUESTIONNAIRES:
In each of the 300 communities where household interviews were conducted for the PIHS, a community questionnaire was administered by the team supervisor. Respondents to this questionnaire typically included the head of the village or community, the local schoolmaster, a local government official, or any other individual who was knowledgeable about the community. Communities were defined as all households living in the Primary Sampling Unit (PSU) in which the interview was conducted (the concept of a PSU is explained in more detail in the section on Sample Design). While each of the 300 PSUs consisted of roughly the same number of households (generally about 200-300), the area covered by individual PSUs varied considerably. In urban areas, communities were, in general, much smaller in terms of area covered, and were defined to be the group of households living within the physical boundaries of the PSU. In rural areas, because of the low population density, the PSU at times consisted of a group of settlements spread over a large area. In such cases, the supervisors were instructed to treat the largest or most central village in the PSU as the community.
The community questionnaire contained questions on characteristics of the community such as the quality of physical infrastructure, provision of amenities such as electricity, gas and water, access to education and health care facilities, and on markets and availability of goods and services in the locality. In order to obtain more information on birth practices used in the community, one of the sections of the community questionnaire was directed at dais (birth attendants) in the community and contained a number of questions on birth practices and pre- and post-birth maternal care. In rural areas, in addition to the section on the general characteristics of the community, two additional sections on health facilities and primary school facilities were also administered. Detailed information was collected on the quality of infrastructure, the equipment and services available, as well as staffing of these facilities.
Finally, a price questionnaire was also administered in all the communities where households were interviewed. Price information for 37 goods was collected. The goods included items such as food staples, tea and sugar, selected vegetables, as well as a few non-food items like fuels, soaps, etc. For all goods, two sets of prices were collected: one from the local shopkeeper and the other from the local mandi or wholesale seller. In rural areas, prices of agricultural inputs as well as other relevant information on local farming practices were also collected.
World Bank LSMS
Use of the dataset must be acknowledged with a citation that includes:
- Identification of the Primary Investigator(s) and of the country
- Title of the survey (including the year of implementation)
- Survey reference number
- Source and date of download
Pakistan. Federal Bureau of Statistics and the World Bank. Pakistan Integrated Household Survey (PIHS) 1991. Ref. PAK_1991_PIHS_v01_M. Dataset downloaded from www.microdata.worldbank.org on [date]
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.