The Cambodia Socio-Economic Survey (CSES) 1997 is the first large-scale multi-objective household survey conducted in Cambodia.
The primary objective of the Cambodia Socio-Economic Survey (CSES) 1997 was to obtain data for measuring living standards across geographic strata and different segments of Cambodian society. Other objectives were to provide information needed by a variety of users, such as government institutions, donor agencies and non-government organizations, and to assist NIS in training its staff to plan, design and conduct a household-based survey system and in institutionalizing survey-taking capability. Expanding the scope of the survey to meet the data needs of a wide variety of users, thereby minimizing the duplication of household surveys and promoting the acceptance of CSES as the national household survey programme, was also an important objective.
Specifically the survey had the following objectives:
i) To provide data required for the measurement of living standards through a single source of data for a comprehensive and detailed analysis of living standards and poverty in Cambodia.
ii) To provide information on school facilities, schooling and enrollments, cost of education and related information.
iii) To provide information on health issues, utilization of health facilities and costs incurred in treating illnesses.
iv) To provide information on demographic and economic characteristics of the population such as age-sex distribution, marital status, fertility, mortality, literacy, employment and incomes.
v) To derive information on socio-economic conditions of villages including infrastructure and access to education and health facilities.
vi) To establish survey taking capability within NIS for the Institute to conduct multi-objective large scale household-based survey programmes.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
The 1997 Cambodia Socio-Economic Survey covered the following topics:
- Demographic Information
- Economy and Infrastructure
- Retail Prices and Wages
- Natural Disasters
- Household Demographic Information
- Education and Schooling
- Economic Activity
- Housing and Environment
- Household Assets and Liabilities
- Fertility and Child Care
Producers and sponsors
National Institute of Statistics
Ministry of Planning
Project Executing Agency
United Nations Development Programme
Swedish International Development Cooperation Agency
A sampling frame based on a national population census is not available for Cambodia. A list of villages and village populations prepared by the United Nations Transitional Authority in Cambodia (UNTAC) for conducting the general elections held in 1993, and updated for the household surveys conducted during the past few years, was used as the sampling frame.
As in other household surveys conducted recently, the coverage of the survey had to be restricted for security reasons; excluded areas were considered not safe for the enumerators to conduct fieldwork. Accordingly, two provinces and a number of communes from 15 other provinces were excluded from the frame. The truncated frame used for the survey covered 100% of the villages in Phnom Penh, 91.2% of villages in the Other Urban towns and 86.3% of the Rural villages. The proportions of excluded households were lower, and amounted to only 4.8% of households in other urban areas and 11.6% of households in the rural sector.
A two-stage stratified random sampling design was adopted with villages as primary sampling units (PSUs) and households as secondary sampling units (SSUs). Considering the socio-economic stratification, the spread of the items canvassed, and the sample size of the survey, Cambodia was divided into 3 strata, viz. Phnom Penh, Other Urban areas and Rural areas. The frame, which had villages grouped by communes, and communes by districts and provinces, in effect provided an implicit stratification of the universe for the probability proportional to size (PPS) systematic random sampling procedure adopted in the selection of the PSUs. The procedure also provided for the preparation of estimates for the four geographic zones of the country, namely the Plains, Tonle Sap Lake, Plateau and Mountains, and Coastal regions.
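The PPS systematic selection of PSUs described above can be sketched as follows. This is a minimal illustration, not the procedure actually coded for CSES 1997: the function name, the village sizes and the number of selections are hypothetical, and the frame is assumed to be already sorted in its geographic order (province, district, commune) so that the systematic pass yields the implicit stratification.

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n_select, rng=None):
    """Return indices of units chosen by PPS systematic sampling.

    sizes    -- measure of size of each unit, in frame order (hypothetical data)
    n_select -- number of units to select
    rng      -- object with a random() method; defaults to random.Random()
    """
    if rng is None:
        rng = random.Random()
    total = sum(sizes)
    interval = total / n_select            # sampling interval on the size scale
    start = rng.random() * interval        # random start in [0, interval)
    cumulative = list(accumulate(sizes))   # running totals of size
    selected = []
    idx = 0
    for k in range(n_select):
        point = start + k * interval
        # Move forward to the unit whose cumulative range contains the point.
        while idx < len(cumulative) - 1 and cumulative[idx] <= point:
            idx += 1
        # A unit larger than the interval can be hit more than once.
        selected.append(idx)
    return selected


# Hypothetical frame of six villages with their household counts.
village_sizes = [120, 80, 300, 50, 150, 100]
print(pps_systematic_sample(village_sizes, 3, random.Random(1997)))
```

Because the selection points are evenly spaced along the cumulated sizes, a village's chance of selection is proportional to its size, and the sample spreads across the frame in the order in which the villages are listed.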
Deviations from the Sample Design
The sampling design for the CSES 1997 considered several factors including the precision of data required by the users, the capacity of the national statistics office to conduct the survey, and most importantly the time constraint imposed to complete survey field work before the end of July 1997. Taking into account these factors, and especially the experience gained from the two socio-economic surveys conducted in 1993/94 and 1996, including estimates of feasible workloads, a sample of 6000 households to be selected from 474 villages was considered to be sufficient and manageable.
The design also took into consideration the need for separate analyses of three geographical domains, namely Phnom Penh, other urban areas aggregated together, and the rural area. In deciding the sample allocation to the three domains, it was decided that a size of around 1000 households would be adequate for the first two domains and the rest should be allocated to Domain 3 - Rural area, since it was envisaged that more detailed analysis of the poverty groups in this domain would be undertaken.
Despite the length of the questionnaire, the respondents cooperated with the survey staff and provided answers to both questionnaires, and it was possible to achieve a 100% response rate. At this stage it is not possible to comment on item non-response, on the completeness of the information provided by the respondents, or on respondent fatigue arising from the length of the interviews, which may have had a bearing on these issues.
The estimates have been formed by weighting the data from the sample households to provide estimates that relate to all households in each domain. The weighting factors were calculated based on the probabilities of selection for the sample.
The design weights are used to compensate for differences in the selection probabilities. The weight for the PSU is inversely proportional to its selection probability.
The probability of selection of the jth household in normal-size PSUs and blocks in the hth domain is
$$p_h(i) \times p_h(j \mid i) = p_h(ij) \qquad \text{(Eq. 3)}$$
where
$$p_h(i) = \frac{a_h M_{hi}}{M_h} \quad \text{and} \quad p_h(j \mid i) = \frac{n_h}{M_{hi}^{*}}$$
Thus the design weights $w_{hij}$ for these units are
$$w_{hij} = \frac{1}{p_h(ij)} = \frac{M_h \times M_{hi}^{*}}{a_h \times M_{hi} \times n_h} \qquad \text{(Eq. 4)}$$
For the large PSUs which were segmented, the probability of selection of the jth household in the sth segment in the ith PSU in the hth domain is
$$p_h(i) \times p_h(s \mid i) \times p_h(j \mid is) = p_h(isj) \qquad \text{(Eq. 5)}$$
where
$$p_h(i) = \frac{a_h M_{hi}}{M_h}, \quad p_h(s \mid i) = \frac{1}{s_i} \quad \text{and} \quad p_h(j \mid is) = \frac{n_h}{M_{his}^{*}} \qquad \text{(Eq. 6)}$$
The design weight for such a large PSU is
$$w_{hisj} = \frac{1}{p_h(isj)} = \frac{M_h \times M_{his}^{*} \times s_i}{a_h \times M_{hi} \times n_h} \qquad \text{(Eq. 7)}$$
The design for CSES is not self-weighting, and it is therefore necessary to compute a weight for each PSU, block or segment selected in the sample; these weights have to be used in the estimation procedure.
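As an illustration of how the weights in Eq. 4 and Eq. 7 might be computed and then used in estimation, consider the sketch below. The text does not define the symbols, so the reading adopted here is an assumption: a_h is taken as the number of PSUs selected in domain h, M_hi as the frame measure of size of PSU i, M_h as the total frame size of domain h, M_hi* (or M_his*) as the number of households found at listing in the PSU (or segment), n_h as the number of households selected per PSU, and s_i as the number of segments. All numbers and function names are hypothetical.

```python
def design_weight(M_h, M_hi, M_star, a_h, n_h, s_i=1):
    """Inverse of the overall selection probability.

    With s_i == 1 this follows Eq. 4; with s_i > 1 it follows Eq. 7,
    where M_star is then the listed size of the selected segment.
    The meaning given to each symbol is an assumed reading of the text.
    """
    p_psu = a_h * M_hi / M_h       # probability of selecting the PSU
    p_segment = 1 / s_i            # probability of selecting the segment
    p_household = n_h / M_star     # probability of selecting the household
    return 1 / (p_psu * p_segment * p_household)


def weighted_mean(values, weights):
    """Weighted mean of household values using their design weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical normal-size PSU and hypothetical segmented PSU.
w_normal = design_weight(M_h=10000, M_hi=200, M_star=250, a_h=20, n_h=10)
w_segmented = design_weight(M_h=10000, M_hi=200, M_star=90, a_h=20, n_h=10, s_i=3)
print(w_normal, w_segmented)

# Hypothetical household values from the two PSUs, weighted accordingly.
print(weighted_mean([4.0, 6.0], [w_normal, w_segmented]))
```

Writing the weight as the reciprocal of the product of stage probabilities keeps the code aligned with Eq. 3 and Eq. 5, so the unsegmented case is simply the segmented case with one segment.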
Dates of Data Collection
Data Collection Mode
The supervisor is responsible for
(i) administering the Village Questionnaires (Form 2),
(ii) preparing the two Household Questionnaires for each village (for example, completing certain information on the Cover Page of each questionnaire, as described in this manual),
(iii) checking all completed questionnaires to ensure that they have been filled in completely and correctly, and
(iv) making random visits to households that have been interviewed by an interviewer to make sure that the answers are consistent with the completed questionnaire.
The supervisor is also expected to occasionally observe interviewers while they are conducting household interviews, especially during the first one or two weeks of the field work.
The district-level supervisor is responsible for checking the village questionnaires and for monitoring the survey's overall progress in those villages.
Data Collection Notes
Field enumerators and supervisors were drawn from the National Institute of Statistics, the Ministry of Planning and provincial planning and statistics offices. In all, 211 staff, comprising 156 enumerators and 55 supervisory staff, were trained in Phnom Penh between 19 May and 03 June 1997 by the Project staff, supported by senior core group staff from NIS assigned to the Project. A comprehensive manual was prepared for the use and guidance of the field staff and as training material.
Data collection was carried out between the last week of May 1997 and the end of June 1997. Village leaders and other key informants were interviewed to collect village level data. The household information was collected through personal interviews with one or more responsible members of sample households. The completed questionnaires were edited in the field by the field supervisors. The pre-survey publicity through the media, and the arrangements made for survey field operations made it possible to achieve 100% responses from the sampled villages and sampled households.
Each interviewer was assigned selected villages based on the sampling procedure. In order to complete the data collection activity within the planned time frame, each enumerator was assigned about 30 to 45 households in three or four villages. The questionnaires were filled in by the method of personal interview.
A pre-listing of households was undertaken by the enumerator to generate the current list of households, which was essential for selecting the sample households by the systematic sampling procedure. In addition to preparing a current list of buildings, housing units and households, certain additional information, such as the number of household members and the principal economic activity of the household, was also collected.
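The selection of sample households from the current listing can be sketched as a systematic sample with a random start. The text does not spell out the interval rule used in the field, so the fractional-interval version below is an assumption, and the function name and listing are hypothetical.

```python
import random


def systematic_sample(listing, n, rng=None):
    """Select n entries from a household listing at a fixed interval.

    listing -- households in listing order (hypothetical serial numbers)
    n       -- number of households to select
    rng     -- object with a random() method; defaults to random.Random()
    """
    if rng is None:
        rng = random.Random()
    size = len(listing)
    if n >= size:
        return list(listing)           # take everything in a small village
    interval = size / n                # may be fractional
    start = rng.random() * interval    # random start in [0, interval)
    positions = [min(int(start + k * interval), size - 1) for k in range(n)]
    return [listing[p] for p in positions]


# Hypothetical village with 40 listed households, 10 to be interviewed.
households = list(range(1, 41))
print(systematic_sample(households, 10, random.Random(5)))
```

Since the interval exceeds one whenever fewer households are selected than listed, consecutive positions always differ, so no household is picked twice.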
After the selection of sample households, the selected households were revisited to interview one or more responsible members of the household to fill in the core and social sector questionnaires. Before or after the household interviews, the enumerator interviewed the head of the village and other key informants to canvass information for the village questionnaire.
The field control procedures provided for the supervisors to inspect and make on-the-spot checks while interviews were being conducted, and they were also required to re-interview a sub-sample of the households already interviewed by the enumerators under their supervision. To ensure effective supervision through inspections and re-interviews, adequate funds were allocated for the payment of honoraria to supervisors for their supervisory duties. Some of the core group staff functioned as area coordinators, and they were in overall charge of supervision as well as the coordination of the areas assigned to them. The Minister of Planning and the Under Secretary of State MOP, Project Staff and Senior NIS Staff also visited the field in mid-June 1997 to encourage the field staff and to study the operational issues and problems encountered in field work.
Four questionnaires were used in the survey.
Form 1: Household Listing Form was used to prepare the current list of households for sampling.
Form 2: Village Questionnaire was used to collect village level data on socio-economic infrastructure and facilities including prices and wages from key informants.
Form 3: Core Questionnaire was used to collect demographic and socio-economic characteristics of the population.
Form 4: Social Sector Module was designed to collect detailed information on education and health service utilization and related household expenditures.
All completed questionnaires were brought to NIS for processing. Although completed questionnaires were checked and edited by supervisors in the field, the length of the questionnaires and the complexity of the topics covered meant that manual editing and coding by trained staff was accepted as an essential priority activity in order to produce a cleaned data file without delay. In all, 39 staff, comprising 35 processing staff and 4 supervisors, were trained for three days by the project staff. An instruction manual for manual editing and coding was prepared and translated into Khmer for the guidance of processing staff. Manual processing of questionnaires commenced in mid-August 1997.
In order to produce an unedited data file, keying in the data as recorded by field enumerators and supervisors, (without subjecting data to manual edit as required by the Analysis Component Project staff), it was necessary to structure manual editing as a two-phase operation. Thus in the first phase, the processing staff coded the questions such as those on migration, industry, and occupation which required coding. Editing was restricted to selected structural edits and some error corrections. These edits were restricted to checking the completeness and consistency of responses, legibility, and totaling of selected questions. Error corrections were made without canceling or obliterating the original entry made by the enumerator, by inserting the correction close to the original entry.
Much of the manual editing was carried out in the second phase, after key entry, one hundred percent verification and extraction of error printouts. A wide range of errors had to be corrected, which was expected in view of the complexity of the survey and the skill background of the enumeration and processing staff. The manual edits involved the correction of errors arising from incorrect key entry, incorrect or missing identification, miscoding of answers, failure to follow skip patterns, misinterpretation of measures, range errors, and other consistency errors.
An in-house survey processing centre was established at the NIS to process the CSES 1997. A network of 12 PCs with 2 high-capacity PCs as servers was installed, and NIS staff were trained to use the network system. The network can be strengthened with additional workstations to process a survey sample of 15,000 households referred to in the project document.
All data processing was done on microcomputers. Data entry and editing were carried out using the Integrated Micro-Computer Processing System (IMPS) package developed by the US Bureau of the Census. The Statistical Package for Social Sciences (SPSS) was used to obtain tabulations.
At the end of August 1997, the keyers and verifiers were trained for three days and key entry operations commenced. In all 30 key entry and verification staff and 3 supervisors were trained by the Data Processing Specialist to use the data entry screens prepared using IMPS software.
Four data entry systems were created to input the data from the four questionnaires. The data entry system for the listing form contains one record type with a maximum record length of 49. The system for the village questionnaire contains 15 record types with a maximum record length of 105. The system designed for the core questionnaire contains 17 record types with a maximum record length of 116. The data entry system designed for the social sector module contains 12 record types with a maximum record length of 94. After keying in the data, one hundred percent verification was done on all card types. In spite of this safeguard to minimize errors, it was found that verifiers had not only failed to detect errors but had also introduced errors during verification.
When the set of consistency edit checks prepared for the survey was applied to a sample of three villages, the error printouts were so voluminous that it was decided to clean the files in stages, selecting a single record, question or topic at a time. The first computer edit was applied to check the basic structure of the data and the skip patterns. The errors were corrected manually and the data file was updated using IMPS programs. After completing the structural edit, the data file was re-edited for validity of records. Consistency edits were designed to detect responses that appeared to be inconsistent with other responses or in conflict with definitions and processing rules. It was necessary to run several edit checks to clean some data items. For tabulation, several sub-master files were created for most data items. The inflation factors to be assigned to each village were applied to the data at the tabulation stage.
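The staged approach to computer editing, running one kind of check at a time, can be illustrated with a short sketch. It is not the IMPS edit specification used for CSES 1997: the field names ("age", "enrolled", "grade"), the plausible age range and the sample records are all hypothetical, chosen only to show a range edit and a skip-pattern edit applied in separate passes.

```python
def range_check(record):
    """Flag an age that is missing or outside an assumed plausible range."""
    age = record.get("age")
    if age is None:
        return ["age missing"]
    if not 0 <= age <= 110:
        return ["age out of range"]
    return []


def skip_check(record):
    """Flag a grade answer that contradicts the enrolment filter question."""
    enrolled, grade = record.get("enrolled"), record.get("grade")
    if enrolled == "no" and grade is not None:
        return ["grade answered but should have been skipped"]
    if enrolled == "yes" and grade is None:
        return ["grade skipped but should have been answered"]
    return []


def run_edit(records, check):
    """Apply one check to every record; return (record index, message) pairs."""
    return [(i, msg) for i, rec in enumerate(records) for msg in check(rec)]


# Hypothetical person records keyed from a questionnaire.
records = [
    {"age": 12, "enrolled": "yes", "grade": 5},
    {"age": 130, "enrolled": "no", "grade": None},
    {"age": 9, "enrolled": "no", "grade": 3},
]

# Run the edits one stage at a time so each error listing stays manageable.
for name, check in [("range", range_check), ("skip", skip_check)]:
    for index, message in run_edit(records, check):
        print(f"{name} edit, record {index}: {message}")
```

Keeping each check as its own function means a single question or topic can be cleaned and re-run without producing the full error listing for every rule at once.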
Estimates of Sampling Error
The results obtained from the survey are subject to sampling errors. Sampling errors in surveys occur as a result of limiting the survey observations to a subset rather than the whole population. These errors are related to the sample size selected and the sampling design adopted in the survey. In order to keep these errors within acceptable levels, an efficient sampling design with the sample allocation described earlier was adopted.
In addition to sampling errors, the estimates are also subject to non-sampling errors that arise in different stages of any survey operation. These include
- errors that are introduced at the preparatory stage
- errors committed during data collection including those committed by interviewers and respondents
- processing errors
The first item includes errors arising from questionnaire design, preparation of definitions and instructions, preparation of table formats etc. The other two categories are clear from the terminology used. The use of trained enumerators and processing staff and careful organization and thorough supervision are essential to control and minimize these errors.
As already referred to, it was possible to obtain responses from all the villages and households that were sampled, and thus it was not necessary to adjust the data for non-response. Thus the bias that is introduced into the estimates as a result of non-response was avoided.
The standard error of a survey estimate provides a measure of how far the survey estimate is likely to vary from the true population value (i.e. parameter) as a result of having collected the data on a sample basis rather than through a complete census. The standard error se(r) of a survey estimate r is by definition
$$se(r) = \sqrt{\operatorname{var}(r)}$$
The relative standard error or coefficient of variation (cv), on the other hand, provides a measure of the relative variance of a survey estimate; that is, the magnitude of the estimated sampling error relative to the magnitude of the estimate itself. The cv, which is expressed as a proportional error, enables the data user to compare the relative reliability or precision with which different types of survey characteristics have been measured, e.g. means versus proportions, where direct comparisons of standard errors are uninformative since the magnitude of the standard error depends on the magnitude of the estimate.
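A small numerical sketch may help show why the cv, rather than the standard error, is the quantity to compare across characteristics. The estimates and variances below are invented for illustration and are not CSES 1997 results; the function names are likewise hypothetical, and the variance of each estimate is assumed to be already available.

```python
import math


def standard_error(variance):
    """Standard error as the square root of the estimated variance."""
    return math.sqrt(variance)


def coefficient_of_variation(estimate, variance):
    """Standard error expressed as a proportion of the estimate."""
    return standard_error(variance) / abs(estimate)


# Hypothetical mean (say, a monthly expenditure figure) and its variance.
mean_estimate, mean_variance = 50.0, 4.0
# Hypothetical proportion (say, a share of households) and its variance.
prop_estimate, prop_variance = 0.2, 0.0001

for label, est, var in [("mean", mean_estimate, mean_variance),
                        ("proportion", prop_estimate, prop_variance)]:
    se = standard_error(var)
    cv = coefficient_of_variation(est, var)
    print(f"{label}: estimate={est}, se={se:.4f}, cv={cv:.2%}")
```

In this invented case the standard errors differ by a factor of about 200 simply because the two estimates are on different scales, while the cv values (roughly 4% and 5%) show that the two are measured with similar relative precision.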
The results provide estimates at the level of the three domains Phnom Penh, other urban areas, and the rural sector into which the entire geographical area covered by the survey was divided. The survey design has provided for statistically reliable estimates for most characteristics at these levels of stratification.
The expenditure data from CSES 1997 presented here are not strictly comparable with the data from the SESC 1993/94, which canvassed very detailed data on consumer expenditure. SESC 1993/94 collected data on over 450 items of consumption expenditure, the type of information required to establish weights in the construction of consumer price indices. At that level of disaggregation it is possible to achieve results closer to actual consumption levels. Such surveys are required only infrequently, once in 5 to 7 years, because of the costs and time involved in designing, conducting and processing them. CSES 1997 used a shorter list comprising 33 commonly used consumer items that were considered adequate to monitor consumption expenditure over time. In addition to this issue arising from differences in the scope of the two surveys, researchers should take note of the decline in household size and changes in household structure, which are important determinants of household expenditure.
National Institute of Statistics
National Institute of Statistics
Ministry of Planning
The Statistics Law Article 22 specifies matters of confidentiality. It explicitly says that all staff working with statistics within the Government of Cambodia "shall ensure confidentiality of all individual information obtained from respondents, except under special circumstances with the consent of the Minister of Planning. The information collected under this Law is to be used only for statistical purposes."
Each dataset has an "Access policy". The NIS recommends three levels of accessibility:
- Public use files, accessible to all
- Licensed datasets, accessible under conditions
- Datasets only accessible in a data enclave, for the most sensitive and confidential data.
1. The data and other materials will not be redistributed or sold to other individuals, institutions, or organizations without the written agreement of the National Institute of Statistics.
2. The data will be used for statistical and scientific research purposes only. They will be used solely for reporting of aggregated information, and not for investigation of specific individuals or organizations.
3. No attempt will be made to re-identify respondents, and no use will be made of the identity of any person or establishment discovered inadvertently. Any such discovery would immediately be reported to the National Institute of Statistics.
4. No attempt will be made to produce links among datasets provided by the National Institute of Statistics, or among data from the National Institute of Statistics and other datasets that could identify individuals or organizations.
5. Any books, articles, conference papers, theses, dissertations, reports, or other publications that employ data obtained from the National Institute of Statistics will cite the source of data in accordance with the Citation Requirement provided with each dataset.
6. An electronic copy of all reports and publications based on the requested data will be sent to the National Institute of Statistics.
Use of the dataset must be acknowledged using a citation which would include:
- the Identification of the Primary Investigator
- the title of the survey (including country, acronym and year of implementation)
- the survey reference number
- the source and date of download of the data files (for datasets obtained on-line)
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
DDI Document ID
Development Economics Data Group
Documentation of the DDI
Date of Metadata Production
DDI Document version
Version 01 (June 2011) - Adopted from "KHM-NIS-CSES-1997-v1" DDI. Source http://www.nis.gov.kh/nada/?page=catalog