Russia Longitudinal Monitoring Survey - Higher School of Economics 1995
Other Household Survey [hh/oth]
The Russia Longitudinal Monitoring Survey (RLMS) is a series of nationally representative surveys designed to monitor the effects of Russian reforms on the health and economic welfare of households and individuals in the Russian Federation. These effects are measured by a variety of means: detailed monitoring of individuals' health status and dietary intake, precise measurement of household-level expenditures and service utilization, and collection of relevant community-level data, including region-specific prices and community infrastructure data.
Data for RLMS have been collected since 1992. Since 1994, the team has collected a new round of data almost every year in the second phase of the project.
The Russia Longitudinal Monitoring Survey (RLMS) is a household-based survey designed to measure the effects of Russian reforms on the economic well-being of households and individuals. In particular, determining the impact of reforms on household consumption and individual health is essential, as most of the subsidies provided to protect food production and health care have been or will be reduced, eliminated, or at least dramatically changed. These effects are measured by a variety of means: detailed monitoring of individuals' health status and dietary intake, precise measurement of household-level expenditures and service utilization, and collection of relevant community-level data, including region-specific prices and community infrastructure data. Data have been collected since 1992.
As its name implies, the RLMS is a longitudinal study of populations of dwelling units. Rounds V-VII are designed to provide a repeated cross-section sampling. Barring the construction of major new housing structures, renewed contact with a fixed national probability sample of dwelling units provides high coverage cross-sectional representation. The repeat visit at each round to a static sample of dwelling units also introduces a correlation between successive samples that leads to improved efficiency in longitudinal analyses comparing aggregate statistics.
The repeated cross-section design is far and away the simplest alternative for the RLMS. The sampling is cost efficient, easy to maintain, and easy to update when needed. The design supports both efficient cross-sectional and aggregate longitudinal analyses of change in the Russian household population. Updates to the sample, including a full replenishment of the probability sample of dwelling units, will not seriously disrupt the longitudinal data series.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
Households and individuals.
The scope of the study includes:
- Use of time;
- Health status;
- Medical services;
- Child care;
- Family information;
- Housing conditions;
- Living conditions;
- Transportation and related information;
- Local municipal and other services;
- Cost of food products;
- Farming and animal husbandry.
Producers and sponsors
National Research University Higher School of Economics
Carolina Population Center
University of North Carolina at Chapel Hill
Institute of Sociology RAS
National Research University Higher School of Economics
US National Institutes of Health
The goal was to develop a sample of households (excluding institutionalized people) that would meet accepted scientific standards of a true probability sample to the greatest extent possible, while taking into account the severe operational constraints of Goskomstat. With the advice of William Kalsbeek [a sampling expert at the University of North Carolina at Chapel Hill (UNC-CH)] and later with help from Leslie Kish, the project developed a replicated three-stage stratified cluster sample of residential addresses, excluding military, penal, and other institutionalized populations. Replication was designated for Stage 1 of sampling so that the number of primary sampling units (PSUs) could be kept manageable, with the understanding that later they would be expanded. The sample size of each replicate was set at 20 PSUs. The quality of this sample was statistically analyzed.
Sample attrition due to nonresponse cannot be avoided. Table 1 summarizes RLMS Round V interview completion rates for the original sample of dwelling units in the eight regions that comprise the survey population. These are not response rates; each denominator includes dwelling units that were vacant or uninhabitable at the time of the Round V interviews. Overall, interviews were completed in 84.3% of the original national probability sample of n=4718 dwelling units.
Interview completion rates outside St. Petersburg, Moscow City, and Moscow Oblast range from 84.8% in the combined Central/Central Black Earth region to 92.6% in Western Siberia. Rates in the highly urban Moscow/St. Petersburg region are much lower. In part, these rates may reflect higher vacancy rates in metropolitan areas, but clearly lower household contact and response rates also come into play. Lower rates in Moscow and St. Petersburg were anticipated at the design stage, and initial allocations to these strata were increased to offset expected losses from refusal and noncontact. This is one form of what we might call "designing for nonresponse." The over-sampling strategy is beneficial in that it means reduced variability in the final analysis weights (due to the offset in the product of higher sample selection probability and lower response propensity); however, over-sampling eliminates the potential for bias only if attrition is occurring at random within the final weighting adjustment cells.
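The offset described above can be illustrated with a small Python sketch. The selection probability and response rates below are hypothetical values chosen only for illustration; they are not RLMS figures. The sketch assumes the final weight is proportional to the reciprocal of the product of selection probability and response propensity.

```python
def final_weight(p_select, response_rate):
    """Weight proportional to 1 / (selection probability x response propensity)."""
    return 1.0 / (p_select * response_rate)

# Hypothetical values, for illustration only (not taken from the RLMS).
BASE_P = 0.001       # selection probability outside the metropolitan stratum
RATE_OTHER = 0.9     # assumed response rate outside the metropolitan stratum
RATE_METRO = 0.6     # assumed (lower) response rate in the metropolitan stratum

# Equal allocation: the metropolitan weight is inflated by the lower response rate.
w_other = final_weight(BASE_P, RATE_OTHER)
w_metro_equal = final_weight(BASE_P, RATE_METRO)

# Over-sampling: raise the metropolitan selection probability by the ratio of
# response rates so that the product p_select * response_rate is the same.
p_metro = BASE_P * RATE_OTHER / RATE_METRO
w_metro_over = final_weight(p_metro, RATE_METRO)

ratio_equal = w_metro_equal / w_other   # about 1.5: weights vary across strata
ratio_over = w_metro_over / w_other     # about 1.0: weights are equalized
print(round(ratio_equal, 3), round(ratio_over, 3))
```

Equalizing the weights reduces their variability, but as noted above it does not remove bias unless nonresponse is random within the adjustment cells.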
If independent samples were developed for each round of the repeated cross-section design, attrition in one round would be independent of (although possibly similar in nature to) that in other rounds. However, since the RLMS uses a static sample of dwellings across multiple rounds, the impact of nonresponse and attrition is the net effect of several factors. Round V attrition bias can arise only from differential nonresponse and noncontact for subclasses of households that occupy the original sample of dwelling units. The potential for nonresponse bias in cross-sectional analysis or contrasts involving the Rounds VI and VII data is a complex function of:
(1) initial nonresponse in Round V;
(2) net difference in characteristics of households and individuals who move out of or into sample dwellings;
(3) nonresponse on the part of old households continuing to reside in sample dwelling units; and
(4) nonresponse on the part of new households currently living in sample dwelling units.
Time did not permit analysis of each of these factors. Instead, I performed several simple analyses of the net effect of household turnover and nonresponse on the marginal sample distributions (unweighted) of population characteristics that should not change significantly over time.
The general observation is that the combined influence of nonresponse attrition and household turnover does not seriously distort the geographic distribution of the sample or its size or household-head characteristics. The distributions for the geographic variables indicate that, between Round V and Round VII, there is a decline in the nominal representation of households in the Moscow/St. Petersburg region, reflected in a decline in the proportion of sample households from the urban domain. Households with a male head aged 18-59 may be subject to slightly higher than average attrition/net loss in replacement. If we focus only on these characteristics, the problem is not serious.
In summary, the net effect of nonresponse attrition and change in dwelling unit occupants across rounds on the marginal characteristics of the observed cross-sectional samples is modest. Loss in nominal "sample share" between Rounds V and VII is greatest for residents of Moscow/St. Petersburg, a loss in representation that is readily corrected with the combined sample selection/nonresponse adjustment factors that have been computed for each round. It is important to note that the simple analysis described here cannot demonstrate that no uncorrected attrition bias remains. The potential for uncorrected nonresponse bias can be specific to the dependent variable under study. Nevertheless, it appears that, with the nonresponse and post-stratification adjustments developed by Michael Swafford, the potential for serious attrition bias in repeated cross-section analysis is small.
Weights in Descriptive Analysis of RLMS Data.
Analysis weights are essential for unbiased sample-based estimation of RLMS descriptive statistics such as population and subclass means, proportions, and totals. The construction of a descriptive weight for cross-sectional analysis involves a simple sequence of steps:
(1) determine the probability of selection for each sample household;
(2) based on geographic and other known characteristics of sample households, compute an adjustment for nonresponding sample households;
(3) compute a nonresponse-adjusted weight as the product of the reciprocal of the sample selection probability and the nonresponse adjustment.
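The three steps above can be sketched in Python. This is a minimal illustration, not the RLMS procedure: the `Household` record, the cell labels, and the weighting-class form of the nonresponse adjustment (base-weighted sample total divided by base-weighted respondent total within a cell) are assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Household:
    hh_id: str
    cell: str         # adjustment cell, e.g. region x urban/rural (hypothetical labels)
    p_select: float   # step 1: probability of selection
    responded: bool

def nonresponse_adjusted_weights(sample):
    """Return {hh_id: weight} for responding households."""
    total, resp = {}, {}
    for h in sample:
        base = 1.0 / h.p_select
        total[h.cell] = total.get(h.cell, 0.0) + base
        if h.responded:
            resp[h.cell] = resp.get(h.cell, 0.0) + base
    weights = {}
    for h in sample:
        if h.responded:
            # Step 2: assumed weighting-class adjustment within the cell.
            adjustment = total[h.cell] / resp[h.cell]
            # Step 3: reciprocal of selection probability times the adjustment.
            weights[h.hh_id] = (1.0 / h.p_select) * adjustment
    return weights

def weighted_mean(values, weights):
    """Weighted mean of {hh_id: value} using {hh_id: weight}."""
    return sum(values[k] * w for k, w in weights.items()) / sum(weights.values())

# Hypothetical four-household sample.
sample = [
    Household("a", "urban", 0.002, True),
    Household("b", "urban", 0.002, False),
    Household("c", "rural", 0.001, True),
    Household("d", "rural", 0.001, True),
]
weights = nonresponse_adjusted_weights(sample)
mean_size = weighted_mean({"a": 2, "c": 4, "d": 3}, weights)
```

In this toy sample the responding urban household carries the weight of its nonresponding neighbor, so the adjusted weights sum to the same total as the base weights of the full sample.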
Since the RLMS attempts to interview all individuals within sample households, the selection probability for an individual equals that for his or her household. An individual in a cooperating household may, however, choose not to give an interview. If data on individuals (both cooperating and not) are known from household listings, the nonresponse adjustment factor in the analysis weight can be computed at the level of the individual. Fortunately, the majority of RLMS nonresponse at the individual level corresponds to noncooperation by the entire household, and the household nonresponse adjustment factor will capture most of the sample attrition loss at both levels.
If recent census data on households and individuals are available, a fourth post-stratification step can be added: scaling analysis weights so that the sum of weights for a defined subpopulation matches the corresponding census proportion (e.g., the weighted sample proportion of females, age 45 and older, in the Moscow/St. Petersburg region matches the corresponding proportion from the most recent census). The post-stratification of analysis weights serves two functions:
(1) it can reduce the sampling variance of weighted estimates; more importantly
(2) it may correct noncoverage biases in the frame used to derive the original sample of dwellings and individuals.
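A minimal Python sketch of the post-stratification step follows. It assumes a single set of mutually exclusive strata whose census proportions sum to 1 and that every stratum contains at least one respondent. The function name, stratum labels, and numbers are hypothetical.

```python
def post_stratify(weights, strata, census_share):
    """Scale weights so each stratum's weighted share equals its census share.

    weights:      {unit_id: nonresponse-adjusted weight}
    strata:       {unit_id: stratum label}
    census_share: {stratum label: census proportion}, assumed to sum to 1
    """
    total = sum(weights.values())
    stratum_sum = {}
    for uid, w in weights.items():
        s = strata[uid]
        stratum_sum[s] = stratum_sum.get(s, 0.0) + w
    adjusted = {}
    for uid, w in weights.items():
        s = strata[uid]
        target = census_share[s] * total   # weighted total the stratum should have
        adjusted[uid] = w * target / stratum_sum[s]
    return adjusted

# Hypothetical example: the sample under-represents the first stratum.
weights = {"p1": 100.0, "p2": 100.0, "p3": 200.0}
strata = {"p1": "female_45plus", "p2": "female_45plus", "p3": "other"}
census_share = {"female_45plus": 0.6, "other": 0.4}

ps_weights = post_stratify(weights, strata, census_share)
share = (ps_weights["p1"] + ps_weights["p2"]) / sum(ps_weights.values())
```

The overall sum of weights is unchanged; only the distribution across strata is moved to match the census proportions.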
RLMS data sets contain post-stratification weights - weights that adjust not only for design factors but also for deviations from the census characteristics. For households, we have produced post-stratification weights that fit our data to the known distribution of household size and location of residence (urban or rural). For individuals, our weights fit our data to the multivariate distribution of location, age, and gender. Of course, depending on the subject of one's analysis, it might be appropriate to compute post-stratification weights that adjust to other variables, and all analysts are free to compute their own.
There is considerable debate over the value of using weights in multivariate analysis. For example, in estimating linear or generalized linear models, many software programs allow the specification of weights for model fitting. Some statisticians argue that using weights is not necessary if the fixed effects that explain the variation in weights are included in the model. In RLMS data, the household characteristics that explain the greatest variation in weights are the geographic region and the urban/rural character of the civil division in which the dwelling is located. Variation in individual weights will reflect the geographic effects for households as well as differentials due to post-stratification of the sample by major geographic regions, age, and sex. Researchers who are interested in exploring the impact of RLMS weights on a multivariate analysis should consider the following test. Fit the model omitting the weights but including as fixed effects the household (region, urban/rural) or individual (region, urban/rural, age, and sex) characteristics. Without changing the specification, also estimate the model using the analysis weights. Compare the results to see if there are important differences in model parameters and/or interpretation. Differences in the unweighted and weighted versions could be due to added sampling variability introduced by the weighted estimation or could indicate that the model is not correctly specified.
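The comparison described above can be sketched in plain Python for a linear model. This is an illustration under stated assumptions, not a recommended estimator: it fits least squares through the normal equations, ignores the clustered design when it comes to standard errors, and uses a hypothetical design matrix (intercept, an urban indicator as the fixed effect, and one covariate) with made-up weights.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit(X, y, w=None):
    """Least-squares coefficients; w=None gives the unweighted fit."""
    w = w or [1.0] * len(y)
    k = len(X[0])
    xtx = [[sum(wi * r[i] * r[j] for wi, r in zip(w, X)) for j in range(k)]
           for i in range(k)]
    xty = [sum(wi * r[i] * yi for wi, r, yi in zip(w, X, y)) for i in range(k)]
    return solve(xtx, xty)

def max_coefficient_gap(b1, b2):
    """Largest absolute difference between two coefficient vectors."""
    return max(abs(u - v) for u, v in zip(b1, b2))

# Hypothetical data: columns are intercept, urban indicator, covariate.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 2], [1, 0, 3], [1, 1, 1]]
y = [1 + 2 * urban + 3 * x for _, urban, x in X]   # exactly linear outcome
w = [1.0, 4.0, 2.0, 1.0, 3.0, 2.0]                 # made-up analysis weights

unweighted = fit(X, y)
weighted = fit(X, y, w)
gap = max_coefficient_gap(unweighted, weighted)
```

Because the toy outcome is exactly linear in the included terms, the two fits coincide. With real data, a large gap would point to one of the two explanations given above: added sampling variability from weighting, or a misspecified model.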
At each round, the data contain some households with a sampling weight of zero (0). This value was assigned to households that moved out of the sample area between rounds. They were located and interviewed to provide a group of respondents for longitudinal analyses. They were assigned weights of zero to keep analysts from inadvertently including them in cross-sectional analyses that are intended to be representative of Russia. (Only respondents with non-zero weights are part of the representative sample.)
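As a small illustration of this rule, the Python sketch below separates the representative cross-sectional sample from the zero-weight follow-up households. The record layout and the field names `hh_id` and `weight` are hypothetical; the actual variable names should be taken from the questionnaires and data sets.

```python
def cross_sectional_sample(households, weight_key="weight"):
    """Keep only households with a non-zero sampling weight."""
    return [h for h in households if h[weight_key] > 0]

def followed_movers(households, weight_key="weight"):
    """Households with weight zero, retained for longitudinal analyses only."""
    return [h for h in households if h[weight_key] == 0]

# Hypothetical records.
households = [
    {"hh_id": 1, "weight": 1.12},
    {"hh_id": 2, "weight": 0.0},   # moved out of the sample area, followed up
    {"hh_id": 3, "weight": 0.87},
]

representative = cross_sectional_sample(households)
movers = followed_movers(households)
```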
Dates of Data Collection
Data Collection Mode
The questionnaires are English-language translations of the original Russian questionnaires. The English versions have been translated as literally as possible. The order of the questions and the layout of the pages have been preserved in the English versions.
The questionnaires are also designed to function as codebooks. The variable names, as they appear in the data sets, are usually listed below or to the left of the questions. If the abbreviation (char) appears with a variable name, then the responses to that question are stored in a character variable. If there is no variable name associated with a particular question, then the responses to that question do not appear in the data set. Some questions in the questionnaires are color coded. Pink means that the question was added. Green indicates changes from the previous round (e.g., year). Gray means that the questions were asked, but the data are not available for public use - the questions were added at the request of the Pension Office and are for their use only.
In Phase II (Rounds V - XX), when questionnaires were returned to local supervisors, those supervisors were required to examine them to locate problems that could best be remedied in the field, e.g., by returning to get key demographic information or cleaning ID numbers so that the roster of individuals located in the household questionnaire matched those on the individual questionnaires from that household. The questionnaires were then transported to Moscow, where yet another ID check was performed.
In Moscow, coders looked through all questionnaires to code so-called "other: specify" responses. However, open-ended questions (e.g., occupation questions) were not coded at this time. Instead, their texts were fully entered as long string variables. Entering the open-ended answers as character variables offered several advantages. First, it allowed data entry to begin immediately, with no delay for coding. Second, it permitted the use of computer programs to assist in coding the string variables. Third, the method allowed any user of the original data sets to recode the character variables to suit his or her purposes without going back to the paper copies of the questionnaires.
All data entry was handled in-house using the SPSS data entry program on PCs.
Source: "Russia Longitudinal Monitoring survey, RLMS-HSE", conducted by Higher School of Economics and ZAO "Demoscope" together with Carolina Population Center, University of North Carolina at Chapel Hill and the Institute of Sociology RAS.
RLMS-HSE sites: http://www.cpc.unc.edu/projects/rlms-hse, http://www.hse.ru/org/hse/rlms.
Location of Data Collection
Carolina Population Center, the University of North Carolina at Chapel Hill
Archive where study is originally stored
Carolina Population Center, the University of North Carolina at Chapel Hill
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.