Sampling Procedure
Migration data for South Africa are available at different geographic levels for different years: for 2007 only at the level of local governments or municipalities, from the 2007 Community Survey (CS 2007); for smaller areas called "sub-places" (SPs) only as recently as the 2001 Census; and for the desired Enumeration Areas (EAs) only as far back as the 1996 Census. In sum, no single source provided recent data on the five types of migrants of principal interest at the level of the Enumeration Area. This was the level at which data were needed to draw the sample, since migrant and non-migrant households had to be identified in the sample areas in order to oversample those with migrants for interview.
In an attempt to overcome the data limitations referred to above, it was necessary to adopt a novel approach to the design of the sample for the World Bank's household migration survey in South Africa, to identify EAs with a high probability of finding immigrants and those with a low probability. This required the combined use of the three sources of data described above.
The starting point was the CS 2007 survey, which provided data on migration at the local government level and was used to classify each local government cluster by migration level, taking into account the types of migrants identified. The researchers then spatially zoomed in from these clusters to the so-called sub-places (SPs) from the 2001 Census to classify SP clusters by migration level. Finally, the 1996 Census data on migration levels of various types were used to zoom in even further, down to the EA level, to identify the final level of clusters for the survey, namely the spatially small EAs (each typically containing about 200 households, and hence amenable to the listing operation in the field).
A higher score or weight was attached to the 2007 Community Survey municipality-level (MN) data than to the Census 2001 sub-place (SP) data, which in turn was given a greater weight than the 1996 enumerator area (EA) data. The latter were derived exclusively from the Census 1996 EA data, but were then reallocated to the 2001 EAs in proportion to geographical size. Although these weights are purely arbitrary, since they were composed from different sources, they give an indication of the relative importance attached to the different migrant categories. These weighted migrant proportions (secondary strata) therefore constituted the second level of clusters for sampling purposes.
In addition, a system of weighting or scoring the different persons by migrant type was applied to ensure that the likelihood of finding migrants would be optimised. As part of this procedure, recent migrants (who had migrated in the preceding five years) received a higher score than lifetime migrants (who had not migrated during the preceding five years). Similarly, a higher score was attached to international immigrants (both recent and lifetime, who had come to SA from abroad) than to internal migrants (who had only moved within SA's borders). A greater weight also applied to inter-provincial (internal) than to intra-provincial migrants (who only moved within the same South African province).
How the three data sources were combined to provide overall scores for each EA can be briefly described. First, in each of the two provinces, all local government units were given migration scores according to the numbers or relative proportions of the population classified in the various categories of migrants (with non-migrants given a score of 1.0). Migrants were assigned higher scores according to their priority, with international migrants given higher scores than internal migrants and recent migrants higher scores than lifetime migrants. Then, within the local governments, sub-places were assigned scores on the basis of inter- vs. intra-provincial migrants using the 2001 Census data. Each SP area in a local government was thus assigned a value which was the product of its local government score (the same for all SPs in the local government) and its own SP score. The third and final stage was to develop relative migration scores for all the EAs from the 1996 Census by similarly weighting the proportions of migrants (and non-migrants, always assigned 1.0) of each type. The final migration score for an EA is the product of its own EA score from 1996, the score of the SP of which it is a part (assigned to all the EAs within the SP), and the local government score from the 2007 survey.
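The three-stage combination described above can be illustrated with a minimal sketch. The migrant categories and the numeric weights in TYPE_WEIGHTS are hypothetical, chosen only to respect the ordering stated in the text (non-migrants fixed at 1.0, recent above lifetime, international above internal); the function names are likewise illustrative, and the actual values used in the survey are not given here.

```python
# Hypothetical per-person weights by migrant type. Only the ordering is
# taken from the text; the numbers themselves are assumptions.
TYPE_WEIGHTS = {
    "non_migrant": 1.0,
    "lifetime_internal": 2.0,
    "recent_internal": 3.0,
    "lifetime_international": 4.0,
    "recent_international": 5.0,
}


def level_score(proportions, weights=TYPE_WEIGHTS):
    """Weighted sum of population proportions by migrant type for one area.

    `proportions` maps migrant type -> share of the area's population and
    must sum to 1. An area with no migrants therefore scores exactly 1.0.
    """
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(weights[kind] * share for kind, share in proportions.items())


def final_ea_score(lg_score, sp_score, ea_score):
    """Product of the local government (2007), SP (2001) and EA (1996) scores."""
    return lg_score * sp_score * ea_score


# Illustrative areas (made-up proportions).
lg = level_score({"non_migrant": 0.8, "recent_international": 0.2})
sp = level_score({"non_migrant": 0.9, "recent_internal": 0.1})
ea = level_score({"non_migrant": 1.0})
print(final_ea_score(lg, sp, ea))
```

Because non-migrants always carry a score of 1.0, an EA without recorded migrants in 1996 still inherits a higher final score when it lies in a high-migration SP or local government.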
Based on all the above principles the set of weights or scores was developed.
In sum, we multiplied the proportion of populations of each migrant type, or their incidence, by the appropriate final corresponding EA scores for persons of each type in the EA (based on multiplying the three weights together), to obtain the overall score for each EA. This takes into account the distribution of persons in the EA according to migration status in 1996, the SP score of the EA in 2001, and the local government score (in which the EA is located) from 2007. Finally, all EAs in each province were then classified into quartiles, prior to sampling from the quartiles.
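The quartile classification can be sketched as follows. The text does not say how the cut points were computed, so this sketch assumes a simple rank-based split of the EAs of one province into four groups of (nearly) equal size; the function name and the EA identifiers are hypothetical.

```python
def assign_quartiles(scores):
    """Map each EA id to a quartile (1 = lowest scores, 4 = highest).

    `scores` maps EA id -> final migration score for one province.
    EAs are ranked by score (ties broken by id) and split into four
    groups of as equal a size as possible.
    """
    ranked = sorted(scores, key=lambda ea: (scores[ea], ea))
    n = len(ranked)
    return {ea: (rank * 4) // n + 1 for rank, ea in enumerate(ranked)}


# Illustrative scores for eight made-up EAs in one province.
example = {
    "EA01": 1.0, "EA02": 1.1, "EA03": 1.4, "EA04": 1.6,
    "EA05": 2.0, "EA06": 2.5, "EA07": 3.2, "EA08": 4.8,
}
quartiles = assign_quartiles(example)
print(quartiles)
```

Running the classification separately for each province keeps the quartiles relative to that province's own score distribution, as the text describes.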
From the EAs so classified, the sampling took the form of selecting EAs, i.e., primary sampling units (PSUs, which in this case are also Ultimate Sampling Units, since this is a single-stage sample), according to their classification into quartiles. The proportions selected from each quartile are based on the range of EA-level scores, which are assumed to reflect weighted probabilities of finding desired migrants in each EA. To enhance the likelihood of finding migrants, much higher proportions of EAs were selected into the sample from the quartiles with the higher scores than from those with the lower scores (disproportionate sampling). The decision on the most appropriate categorisations was informed by the observed migration levels in the two provinces of the study area during 2007, 2001 and 1996, analysed at the lowest spatial level for which migration data were available in each case.
Because of the differences in their characteristics it was decided that the provinces of Gauteng and Limpopo should each be regarded as an explicit stratum for sampling purposes. These two provinces therefore represented the primary explicit strata. It was decided to select an equal number of EAs from these two primary strata.
The migration-level categories referred to above were treated as secondary explicit strata to ensure optimal coverage of each in the sample. The distribution of migration levels was then used to draw EAs in such a way that greater preference could be given to areas with higher proportions of migrants in general, but especially immigrants (note the relative scores assigned to each type of person above). The proportion of EAs selected into the sample from the quartiles draws upon the relative mean weighted migrant scores (referred to as proportions) found below the table. This is incidental and not necessary, however: any disproportionate sampling of EAs from the quartiles could be used, since it would be rectified by the weighting applied at the end for the analysis.
The resultant proportions of migrants then led to the following proportional allocation of sampled EAs: Quartile 1: 5 per cent (instead of 25 per cent, as in an equal distribution); Quartile 2: 15 per cent (instead of 25 per cent); Quartile 3: 30 per cent (instead of 25 per cent); and Quartile 4: 50 per cent (instead of 25 per cent).
It was agreed that a sample size of at least 2 000 households would be required to elicit the required information. It was agreed further that only six (6) households would be selected in the final level of clusters, i.e., the PSUs or the EAs, to reduce clustering effects, viz., the possible impact of spatial interdependence of survey responses. This gave a required total of 334 EAs (2 000 / 6 = 333.33) to be selected.
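The arithmetic behind the EA count, and one possible way of distributing the EAs over the quartiles, can be sketched as below. The split of 334 EAs into 167 per province follows from the equal allocation to the two primary strata; the rounding of the 5/15/30/50 per cent shares to whole EAs is not described in the text, so the largest-remainder rule used here is an assumption, and the function names are illustrative.

```python
def required_eas(min_households, households_per_ea):
    """Smallest number of EAs that yields at least `min_households`."""
    return -(-min_households // households_per_ea)  # ceiling division


def allocate(n_eas, shares_pct):
    """Split `n_eas` across quartiles according to percentage shares.

    Uses largest-remainder rounding (an assumption, not stated in the
    text) so that the whole-number allocations sum exactly to `n_eas`.
    `shares_pct` maps quartile -> integer percentage and must sum to 100.
    """
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100 per cent")
    quotas = {q: n_eas * pct for q, pct in shares_pct.items()}  # hundredths
    alloc = {q: quota // 100 for q, quota in quotas.items()}
    leftover = n_eas - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda q: quotas[q] % 100, reverse=True)
    for q in by_remainder[:leftover]:
        alloc[q] += 1
    return alloc


total_eas = required_eas(2000, 6)   # 2 000 / 6 = 333.33, rounded up
per_province = total_eas // 2       # equal split over Gauteng and Limpopo
shares = {1: 5, 2: 15, 3: 30, 4: 50}
print(total_eas, per_province, allocate(per_province, shares))
```

Integer percentages are used instead of floating-point fractions so that the remainders are compared exactly.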
An explicit, disproportional stratification of provinces (primary strata) and incidence migrant proportions (secondary strata) was therefore used as a basis for the selection of EAs. The disproportionate distribution of these selected EAs was to be rectified afterwards through the use of EA weights during all data analyses.
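How EA weights can rectify the disproportionate allocation is illustrated below. The text does not give the weight formula, so this sketch assumes the standard design weight for stratified sampling, namely the number of EAs in a stratum's frame divided by the number sampled from it, with EAs drawn with equal probability within each province-by-quartile stratum. The frame and sample counts are hypothetical.

```python
def ea_design_weight(frame_count, sampled_count):
    """Inverse selection probability for an EA in one stratum.

    Assumes EAs are drawn with equal probability within the stratum,
    so each sampled EA represents frame_count / sampled_count EAs.
    """
    if sampled_count <= 0 or sampled_count > frame_count:
        raise ValueError("need 0 < sampled_count <= frame_count")
    return frame_count / sampled_count


# Hypothetical frame: 1 000 EAs in each quartile of one province,
# with an illustrative disproportionate sample drawn from each.
frame = {1: 1000, 2: 1000, 3: 1000, 4: 1000}
sampled = {1: 8, 2: 25, 3: 50, 4: 84}
weights = {q: ea_design_weight(frame[q], sampled[q]) for q in frame}
print(weights)
```

Under this assumption, EAs from the oversampled high-migration quartiles receive small weights and EAs from the undersampled low-migration quartiles receive large ones, so weighted estimates reflect the frame rather than the sample composition.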
Within each sample EA selected following the procedures above, an approximate listing of dwellings was undertaken by the survey team, and updated maps (showing streets/roads, potentially eligible dwellings, and other easily identifiable features for orientation purposes) were produced.
When there was more than one household at a particular visiting point, only one was randomly selected. In the case of a block of flats, townhouse complex or retirement village, it was important to regard every occupied flat/unit as a potential visiting point when applying the sampling interval. In the case of single-sex workers' hostels, each room or dormitory constituted a visiting point and every occupied bed in a selected room/dormitory represented a (single-person) household.
The sampling process followed the plan below.
· Enumerator Areas were randomly selected using the approach outlined earlier
· Maps of the selected EAs were obtained from Statistics South Africa (STATS SA),
· For each EA, the fieldwork supervisor/team identified the physical boundaries from the map and ensured that the map and the physical location were congruent,
· The fieldwork supervisor/team counted the number of houses/dwellings within each EA. Call this Nile,
· 20 households per EA were to be visited, so the sampling interval was calculated as Nile/20. For example, if Nile=200 houses/stands, the sampling interval was Nile/20=200/20=10. This means that every 10th house/stand was visited,
· The supervisor identified a random starting point, such as a school, a shop, a library, or some similar public point. If none could be identified then one dwelling was identified,
· From this randomly selected starting point, every house/dwelling at the sampling interval was visited (every 10th in the example above),
· The actual household interviewed was selected following the procedure below:
o The interviewer approached the first household (call that Household #1) and completed the interview irrespective of whether there were migrants in the household,
o Households #2 to #15 were interviewed only if there was at least one international migrant in the household,
o Households #16 to #20 were interviewed irrespective of whether there were migrants in the household,
o If there were migrants in the first six households visited, the interviewer stopped and did not visit any more households. The other households were just noted,
o This meant that at the outset for each EA, 20 households were targeted for interview, but a maximum of six would be interviewed,
o If dwelling unit replacements were required, e.g., if some households refused to be interviewed, then interviewers were to select the next house to the right, followed if necessary by the next house to the left, and so on.
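The interval calculation and the selection of visiting points in the plan above can be sketched as follows. This is a simplified illustration: it assumes the dwellings of an EA are numbered 0 to N-1 along the walking route, that a non-integer interval is rounded down, and that counting wraps around to the start of the route when the end is reached. None of these details is specified in the plan, and the function names are hypothetical.

```python
import random


def sampling_interval(n_dwellings, target=20):
    """Interval between visited dwellings, e.g. 200 / 20 = 10."""
    return max(1, n_dwellings // target)


def visiting_points(n_dwellings, target=20, rng=None):
    """Indices of the dwellings to visit, in visiting order.

    Starts at a random dwelling and then takes every k-th dwelling,
    where k is the sampling interval, wrapping around the route.
    """
    rng = rng or random.Random()
    k = sampling_interval(n_dwellings, target)
    start = rng.randrange(n_dwellings)
    count = min(target, n_dwellings)
    return [(start + j * k) % n_dwellings for j in range(count)]


# Example from the plan: 200 dwellings and 20 visits give an interval of 10.
print(sampling_interval(200))
print(visiting_points(200, rng=random.Random(2024)))
```

Passing a seeded random.Random instance makes the selection reproducible, which can help when a supervisor needs to audit which dwellings should have been visited.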
In addition, fieldworkers also had to fill in a recording sheet. The purpose of the recording sheet was to make sure that fieldwork teams recorded all the households they visited, recording addresses as well as the status of each household, i.e. whether the household had an international migrant or not.
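The interview rule for the 20 targeted households, together with a simple recording sheet, can be sketched as below. This is one reading of the procedure, under the assumption that the interviewer stops as soon as six interviews are completed and that the remaining visiting points are merely noted. The status labels, field names and function name are hypothetical, and refusals and replacements are not modelled.

```python
def screen_households(has_intl_migrant, max_interviews=6):
    """Apply the interview rule and return a recording sheet.

    `has_intl_migrant` lists, in visiting order, whether each targeted
    household (up to 20) contains at least one international migrant.
    Household #1 and Households #16 to #20 are interviewed regardless;
    Households #2 to #15 only if they contain an international migrant.
    """
    sheet = []
    interviewed = 0
    for position, migrant in enumerate(has_intl_migrant, start=1):
        if interviewed >= max_interviews:
            # Quota reached: not visited, so migrant status is unknown.
            row = {"household": position, "intl_migrant": None, "status": "noted"}
        elif position == 1 or position >= 16 or migrant:
            interviewed += 1
            row = {"household": position, "intl_migrant": migrant, "status": "interviewed"}
        else:
            row = {"household": position, "intl_migrant": migrant, "status": "screened_out"}
        sheet.append(row)
    return sheet


# An EA with international migrants only at visiting points #3 and #7.
flags = [False] * 20
flags[2] = flags[6] = True
for row in screen_households(flags):
    print(row)
```

Under this reading, an EA with no international migrants among Households #2 to #15 still yields six interviews (Household #1 plus Households #16 to #20), which matches the target of six households per EA.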