Southern and Eastern Africa Consortium for Monitoring Educational Quality 2007
Socio-Economic/Monitoring Survey [hh/sems]
The origins of the Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ) date back to 1991, the year when several Ministries of Education in Eastern and Southern Africa started working closely with UNESCO's International Institute for Educational Planning (IIEP) on the implementation of integrated educational policy research and training programmes.
In 1995 these Ministries of Education formalized their collaboration by establishing a network that is widely known as SACMEQ. Fifteen Ministries are now members of SACMEQ: Botswana, Kenya, Lesotho, Malawi, Mauritius, Mozambique, Namibia, Seychelles, South Africa, Swaziland, Tanzania (Mainland), Tanzania (Zanzibar), Uganda, Zambia, and Zimbabwe.
The Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ) undertook three large-scale, cross-national studies of the quality of education: SACMEQ I (1995-1999, reading) with seven ministries; SACMEQ II (2000-2004, reading and mathematics) with 14 ministries; and SACMEQ III (2006-2010, reading, mathematics, and pupil and teacher knowledge about HIV and AIDS) with 15 ministries.
The SACMEQ III Project commenced in 2006 and was completed during 2011. The SACMEQ III data collection was implemented in fifteen SACMEQ Ministries of Education (Botswana, Kenya, Lesotho, Mauritius, Malawi, Mozambique, Namibia, Seychelles, South Africa, Swaziland, Tanzania (Mainland), Tanzania (Zanzibar), Uganda, Zambia, and Zimbabwe). The SACMEQ III Project followed the general research direction of the first two SACMEQ Projects by focusing on an examination of the conditions of schooling in relation to the achievement levels of learners and their teachers in reading and mathematics. The focus was expanded to cover the learners’ levels of basic knowledge about HIV and AIDS. The SACMEQ III Project involved data collections from around 61,000 learners, 8,000 teachers, and 2,800 school principals.
Kind of Data
Sample survey data [ssd]
Unit of Analysis
- v2.1: Edited, anonymous dataset for public distribution.
Data was collected on pupils’ home backgrounds and their school life; classrooms, teaching practices, teachers' working conditions, and teacher housing; enrolments, school buildings and facilities, and school management.
basic skills education [6.1]
The lowest level of geographic aggregation covered by the data is province, and in some cases, metropolitan area.
The desired target population for the SACMEQ III study was defined as "All pupils at Grade 6 level in 2007 (at the first week of the eighth month of the school year) who were attending registered mainstream primary schools". This definition used a grade-based description (and not an age-based description) of pupils because an age-based description would have required the collection of data across many grade levels due to the high incidences of "late starters" and grade repetition in SACMEQ school systems. The excluded population consists of those schools and pupils that have been excluded from the desired population to give the defined target population.
Producers and sponsors
Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ)
United Nations Educational, Scientific and Cultural Organization
Funding the project
Funding and Technical Support
Ministries of Education
Funding the project
The desired target population definition for the SACMEQ III Project was exactly the same (except for the year) as was employed for the SACMEQ I and II Projects. This consistency was maintained in order to make valid cross-national and cross-time estimates of "change" in the conditions of schooling and the quality of education.
The SACMEQ III data were selected using a stratified two-stage cluster sample design based on the technique of a lottery method of sampling proportional to size, with the assistance of SAMDEM software (Sylla et al., 2003). At the first stage, schools were selected in each region (province) in proportion to the number of pupils in that region in the defined target population. The main reason for choosing Region as the explicit stratification variable was that the SACMEQ Ministries of Education wanted to have education administration regions as "domains" for the study. That is, the Ministries wanted to have reasonably accurate sample estimates of population characteristics for each region. At the second stage, a simple random sample of 25 pupils was taken within each selected school (in the Seychelles, all Grade 6 pupils in all 25 schools in the island country were tested).
In educational survey research the primary sampling units that are most often employed (schools) are rarely equal in size. This variation causes difficulties with respect to the control of the total sample size when schools are selected with equal probability at the first stage of a multi-stage sample design. One method of obtaining greater control over the total sample size is to stratify the schools according to size and then select samples of schools within each stratum. A more widely applied alternative is to employ probability proportional to size (PPS) sampling of schools within strata, followed by the selection of a simple random sample of a fixed number of pupils within each selected school. This approach gives control over the total sample size and results in epsem (equal probability of selection method) sampling of pupils within strata. The lottery method of PPS selection was implemented for the SACMEQ Projects with the assistance of the SAMDEM software (Sylla et al., 2003).
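The PPS idea described above can be illustrated with a short sketch. It is not the SAMDEM implementation: it assumes one common reading of the "lottery method", in which each school holds one ticket per pupil and winning tickets are taken at a fixed interval from a random start. The school names, sizes, seed, and function names are hypothetical.

```python
import random


def pps_lottery(school_sizes, n_schools, rng):
    """Draw schools with probability proportional to size (PPS).

    Every school holds one "ticket" per pupil. Tickets are numbered
    cumulatively across the frame, and winning tickets are taken at a
    fixed interval from a random start.
    """
    total_pupils = sum(school_sizes.values())
    interval = total_pupils / n_schools
    start = rng.random() * interval
    winning_tickets = [start + k * interval for k in range(n_schools)]

    selected = []
    cumulative = 0
    position = 0  # index of the next winning ticket to place
    for school, size in school_sizes.items():
        cumulative += size
        # A school wins once for each winning ticket inside its ticket range.
        # (A school larger than the interval could win more than once; real
        # designs usually treat such schools separately.)
        while position < n_schools and winning_tickets[position] < cumulative:
            selected.append(school)
            position += 1
    return selected


def pupil_selection_probability(size, total_pupils, n_schools, cluster_size):
    """Overall chance that one pupil is sampled under PPS plus a fixed take."""
    p_school = n_schools * size / total_pupils
    p_within = min(cluster_size, size) / size
    return p_school * p_within


# Hypothetical stratum: 20 schools with 40 to 135 Grade 6 pupils each.
frame = {f"S{i:02d}": 40 + 5 * i for i in range(20)}
sample = pps_lottery(frame, n_schools=5, rng=random.Random(2007))
total = sum(frame.values())
probs = [pupil_selection_probability(n, total, 5, 25) for n in frame.values()]
```

In this sketch every pupil in the stratum ends up with the same overall selection probability (number of schools times 25, divided by the stratum enrolment), provided each school has at least 25 pupils. That is the epsem property referred to above.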
In order to avoid selection bias, precautions were taken to ensure that school heads and teachers did not have any influence over the sampling procedures within schools. This is because school heads and teachers might have felt they had a vested interest in selecting particular kinds of pupils, and this could have resulted in major distortions of sample estimates (Brickell, 1974). The planned South African sample was 400 schools and 10 000 learners. The achieved sample comprised 392 schools and 9 071 learners.
Deviations from the Sample Design
Reasons for non-participation by the eight sampled schools varied: some schools had ceased to exist or had been merged into other schools, one school had phased out its primary section, another had lost learners in a road accident the day before the data collection, and a few other reasons were considered valid. Because South Africa had oversampled schools, replacements were considered unnecessary on the advice of the SACMEQ Coordinating Centre. Similarly, learners who were sampled in the sampled schools but were not available on the day of data collection were not replaced.
In the SACMEQ III project the South African overall response rates for schools and learners were 98% and 96,7%, respectively.
The calculation of sampling weights could only be conducted after all files had been cleaned and merged. Sampling weights were used to adjust for missing data and for variations in probabilities of selection that arose from the application of stratified multi-stage sample designs. There were also certain country-specific aspects of the sampling procedures, and these had to be reflected in the calculation of sampling weights. Two forms of sampling weights were prepared for the SACMEQ III Project. The first sampling weight (RF2) was the inverse of the probability of selecting a pupil into the sample. These "raising factors" were equal to the number of pupils in the defined target population that were "represented by a single pupil" in the sample. The second sampling weight (pweight2) was obtained by multiplying the raising factors by a constant so that the sum of the sampling weights was equal to the achieved sample size. A detailed account of weighting procedures can be found in Ross et al. (2003).
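The relationship between the two weights can be shown with a minimal sketch. It covers only the two definitions given above (inverse selection probability, then rescaling to the achieved sample size) and leaves out the missing-data and country-specific adjustments. The selection probabilities and function names are illustrative assumptions.

```python
def raising_factors(selection_probabilities):
    """RF2-style weights: the inverse of each pupil's selection probability."""
    return [1.0 / p for p in selection_probabilities]


def scaled_weights(factors):
    """pweight2-style weights: raising factors rescaled to sum to the sample size."""
    constant = len(factors) / sum(factors)
    return [f * constant for f in factors]


# Hypothetical selection probabilities for four sampled pupils.
probabilities = [0.01, 0.02, 0.05, 0.05]
rf2 = raising_factors(probabilities)   # pupils "represented" by each sampled pupil
pweight2 = scaled_weights(rf2)         # same relative sizes, sum equals 4
```

Because the second weight is the first multiplied by a constant, both give the same weighted means and proportions. The raising factors are the ones that estimate population totals.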
Dates of Data Collection
Data Collection Mode
Data Collection Notes
According to the National Research Coordinators (NRC) manual, the training of Data Collectors could either be conducted centrally by the NRCs (in the case of small states) or had to be conducted at several sites in the country. After the training, the Data Collectors would be deployed to the sampled schools.
Training of data collectors
In South Africa 174 data collectors were trained. On the first day of training the NRC presented a “simulated” data collection exercise in which he/she acted as a data collector and the trainees took the roles of learners, teachers, and School Heads. The second day involved an intensive study of the Manual for Data Collectors. The provided report sets down, in sequential order, all of the actions to be taken by the data collector from the time of receiving packages of data collection instruments from the Ministry of Education to the time when the data collector had completed the data collection and was preparing all materials for return. The third day involved a second “simulated” data collection whereby the trainees supervised a full-fledged data collection in several schools that were not involved in the main data collection. The experiences gathered during these exercises were shared and discussed during a later meeting so that all data collectors understood the procedures to be completed within schools.
“Main Data Collection” here and in the final report (provided as an external resource) refers to the actual field work. Two trained data collectors were assigned to each sampled school to administer the instruments. Special effort was made to ensure that the data collections were conducted according to explicit and fully-scripted steps so that the same verbal instructions were used (for learners, teachers, and School Heads) by the data collectors in all sample schools in all countries for each aspect of the data collection. This was a very important feature of the study because the validity of cross-national comparisons arising from the data analyses depended, in large part, on achieving carefully structured and standardized data collection environments.
The main SACMEQ III data collection occurred for most SACMEQ Ministries of Education in the period September to December 2007. In South Africa, data were collected in September 2007 in the 392 participating sample schools. Two days of data collection were required for each sample school. On the first day the data collectors had to sample learners from all the Grade 6 classes in the sampled schools, using a list of provided random numbers. The sampled learners were then given the learner questionnaire, the HAKT and the Reading test. On the second day they were given the Mathematics test. Part of the learner questionnaire required learners to get confirmation of the accuracy of the information from their parents and so the questionnaire was taken home and returned the following day.
In addition to completing a questionnaire, one teacher who taught the majority of the sampled learners for each of Reading, Mathematics and Life Orientation (for the HIV and AIDS test) also completed the relevant tests. The data collectors were provided with a 40-point checklist in order to ensure that they completed all important tasks that were required before, during, and after their visits to schools. Each task was cross-referenced to specific pages of instructions in the data collectors’ manual. The data collectors also checked all completed questionnaires (learner, teacher, and School Head) and, if necessary, obtained any missing or incomplete information on the second day before they left the school. The materials were then handed over to the provincial coordinator for safekeeping, “hand editing” and dispatching to the National Research Coordinator (NRC) in Pretoria as soon as all data collection was completed.
Ministry of Education
- Apart from pupil achievement scores, SACMEQ studies are renowned for collecting a wide range of information about pupils, teachers, classrooms, school heads, schools, and school communities. For the SACMEQ III study, four main questionnaires (pupil, teacher, school head, and school information) were used.
- It is important to note that SACMEQ questionnaires were subjected to careful thought, thorough examination, and stringent refinement before they were administered. For example, for the SACMEQ III study, the questionnaires were developed by a committee of experts consisting of members drawn from all SACMEQ countries, SCC staff, IIEP staff, and private consultants, following:
(a) Field experiences gained from the SACMEQ II study,
(b) Recommendations arising from analyses of SACMEQ II data, and
(c) Policy questions raised by SACMEQ country ministries of education.
These questionnaires were refined by the SACMEQ scientific committee, then piloted in each SACMEQ country and refined further before they were administered.
- One important innovation in the development of questionnaires for the SACMEQ III study was the introduction of a “Homework form” for pupils to take home. This consisted of questions to which the pupil might not know the answers (for example, parental education, estimates of travel distance to school, home possessions, whether or not their biological parents were alive) that parents, family members, or guardians could help in filling in. This considerably reduced the number of missing values in the SACMEQ III study compared with previous SACMEQ studies.
- Materials were translated into Kiswahili (Tanzania Mainland and Zanzibar) and Portuguese (Mozambique).
Data entry was done using WinDEM (Windows Data Entry Manager) software. Preliminary data cleaning involved checks to ensure that the data were clean before they were sent to the SACMEQ Coordinating Centre (SCC) for further checks, analysis, and calculation of sampling weights. (See p12 of the NRC Manual - provided as external resources - for more detail on the process.)
Data Checking and Data Entry
The South African NRT received the completed materials from the provincial coordinators and kept these safely while they were being checked, entered into computers, and then “cleaned” to remove errors prior to data analysis. Data checking involved the “hand editing” of data collection instruments by a team of trained staff. The staff checked that:
(i) All expected questionnaires, tests, and forms had been received,
(ii) The identification numbers on all instruments were complete and accurate, and
(iii) Certain logical linkages between questions made sense (for example, they had to verify if the two questions to School Heads concerning “Do you have a school library?” and “How many books do you have in your school library?” were answered consistently).
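The library example of a "logical linkage" check could be expressed as follows. This is a sketch only: in the study the check was done by hand editing, and the field names (`has_library`, `library_books`) and the two rules applied are assumptions.

```python
def library_inconsistencies(school_head_records):
    """Return (school_id, reason) pairs where the two library answers disagree."""
    flagged = []
    for record in school_head_records:
        has_library = record["has_library"]
        books = record["library_books"]  # None means the question was left blank
        if has_library and books is None:
            flagged.append((record["school_id"], "library reported, book count missing"))
        elif not has_library and books:
            flagged.append((record["school_id"], "books reported, but no library"))
    return flagged


# Hypothetical School Head responses.
responses = [
    {"school_id": "SCH1", "has_library": True, "library_books": 350},
    {"school_id": "SCH2", "has_library": False, "library_books": 120},
    {"school_id": "SCH3", "has_library": True, "library_books": None},
    {"school_id": "SCH4", "has_library": False, "library_books": 0},
]
issues = library_inconsistencies(responses)
```

Flagged records would be referred back to the instrument for correction. The check itself does not change any data.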
Trained data capturers, supervised by the NRT, entered data into computers using the WINDEM software that was supplied by the SACMEQ Coordinating Centre. Data were “double entered” in order to monitor accuracy. Individual data capturers worked for a maximum of six hours per day, and the whole data entry operation for South Africa was estimated to involve around 75 person days of data entry work.
During December 2007 the SACMEQ Coordinating Centre organized a training programme for all NRTs. The teams were led step-by-step through the required data cleaning procedures that they were to follow in their respective countries.
At individual country level, NRTs followed a “cyclical” process whereby data files were cleaned by the NRT, emailed to the Coordinating Centre for checking, and then emailed back to the NRC for further cleaning. The entire data cleaning process lasted seven months, from January 2008 to 31 July 2008. This was much shorter than the 18 months taken to clean the data for the SACMEQ II project.
To clean the data, using the WINDEM software, the NRTs followed specific directions to:
(i) Identify major errors in the sequence of identification numbers,
(ii) Cross-check identification numbers across files (for example, to ensure that all learners were linked with their own Reading and Mathematics teachers),
(iii) Ensure that all schools listed on the original sampling frame also had valid data collection instruments and vice-versa,
(iv) Check for “wild codes” that occurred when some variables had values that fell outside pre-specified reasonable limits, and
(v) Validate that variables used as linkage devices in later file merges were available and accurate.
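Two of the checks listed above, wild codes (iv) and cross-file identification numbers (ii), can be sketched in a few lines. This is not the WINDEM procedure. The variable names, the "reasonable limits", and the record layout are hypothetical.

```python
# Hypothetical reasonable limits for two learner variables.
VALID_RANGES = {"age_months": (96, 300), "days_absent": (0, 31)}


def find_wild_codes(records, valid_ranges):
    """List (pupil_id, variable, value) where a value falls outside its limits."""
    problems = []
    for record in records:
        for variable, (low, high) in valid_ranges.items():
            value = record.get(variable)
            if value is not None and not low <= value <= high:
                problems.append((record["pupil_id"], variable, value))
    return problems


def unlinked_learners(learners, teachers, link_variable):
    """List learners whose teacher ID has no match in the teacher file."""
    teacher_ids = {teacher["teacher_id"] for teacher in teachers}
    return [l["pupil_id"] for l in learners if l.get(link_variable) not in teacher_ids]


learner_file = [
    {"pupil_id": "P1", "age_months": 150, "days_absent": 2, "reading_teacher_id": "T1"},
    {"pupil_id": "P2", "age_months": 15, "days_absent": 1, "reading_teacher_id": "T1"},
    {"pupil_id": "P3", "age_months": 160, "days_absent": 40, "reading_teacher_id": "T9"},
]
teacher_file = [{"teacher_id": "T1"}, {"teacher_id": "T2"}]

wild = find_wild_codes(learner_file, VALID_RANGES)
orphans = unlinked_learners(learner_file, teacher_file, "reading_teacher_id")
```

Missing values are skipped rather than flagged in this sketch. A real cleaning pass would report them separately.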
When data cleaning was complete, the NRT merged the data from all the sources. The merging process required the construction of a single data file in which learners were the units of analysis and the data from the other respondents were linked to the learner data. That is, each record of the final data file for the country consisted of the following four components:
(a) The questionnaire and test data for an individual learner,
(b) The questionnaire and test data for his/her Mathematics and Reading teacher,
(c) The questionnaire data for his/her School Head, and
(d) School and learner “tracking forms” that were required for data cleaning purposes.
To illustrate, with the merged file it was possible to examine questions of the following kind: “What are the average Reading and Mathematics test scores (based on information taken from the learner tests) for groups of learners who attend urban or rural schools (based on information taken from the School Head questionnaire), and who are taught by male or female teachers (based on information taken from the teacher questionnaire)?”
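The kind of question quoted above can be sketched on a toy merged file. The record layout, identifiers, scores, and the use of `pweight2` as the weight are assumptions made for illustration. They are not the layout of the released dataset.

```python
from collections import defaultdict


def merge_learner_file(learners, teachers, schools):
    """Attach School Head and Reading-teacher information to each learner record."""
    merged = []
    for learner in learners:
        record = dict(learner)
        record["location"] = schools[learner["school_id"]]["location"]
        record["teacher_sex"] = teachers[learner["reading_teacher_id"]]["sex"]
        merged.append(record)
    return merged


def weighted_group_means(merged, score, weight):
    """Weighted mean score for each (school location, teacher sex) group."""
    totals = defaultdict(lambda: [0.0, 0.0])  # group -> [sum of w*score, sum of w]
    for record in merged:
        group = (record["location"], record["teacher_sex"])
        totals[group][0] += record[weight] * record[score]
        totals[group][1] += record[weight]
    return {group: num / den for group, (num, den) in totals.items()}


schools = {"SCH1": {"location": "urban"}, "SCH2": {"location": "rural"}}
teachers = {"T1": {"sex": "female"}, "T2": {"sex": "male"}}
learners = [
    {"pupil_id": "P1", "school_id": "SCH1", "reading_teacher_id": "T1",
     "reading_score": 520.0, "pweight2": 1.0},
    {"pupil_id": "P2", "school_id": "SCH1", "reading_teacher_id": "T1",
     "reading_score": 480.0, "pweight2": 1.0},
    {"pupil_id": "P3", "school_id": "SCH2", "reading_teacher_id": "T2",
     "reading_score": 450.0, "pweight2": 2.0},
    {"pupil_id": "P4", "school_id": "SCH2", "reading_teacher_id": "T2",
     "reading_score": 510.0, "pweight2": 1.0},
]

merged_file = merge_learner_file(learners, teachers, schools)
means = weighted_group_means(merged_file, "reading_score", "pweight2")
```

Keeping the learner as the unit of analysis means school and teacher attributes are repeated on every learner record. That repetition is what makes this grouped comparison a single pass over one file.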
Estimates of Sampling Error
Design effect is a number (ratio) which indicates the amount of “sampling error” that is introduced by the use of a clustered (two-stage) sampling method in relation to the “sampling error” that would result if a simple random sample of the same size had been used. Alternatively, the “design effect” is the ratio of the variance (of the sample mean) for a multi-stage sample to the variance for a simple random sample of the same size. Applied to SACMEQ III, this means that for Reading the achieved two-stage sample of 9 062 had a variance (of the sample mean) which was 13,9 times the variance that would be realized if a simple random sample of the same size was used. For Mathematics this ratio was 13,7 while for HAKT it was 12,2. Generally, the inaccuracy associated with a multi-stage sample is many times greater than the inaccuracy associated with a simple random sample of the same size.
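A small calculation may help in reading these figures. It uses only the sample size and design effects quoted above. The intraclass correlation line relies on the textbook approximation deff = 1 + (b - 1) x rho for clusters of average size b. That approximation is an assumption here, not the method used to produce the published estimates, and it ignores weighting and stratification.

```python
import math


def effective_sample_size(n, deff):
    """Size of a simple random sample with the same variance of the mean."""
    return n / deff


def standard_error_inflation(deff):
    """Factor by which the standard error exceeds that of a simple random sample."""
    return math.sqrt(deff)


def implied_intraclass_correlation(deff, average_cluster_size):
    """Solve deff = 1 + (b - 1) * rho for rho (rough approximation only)."""
    return (deff - 1.0) / (average_cluster_size - 1.0)


n_learners = 9062          # achieved Reading sample quoted above
n_schools = 392            # achieved school sample
deff_reading = 13.9

n_eff = effective_sample_size(n_learners, deff_reading)        # about 652 learners
se_factor = standard_error_inflation(deff_reading)             # about 3.7
rho = implied_intraclass_correlation(deff_reading, n_learners / n_schools)
```

On these figures the Reading sample of 9 062 learners carries roughly the precision of a simple random sample of about 650 learners. For this reason standard errors should be computed with methods that respect the clustered design.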
The quality of the data provided by the school heads, teachers, and pupils was examined in the following ways.
First, at the time of data collection, the data collectors who visited the schools verified, for example:
(a) The actual existence and conditions of the school resources such as library, school head office, and staff room, and
(b) The official school records about the information provided by pupils such as their gender, age, days absent, and whether or not their parents were alive.
Second, similar questions were included in the school head, pupil, and teacher questionnaires, and these helped to verify the responses given by the respondents during data cleaning. For example, a question on the existence of a class library was included in both the teacher and pupil questionnaires. Any inconsistencies between the responses of the school heads, teachers, and pupils were followed up by the national research coordinators (NRCs) and corrected during data cleaning.
The processes of generating pupil scores, competency levels, measure of school location, socioeconomic status and tabulations are outlined in the SACMEQ-III Project Results Working Document Number 1 available as external resources.
Director- Southern Africa Consortium for Monitoring Educational Quality (SACMEQ)
International Institute for Educational Planning - UNESCO
Director- Southern Africa Consortium for Monitoring Educational Quality
International Institute for Educational Planning - UNESCO
International Institute for Educational Planning
United Nations Educational, Scientific and Cultural Organization (UNESCO)
Before being granted access to the dataset, all users have to formally agree:
1. To make no copies of any files or portions of files to which s/he is granted access except those authorized by the data depositor.
2. Not to use any technique in an attempt to learn the identity of any person, establishment, or sampling unit not identified on public use data files.
3. To hold in strictest confidence the identification of any establishment or individual that may be inadvertently revealed in any documents or discussion, or analysis. Such inadvertent identification revealed in her/his analysis will be immediately brought to the attention of the data depositor.
TERMS AND CONDITIONS FOR USE OF THE SACMEQ DATA ARCHIVE
The Southern and Eastern Africa Consortium for Monitoring Educational Quality (SACMEQ) Co-ordinating Centre (SCC) has produced a data archive containing all information collected for SACMEQ's first three educational policy research projects (SACMEQ I, SACMEQ II, and SACMEQ III). This archive is now available online on the SACMEQ website so as to give bona fide researchers and students online access to SACMEQ data and documents.
The SACMEQ data sets have been developed at great cost and with the application of stringent quality controls. They are being made available to eligible users because they have great potential to contribute to educational policy development beyond what has already been achieved in this respect through the reports written by the National Research Co-ordinators (NRCs) and Deputy National Research Coordinators (DNRCs). It is expected that many researchers and students will wish to use the Data Archive for research, publications, and/or training purposes.
The Terms and Conditions serve two purposes. Firstly, they provide interested applicants with guidelines on how to access this valuable information resource. Secondly, they are intended to safeguard against the danger of users being unaware of the complexities of the data collection process and consequently arriving at misinterpretations that could lead to incorrect conclusions.
2.0 How can the user gain such access?
In order to obtain the SACMEQ Data Archive for any of the SACMEQ school systems, the applicant should follow these steps:
2.1 Read and Agree to these "Terms and Conditions for the Use of the SACMEQ Data Archive."
2.2 Complete an online application form.
3.0 What rules govern the use of the SACMEQ data archive?
3.1 The Data Archive is the outcome of expensive and time-consuming activities of the staff of the represented Ministries of Education spread over many years. For this reason, the SACMEQ Ministries of Education described in the Data Archive should:
3.1.1 be notified by the SACMEQ SCC of any request for data;
3.1.2 have an opportunity to review reports based on the data archive so as to correct any gross errors before they are published; and
3.1.3 satisfy themselves that the data have been used in such a manner that they contribute positively to the development of relevant education policies in relevant SACMEQ member countries.
3.2 It is the National Research Coordinators (NRCs) and Deputy National Research Coordinators (DNRCs) who have spearheaded the collection and compilation of SACMEQ data. In acknowledgment of their efforts, the applicant(s) will be required to invite the relevant country's National Research Coordinator to participate in the study associated with the use of the data. Where an individual other than the NRC or DNRC is co-opted, the relevant NRC and DNRC shall be given the first right of refusal.
3.3 This provision does not apply in situations where the SACMEQ Data Archive is used purely for purposes of individual academic research by a student, and where the results are not intended for publication.
3.4 All relevant NRCs and DNRCs will be informed by the SCC about the recipients of the Data Archive.
3.5 SACMEQ provides the SACMEQ Data Archive to applicants on the basis of the intended use stated in the application. The applicant, therefore, should not use the data for any purpose other than the one stated in the application. Should the applicant(s) wish to use the data for a purpose other than that stated in the agreement, then he/she/they must first secure the written approval of SACMEQ before he/she/they proceed to do so.
3.6 SACMEQ data are provided for the sole and exclusive use of the applicant specified in the agreement. The successful applicant should, therefore, not share the SACMEQ Data Archive with, or pass it on to, any other unauthorized person(s).
3.7 The authorized user shall take responsibility for the safe custody of the SACMEQ Data Archive and also take reasonable steps to ensure that no unauthorized persons gain access to it.
3.8 The authorized user shall give due credit to SACMEQ for providing the Data Archive by providing written acknowledgement of this in any publication emanating from their use.
3.9 As the Data Archive remains the property of SACMEQ, no other person(s), including the successful applicants or the member Ministry, shall re-distribute or offer for sale the SACMEQ Data Archive.
3.10 All reports based on the SACMEQ Data Archive have to secure the written approval of the SCC prior to publication in order to confirm compliance with our terms and conditions, and also to ensure that there is no misunderstanding or misinterpretation of the data.
3.11 Once authorization has been granted to access the archive, you will see a link on the website which will take you to the Data Archive.
3.12 All relevant NRCs will be informed by the SCC about the recipients of the SACMEQ Data Archive.
3.13 Full acknowledgement of the source of the data (including reference to the SACMEQ Data Archive) must be given whenever the data are used.
3.14 A copy of any published article or report based on the SACMEQ Data Archive must be provided free of charge to (a) the SACMEQ Co-ordinating Centre, and (b) the Ministry(ies) of Education from whose data the report has been generated.
Southern and Eastern Africa Consortium for Monitoring Educational Quality. SACMEQ Project III 2007 [dataset]. Harare: SACMEQ [producer], 2007. Paris: International Institute for Educational Planning, UNESCO [distributor], 2007. Ref. ZAF_2007_SACMEQ-III_v01_M. Dataset downloaded from [URL] on [date].
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
Copyright, Southern and Eastern Africa Consortium for Monitoring Educational Quality