An investigation of the consistency of Statistics South Africa’s employment data between surveys

Type Thesis or Dissertation
Title An investigation of the consistency of Statistics South Africa’s employment data between surveys
Author(s)
Publication (Day/Month/Year) 2011
URL http://mobile.wiredspace.wits.ac.za/bitstream/handle/10539/11204/Joseph Lukhwareni- Masters Research​report.pdf?sequence=1
Abstract
The purpose of the study is to investigate possible reasons why different surveys conducted
by Statistics South Africa (Stats SA) give different estimates of the percentages in the different
employment categories. In order to investigate the different sources of variability, that is, surveys
done in different years, surveys using different questionnaires, different sample designs and
different employment profiles, the following comparisons were done for Gauteng and the Eastern
Cape:
• Estimates of employment status over time: Labour Force Survey (LFS) March 2006 and
March 2007; LFS September 2006 and September 2007; and General Household
Survey (GHS) September 2006 and July 2007.
• Estimates of employment status across surveys: LFS September 2006 and GHS
September 2006; and LFS September 2007, GHS July 2007 and Community
Survey (CS) October 2007.
In order to generate a set of comparable estimates across surveys and within surveys over time,
this study identifies and addresses the various sources of potential non-comparability. The
methodologies utilised are Chi-squared Automatic Interaction Detection (CHAID) and multinomial logistic
regression. These statistical techniques were used to identify variables which are associated with
employment status.
The predictor variables included in the analysis are age group, highest level of education, marital
status, population group, sex and data source. The results from CHAID for all data sets show that
age group is the most significant predictor on which data on employment status can be
segmented. At the root node (the first level of the CHAID tree), data was partitioned by the
categories of age group. Highest level of education, sex, population group and province were
significant within the categories of age group. Either province or population group was significant
within the age group 20–29 years old, depending on the data set being analysed. Sex was
most significant within the age group 50–65 years old.
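The root-node selection described above can be illustrated with a minimal sketch. It ranks candidate predictors by their Pearson chi-squared statistic against employment status, which is a simplified stand-in for CHAID: a full CHAID implementation also merges similar categories and compares Bonferroni-adjusted p-values. The records, category labels and function names below are hypothetical and are not taken from the Stats SA data.

```python
from collections import Counter

def chi_squared(rows, predictor, target):
    """Return (Pearson chi-squared statistic, degrees of freedom) for predictor vs target."""
    observed = Counter((r[predictor], r[target]) for r in rows)
    row_totals = Counter(r[predictor] for r in rows)
    col_totals = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for a in row_totals:
        for b in col_totals:
            expected = row_totals[a] * col_totals[b] / n
            # Counter returns 0 for category pairs that never occur together.
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

def best_root_split(rows, predictors, target):
    """Pick the predictor with the largest statistic.

    Comparing raw statistics is only fair when the predictors have the same
    degrees of freedom; CHAID proper compares adjusted p-values instead.
    """
    return max(predictors, key=lambda p: chi_squared(rows, p, target)[0])

# Hypothetical toy records: status depends entirely on age group, not on sex.
TOY_ROWS = [
    {"age_group": age, "sex": sex, "status": status}
    for age, status in [("20-29", "unemployed"), ("30-49", "employed")]
    for sex in ("male", "female")
    for _ in range(5)
]

if __name__ == "__main__":
    for name in ("age_group", "sex"):
        print(name, chi_squared(TOY_ROWS, name, "status"))
    print("root split:", best_root_split(TOY_ROWS, ["age_group", "sex"], "status"))
```

In this toy data the age group is chosen for the root split because it is the only predictor associated with status; the same procedure would then be repeated within each age-group node.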
The results of the multinomial logistic regression show several significant interactions involving five to
seven factors, depending on the data set. The logistic regression results were not as good as those of
the CHAID analyses, but both techniques give us an indication of the relationships between the
predictor variables and employment.
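As a rough illustration of how a multinomial logistic model relates predictors to employment status, the sketch below converts linear predictors into category probabilities relative to a reference category. The category names, the coefficient values and the function name are assumptions made for illustration only; they are not estimates from the study.

```python
import math

def multinomial_probabilities(x, coefficients, reference="employed"):
    """Category probabilities under a multinomial logit model.

    x            -- list of predictor values (include 1.0 for an intercept)
    coefficients -- dict mapping each non-reference category to its coefficients
    reference    -- baseline category, whose linear predictor is fixed at 0
    """
    scores = {reference: 0.0}
    for category, betas in coefficients.items():
        scores[category] = sum(b * v for b, v in zip(betas, x))
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: e / total for c, e in exp_scores.items()}

if __name__ == "__main__":
    # Hypothetical coefficients: [intercept, indicator for age group 20-29].
    hypothetical = {
        "unemployed": [-1.0, 0.8],
        "not economically active": [-0.5, 0.3],
    }
    print(multinomial_probabilities([1.0, 1.0], hypothetical))
```

Interactions such as those reported above would enter this model as additional product terms in x, each with its own coefficient per non-reference category.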
In the analysis of the 2007 CS, LFS and GHS, the first split in explaining employment status was on
age group. Highest level of education was the most significant predictor when comparing the three
data sets. The three data sets differ in how they explain employment status. These differences are
due to the use of different mid-year population estimates, differences between the instructions
given in the CS 2007 questionnaire and those of the other surveys, and the sample sizes of the
surveys. There are also significant differences between Gauteng and the Eastern Cape in
employment status.
