The 2001 BAIS I was Botswana's first national population-based household sexual behavioral survey. It was a baseline survey conducted to obtain more information on topics related to HIV/AIDS, and no HIV testing was undertaken. The second national population-level sexual behavioral survey, the 2004 BAIS II, was designed to identify and measure the factors (behavior, knowledge, attitudes and cultural influences) associated with the HIV epidemic in Botswana. The 2004 BAIS II focused on issues related to the prevention and impact of HIV/AIDS among the population aged 10-64, and also estimated HIV prevalence among the population aged 18 months and above. It documented significant improvement in many of the standard HIV/AIDS indicators, including an increase in knowledge about the ways HIV is transmitted, greater use of condoms, and increased use of Voluntary Counseling and Testing (VCT). The survey also showed HIV prevalence rates of 17.6 percent among those aged 18 months and over and 25.0 percent among those aged 15-49 years.
The primary objective of the 2008 BAIS III was to update current information on the behavioral patterns of the population aged 10-64 years and on HIV prevalence and incidence rates among those aged 18 months and above at national, district and sub-district levels. This information will be used for continuous strategic prevention, national HIV program planning, and future HIV and AIDS research.
Specifically, the survey was intended to provide:
i. current national HIV prevalence and incidence estimates among the population aged 18 months and above;
ii. indicative trends in sexual and preventive behavior among the population aged 10-64 years;
iii. a comparison of the HIV rate, behavior, knowledge, attitudes, and cultural factors associated with the epidemic against estimates derived from previous surveys; and
iv. information on demographic, socio-economic, housing and household members' conditions that are associated with, or are determinants and consequences of, the pandemic.
A related objective is to produce survey results in a timely manner and to ensure that the data are disseminated to a wide audience of potential users in government and non-governmental organizations within and outside Botswana, as part of a broader effort to strengthen strategies aimed at combating the disease.
Kind of Data
Sample survey data [ssd]
Version 02 of the AIDS Impact Survey III 2008 updates the following metadata fields: description of scope, citation requirements, disclaimer, and datasets provided.
The AIDS Impact Survey III 2008 covered the following topics:
- Parental survival and fostering
- Type of economic activity, occupation and industry
- Household characteristics such as type of housing unit, material used for construction of housing unit, water supply, source of energy, etc.
- Health, care and support for household members
- Eligibility criteria for being an individual questionnaire respondent
- Background characteristics (age, education, occupation, religion, etc.)
- Alcohol consumption and drug use
- Sexual history and behavior
- Male circumcision and sexually transmitted diseases
- Knowledge about AIDS and level of exposure to interventions
- Attitudes towards people living with HIV/AIDS, gender issues and counseling
- Childbearing and antenatal care
- Availability of social and medical services
Producers and sponsors
Central Statistics Office (CSO)
Ministry of Finance and Development Planning
National AIDS Coordinating Agency (NACA)
The 2008 Botswana AIDS Impact Survey III (BAIS III) was designed to provide a comparison (trend) of the HIV prevalence rate, behavior, knowledge, attitudes, and other factors associated with the epidemic against estimates derived from the previous survey, i.e. the 2004 BAIS II.
For BAIS-II the sampling frame was based on the 2001 Population and Housing Census. It comprised the list of all Enumeration Areas (EAs) together with their number of households. In the 2001 Census the EAs were formed to be of manageable size (in terms of dwellings/households), so the EAs served as the primary sampling units (PSUs).
Stratification was undertaken such that all districts and major urban centers became their own strata. To increase precision, consideration was also given to grouping EAs according to ecological zones in rural districts and according to income categories in cities and towns. Geographical stratification along ecological zones and income categories was expected to improve the accuracy of the survey data, because the variables are relatively homogeneous within each stratum.
A stratified two-stage probability sample design was used for the selection of the sample.
The first stage was the selection of EAs as Primary Sampling Units (PSUs) with probability proportional to size (PPS), where the measure of size (MOS) was the number of households in the EA as defined by the 2001 Population and Housing Census. In all, 459 EAs were selected.
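The first-stage selection can be illustrated with a minimal Python sketch. It assumes a systematic PPS scheme (a random start and a fixed interval on the cumulated measures of size), which is one common way to draw a PPS sample; the text above does not state which PPS variant was used. The function names and the household counts are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, n, seed=None):
    """Pick n unit indices with probability proportional to size.

    Assumption: systematic PPS with a random start; the survey
    documentation only says "PPS", not which variant was applied.
    """
    rng = random.Random(seed)
    total = sum(sizes)
    interval = total / n                 # sampling interval on the size scale
    start = rng.random() * interval      # random start in [0, interval)
    cumulated = list(accumulate(sizes))  # running total of the measures of size
    picks, i = [], 0
    for k in range(n):
        point = start + k * interval
        # Advance to the unit whose cumulated size range contains the point.
        while i < len(cumulated) - 1 and cumulated[i] <= point:
            i += 1
        picks.append(i)
    return picks


def inclusion_probability(sizes, n):
    """First-stage selection probability of each unit: n * size / total."""
    total = sum(sizes)
    return [n * s / total for s in sizes]


# Hypothetical household counts for six EAs in one stratum.
ea_households = [120, 80, 200, 50, 150, 100]
selected = pps_systematic(ea_households, 3, seed=1)
probabilities = inclusion_probability(ea_households, 3)
```

Larger EAs cover a wider stretch of the cumulated size scale, so they are more likely to contain one of the selection points.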
At the second stage of sampling, households were systematically selected from a fresh list of occupied households prepared at the beginning of the survey's fieldwork (i.e. a listing of households in the selected EAs). Overall, 8,275 households were drawn systematically.
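A minimal Python sketch of systematic selection from a household listing follows. It assumes a fractional sampling interval with a random start; the listing size and the number of households taken per EA are hypothetical, since those details are not given here.

```python
import random


def systematic_sample(listing, n, seed=None):
    """Draw n entries from an ordered listing at a fixed interval."""
    if not 0 < n <= len(listing):
        raise ValueError("n must be between 1 and the listing size")
    rng = random.Random(seed)
    interval = len(listing) / n      # fractional interval is allowed
    start = rng.random() * interval  # random start in [0, interval)
    last = len(listing) - 1
    # min() guards against a floating-point result equal to len(listing).
    return [listing[min(int(start + k * interval), last)] for k in range(n)]


# Hypothetical EA with 100 listed households, 18 of which are selected.
households = list(range(1, 101))
chosen = systematic_sample(households, 18, seed=7)
```

Under this scheme every listed household in the EA has the same chance of selection, n divided by the listing size.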
Note: See detailed sampling information in BAIS-III final report.
Once the data set was cleaned, sampling weights were applied to the data. In a multistage design, the sample selected at each stage represents (or is assumed to represent) the respective population. The fundamental assumption is that the units selected at each stage are similar to those not selected with respect to the characteristics of interest. In the treatment of unit non-response, respondents are likewise assumed to be similar to non-respondents, although this assumption should not always be taken for granted.
Sampling weights are equal to the inverse of the probability of selection. The design weights were therefore calculated from the first-stage probabilities of selecting the EAs together with the probabilities of selecting the households. Non-response adjustments were also taken into consideration when calculating the final sampling weights.
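As a rough illustration of how such weights combine, the Python sketch below multiplies the two selection probabilities and inverts the product. The probability formulas and the non-response adjustment (selected households divided by responding households within the EA) are common textbook choices assumed here for illustration; the exact adjustment used for BAIS III is described in the final report. All names and figures are hypothetical.

```python
def design_weight(n_eas, ea_size, stratum_size, hh_selected, hh_listed):
    """Inverse of the overall selection probability of a household.

    Assumed first-stage probability:  n_eas * ea_size / stratum_size (PPS).
    Assumed second-stage probability: hh_selected / hh_listed (systematic).
    """
    p_ea = n_eas * ea_size / stratum_size
    p_household = hh_selected / hh_listed
    return 1.0 / (p_ea * p_household)


def final_weight(base_weight, hh_selected, hh_responding):
    """Inflate the design weight so respondents also stand in for non-respondents."""
    return base_weight * hh_selected / hh_responding


# Hypothetical EA: 10 EAs drawn from a stratum of 5,000 households,
# this EA had 100 households at the census, 100 were listed at fieldwork,
# 20 were selected and 16 responded.
base = design_weight(10, 100, 5000, 20, 100)
adjusted = final_weight(base, 20, 16)
```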
Dates of Data Collection
Data Collection Mode
Data Collection Notes
Fifty supervisors were trained over a period of two weeks, from 19th to 30th May 2008. A further two hundred and ninety (290) prospective enumerators were trained, together with the supervisors, over a period of four weeks (2nd to 30th June 2008). Of these 290 trainees, 250 were recruited as enumerators, some of whom were designated for coding and editing duties.
There was a total of 300 field staff, comprising 50 supervisors and 250 enumerators. Of the 50 supervisors, 18 were permanent Central Statistics Office staff and 32 were on temporary contracts. A total of 50 teams were engaged in the fieldwork, each comprising 2 drivers, 4 enumerators and a supervisor. Depending on the workload and type of terrain, some teams had five enumerators. At least 9 EAs were assigned to each team during the course of the survey. Six officers responsible for quality control visited and supported the teams in the field. Data collection ran from 15th July to 22nd October 2008.
The questionnaires are the primary recording documents of the survey. Professionals and other members, including some data users, were invited to take part in developing the questionnaires. The questionnaires were finalized on the basis of the experience gained from a pretest of the draft versions.
The 2008 BAIS III has three major components, namely the Household Questionnaire, the Individual Questionnaire and the Blood Collection Form.
The Household Questionnaire was used to list all members of the selected households and their demographic characteristics such as age, sex, orphanhood (0-17 years) and economic activity.
The Individual Questionnaire was designed to capture information regarding demographic characteristics, care and support, marriage and cohabiting partnerships, alcohol consumption and drug use, sexual history and behavior, male circumcision and sexually transmitted diseases, knowledge about HIV/AIDS and level of interventions, attitudes towards people with HIV/AIDS, childbearing and antenatal care as well as availability of social and medical services in response to the pandemic.
The third component concerns the scale of the pandemic. It was designed to collect blood samples from household members aged 18 months and over for testing, estimation of HIV prevalence and derivation of incidence measures.
Before data entry was carried out, the questionnaires were edited to check that all the relevant questions had been answered, and were coded according to the codes designed for the study. Editing and coding started in July 2007 and finished in February 2008. Data entry was carried out under the supervision of one programmer/supervisor. Consistency checks were performed on the data set according to the computer edit specifications designed by the subject matter specialists.
Estimates of Sampling Error
The estimates from a sample survey are affected by two types of errors: (1) non-sampling errors, and (2) sampling errors. Non-sampling errors are the results of mistakes made in implementing data collection and data processing, such as failure to locate and interview the correct household, misunderstanding of the questions on the part of either the interviewer or the respondent, and data entry errors. Although numerous efforts were made during the implementation of the 2008 BAIS III to minimize these types of errors, non-sampling errors are impossible to avoid and difficult to evaluate statistically.
Sampling errors, on the other hand, can be evaluated statistically. The sample of respondents selected in the 2008 BAIS III is only one of many samples that could have been selected from the same population, using the same sample design and expected size. Each of these samples would yield results that differ somewhat from the results of the actual sample selected. Sampling errors are a measure of the variability between all possible samples. Although the degree of variability is not known exactly, it can be estimated from the survey results.
A sampling error is usually measured in terms of the standard error for a particular statistic (mean, percentage, etc.), which is the square root of the variance. The standard error can be used to calculate confidence intervals within which the true value for the population can reasonably be assumed to fall. For example, for any given statistic calculated from a sample survey, the value of that statistic will fall within a range of plus or minus two times the standard error of that statistic in 95 percent of all possible samples of identical size and design.
The standard error can also be used to compute the design effect (DEFT) for each estimate, which is defined as the ratio between the standard error using the given sample design and the standard error that would result if a simple random sample had been used. A DEFT value of 1 indicates that the sample design is as efficient as a simple random sample; a value greater than 1 indicates an increase in the sampling error due to the use of a more complex and less statistically efficient design.
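The two quantities described above can be sketched in Python as follows. The simple-random-sample standard error uses the usual formula for a proportion, sqrt(p(1 - p)/n), without a finite population correction; that choice, the function names, and the figures in the example are assumptions made for illustration and are not survey results.

```python
from math import sqrt


def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Estimate plus or minus `multiplier` standard errors (about 95 percent)."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


def srs_standard_error(p, n):
    """Standard error of a proportion under simple random sampling (assumed formula)."""
    return sqrt(p * (1.0 - p) / n)


def deft(design_standard_error, p, n):
    """Ratio of the design-based standard error to the simple-random-sample one."""
    return design_standard_error / srs_standard_error(p, n)


# Hypothetical indicator: proportion 0.20 from 10,000 respondents with a
# design-based standard error of 0.006 (illustrative values only).
low, high = confidence_interval(0.20, 0.006)
ratio = deft(0.006, 0.20, 10_000)
```

A ratio above 1, as in this made-up example, would mean the complex design produced a larger standard error than a simple random sample of the same size.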
If the sample of respondents had been selected as a simple random sample, it would have been possible to use straightforward formulae for calculating standard errors. However, the 2008 BAIS III sample is the result of a stratified two-stage design, which is considered a complex design, so special methods and software are required to take the complexity of the design into account.
WesVar 4.3 statistical software (supported by Westat) was used to obtain standard errors, confidence intervals and design effects for selected indicators. It is a tool for analysing data from complex survey designs, including multi-stage, stratified and unequal-probability samples. The jackknife replication method, one of the replication options within this software, was applied. Estimating variances with the jackknife method requires forming replicates from the full sample by eliminating one sample cluster (enumeration area) from a domain or stratum at a time. A pseudo-estimate is then formed from the retained EAs, which are re-weighted to compensate for the eliminated unit. Thus, for a particular stratum containing k clusters, k replicated estimates are formed by eliminating one cluster at a time and increasing the weight of the remaining (k - 1) clusters by a factor of k/(k - 1). This process is repeated for each cluster.
Note: Detailed sampling error calculations are presented in the 2008 BAIS-III final report.
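The delete-one-cluster procedure can be sketched in plain Python. This is not WesVar code: it is a minimal illustration that assumes a weighted mean as the statistic and the standard stratified jackknife variance formula, in which each stratum contributes (k - 1)/k times the sum of squared deviations of its replicate estimates from the full-sample estimate. The record layout (keys `stratum`, `cluster`, `w`, `y`) is hypothetical.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean(rows):
    """Weighted mean of y using sampling weights w."""
    total_weight = sum(r["w"] for r in rows)
    return sum(r["w"] * r["y"] for r in rows) / total_weight


def jackknife_se(rows):
    """Return (estimate, standard error) by deleting one cluster at a time."""
    full = weighted_mean(rows)
    clusters = defaultdict(set)
    for r in rows:
        clusters[r["stratum"]].add(r["cluster"])

    variance = 0.0
    for stratum, ids in clusters.items():
        k = len(ids)
        if k < 2:
            raise ValueError("each stratum needs at least two clusters")
        factor = k / (k - 1)  # re-weighting of the retained clusters
        for dropped in ids:
            replicate = []
            for r in rows:
                if r["stratum"] != stratum:
                    replicate.append(r)  # other strata are unchanged
                elif r["cluster"] != dropped:
                    replicate.append({**r, "w": r["w"] * factor})
            deviation = weighted_mean(replicate) - full
            variance += (k - 1) / k * deviation ** 2
    return full, sqrt(variance)


# Hypothetical records: two strata with two clusters each.
records = [
    {"stratum": "A", "cluster": 1, "w": 10.0, "y": 1},
    {"stratum": "A", "cluster": 2, "w": 12.0, "y": 0},
    {"stratum": "B", "cluster": 3, "w": 8.0, "y": 0},
    {"stratum": "B", "cluster": 4, "w": 9.0, "y": 1},
]
estimate, standard_error = jackknife_se(records)
```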
Use of the dataset must be acknowledged using a citation that includes:
- the identification of the Primary Investigator
- the title of the survey (including country, acronym and year of implementation)
- the survey reference number
- the source and date of download
Central Statistics Office. AIDS Impact Survey III (AIS) 2008. Ref. BWA_2008_AIS-III_v01_M. Dataset downloaded from [URL] on [date].
Disclaimer and copyrights
The user of the data acknowledges that the original collector of the data, the authorized distributor of the data, and the relevant funding agency bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
DDI Document ID
Development Economics Data Group
The World Bank
Documentation of the DDI
Date of Metadata Production
DDI Document version
Version 02 (December 2017)
Updates were made in the following metadata fields:
- Description of scope
- Citation requirements
- Datasets provided