Data Editing
Recruitment and Training of Editors and Coders
About 15 clerical officers previously engaged in various units of the Office and 10 newly recruited statistical officers were assigned to the editing and coding of the census forms, and a request for the services of 50 additional clerical officers was made to the Ministry for Civil Service Affairs and Administrative Reform. Between March 2000 and May 2001, small groups of clerical officers from the Ministry joined the team. Staff turnover was high, as many officers left for better jobs, so the number of editors and coders in the team peaked at 50 around May 2001.
Editors and coders were trained by the statistician in charge of the exercise, in small groups as they joined the team. Training was essentially an ongoing process, and supervisory staff had to ensure that instructions were understood and followed. To achieve uniformity and consistency, problems not covered in the manuals or during training sessions were discussed with the senior statistician. The resulting instructions were then passed on to the team of editors and coders in short briefing sessions held as needed.
The main duties of the officers were the editing and coding of the census forms. However, they were also involved in various administrative tasks, such as preparing appointment letters for field staff, addressed Population Census forms, census materials for field staff and lists of field staff for payment, and receiving completed census forms and other documents from the field. They also corrected invalid records at the validation stage of the Housing and Population data files.
Editing and Coding of Housing Census Questionnaires
Editing and coding of the Housing Census questionnaires started in the second week of March 2000 and was completed in the first week of May 2000. Around 15,000 booklets of 25 Housing Census forms each were handled. Since not all 25 forms in every booklet were used, it is estimated that about 310,000 Housing Census forms were edited and coded. A team of 30 editors and coders and three supervisors was involved in the exercise. On average, an officer edited and coded about 300 Housing Census forms per day.
Editors and coders first verified that the geographical identifiers on the covers of the booklets making up each EA batch were identical. They then checked the consistency of block numbers, of building numbers within blocks and of housing unit numbers within buildings. Finally, editors checked the consistency of the information collected and edited and coded it according to the instructions given.
Since the Housing Census form was largely pre-coded, only the locality codes had to be inserted. The coding of Section VI (Commercial, industrial establishments, hotels and boarding houses) and Section VII (Fruit-trees on premises) was treated as a separate exercise and was carried out around June and July 2000, after the editing and coding of Sections I to V had been completed. This was because the capture and processing of the information needed to produce address labels had to be completed early enough for addressed census forms to be prepared in time for the Population Census enumeration.
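The Housing Census figures can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the text (booklet count, forms per booklet, forms used, team size and daily rate) and assumes that all 30 officers worked at the average rate on every working day, which ignores supervision, training and absences.

```python
def housing_throughput(booklets, forms_per_booklet, forms_used, team_size, forms_per_officer_day):
    """Return booklet capacity, share of printed forms used and implied working days."""
    capacity = booklets * forms_per_booklet
    share_used = forms_used / capacity
    # Assumes every officer works at the average rate on every working day.
    working_days = forms_used / (team_size * forms_per_officer_day)
    return capacity, share_used, working_days

capacity, share_used, days = housing_throughput(15_000, 25, 310_000, 30, 300)
print(f"capacity={capacity:,} forms, used={share_used:.1%}, about {days:.1f} working days")
```

About 34 working days is close to seven five-day weeks, broadly in line with the roughly two-month period reported for the exercise.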
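The batch checks described above can be sketched as follows. This is an illustration, not the procedure actually used: the `HousingForm` record and the rule that block, building and housing unit numbers run 1, 2, ..., n without gaps are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class HousingForm:
    block: int      # block number within the EA
    building: int   # building number within the block
    unit: int       # housing unit number within the building

def is_sequential(numbers):
    """True if the distinct numbers are exactly 1, 2, ..., n (assumed numbering rule)."""
    distinct = set(numbers)
    return distinct == set(range(1, len(distinct) + 1))

def check_ea_batch(cover_ids, forms):
    """Return problem descriptions for one EA batch; an empty list means the batch is clean.

    cover_ids: geographical identifiers read from each booklet cover in the batch.
    forms: HousingForm entries from all booklets in the batch.
    """
    problems = []
    # All booklet covers in a batch must carry identical geographical identifiers.
    if len(set(cover_ids)) > 1:
        problems.append(f"mixed geographical identifiers on covers: {sorted(set(cover_ids))}")
    seen = set()
    buildings_by_block = defaultdict(set)
    units_by_building = defaultdict(set)
    for f in forms:
        if f in seen:
            problems.append(f"duplicate housing unit {f}")
        seen.add(f)
        buildings_by_block[f.block].add(f.building)
        units_by_building[(f.block, f.building)].add(f.unit)
    if not is_sequential(buildings_by_block):
        problems.append(f"block numbers not sequential: {sorted(buildings_by_block)}")
    for block, buildings in sorted(buildings_by_block.items()):
        if not is_sequential(buildings):
            problems.append(f"block {block}: building numbers not sequential {sorted(buildings)}")
    for (block, building), units in sorted(units_by_building.items()):
        if not is_sequential(units):
            problems.append(f"block {block}, building {building}: unit numbers not sequential {sorted(units)}")
    return problems
```

For example, a batch containing buildings 1 and 3 in block 1 but no building 2 would be flagged for a clerk to review against the booklets.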
Editing and Coding of Population Census Questionnaires
Editing and coding of the Population Census forms started in August 2000 and ended in June 2001. About 300,000 Population Census forms containing around 1,200,000 entries were handled. The team of coders, composed of about 30 officers in August 2000, grew to a maximum of about 50 around May 2001. A request for 70 officers had been made for this exercise, and the schedule, which foresaw completion around April 2001, was based on that number. Because fewer officers were actually available, the team had to rely on extensive after-office work to complete the exercise within a reasonable time frame. On average, an officer edited and coded about 35 forms per day.
The editing and coding of the Population Census forms consisted of three stages: overall verification of the EA batch; consistency checking and editing of the information collected; and coding.
Verification of an EA batch consisted of checking that all census forms in the batch had the same EA codes and inserting the appropriate geographical codes on the unaddressed forms used to enumerate newly formed households. Officers then checked and edited the individual forms for consistency according to the instructions given. The checks performed included verifying that each household had exactly one “head” and that entries on the form were numbered sequentially, and checking age for consistency with date of birth, marital status and educational attainment.
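The effect of team size on the Population Census schedule can be illustrated with the average rate quoted in the text. The sketch below assumes a team of constant size working full-time at 35 forms per officer per day; the real team grew gradually and also carried administrative duties, so these figures are only a rough guide.

```python
def working_days_needed(total_forms, team_size, forms_per_officer_day):
    """Working days for a team of constant size at a constant average rate."""
    return total_forms / (team_size * forms_per_officer_day)

TOTAL_FORMS = 300_000
RATE = 35  # forms edited and coded per officer per day (average from the text)

officer_days = TOTAL_FORMS / RATE
print(f"officer-days needed: {officer_days:,.0f}")
for team in (30, 50, 70):  # initial team, peak team, team requested
    print(f"team of {team}: about {working_days_needed(TOTAL_FORMS, team, RATE):.0f} working days")
```

Roughly 8,600 officer-days are needed in total, so the size of the team available largely determines how long the exercise takes.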
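The household-level checks listed above can be sketched as a function that returns a list of errors for one form. The field names, the reference date passed in by the caller and the two thresholds (`MIN_MARRIAGE_AGE`, `SCHOOL_ENTRY_AGE`) are hypothetical; the census's actual edit rules are not given in the text.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical thresholds for illustration only.
MIN_MARRIAGE_AGE = 15   # youngest age accepted for an ever-married status
SCHOOL_ENTRY_AGE = 5    # years of schooling may not exceed age minus this value

@dataclass
class Person:
    line_no: int           # entry number on the form
    relationship: str      # relationship to head, e.g. "head", "spouse", "child"
    age: int               # age in completed years as reported
    birth_date: date
    marital_status: str    # e.g. "never_married", "married", "widowed"
    education_years: int   # hypothetical measure of educational attainment

def completed_age(birth: date, reference_date: date) -> int:
    """Age in completed years on the reference date."""
    before_birthday = (reference_date.month, reference_date.day) < (birth.month, birth.day)
    return reference_date.year - birth.year - before_birthday

def check_household(persons, reference_date):
    """Return (line_no or None, message) for each failed check; an empty list means the form is clean."""
    errors = []
    heads = sum(1 for p in persons if p.relationship == "head")
    if heads != 1:
        errors.append((None, f"expected exactly one head, found {heads}"))
    line_numbers = [p.line_no for p in persons]
    if line_numbers != list(range(1, len(persons) + 1)):
        errors.append((None, f"entries not numbered sequentially: {line_numbers}"))
    for p in persons:
        if p.age != completed_age(p.birth_date, reference_date):
            errors.append((p.line_no, "age inconsistent with date of birth"))
        if p.marital_status != "never_married" and p.age < MIN_MARRIAGE_AGE:
            errors.append((p.line_no, "age inconsistent with marital status"))
        if p.education_years > max(0, p.age - SCHOOL_ENTRY_AGE):
            errors.append((p.line_no, "age inconsistent with educational attainment"))
    return errors
```

The function only flags problems; as in the manual procedure, deciding which item to correct is left to the editor working from the form.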
The coding of the Population Census forms was more complex and time-consuming than that of the Housing Census forms where only locality codes were inserted. Apart from the geographical codes that were printed on the address labels, all information on the population census forms had to be coded.
Various code lists were used. Because editors and coders had difficulty understanding the different codes, a system was devised in which each form was handled by two officers. The first officer performed the overall verification of the EA batch and edited and coded the part of the form preceding economic activity. The second officer edited and coded the part on economic activity. Officers for this second group were selected according to their ability to understand the codes involved. The system reduced the number of coding errors and increased the number of forms handled daily by the team.
Software and Equipment
The software used for the processing of the census data was IMPS 4.1 (Integrated Microcomputer Processing System) of the International Programs Center of the US Bureau of the Census, which is specifically designed for census and survey data processing.
The software, which operates in a Windows environment, has separate modules for data entry (CENTRY), data edit and imputation (CONCOR), publication tabulation (CENTS), quick tabulation (QUICKTAB), table retrieval (TRS), variance calculation (CENVAR) and data entry control (CENTRACK). The modules that were used in the processing of the census data were CENTRY, CONCOR, QUICKTAB and CENTS.
The following equipment was available for data capture and processing:
(i) 30 PCs for data capture,
(ii) 3 PCs with Zip disk drives for validation and tabulation,
(iii) 2 line printers for printing of address labels and
(iv) 2 laser printers for printing of publication tables.
Processing Operations
The Housing Census data and the Population Census data were processed along the same lines, although the operations involved were somewhat more complex for the Population Census. The production of address labels, on the other hand, concerned only the Housing Census. The processing of the census data involved the following main operations:
(i) Writing programmes for
(a) data entry applications with range checks,
(b) validation of data files,
(c) checking consistency of EA data file names with Island, Geographical District, Municipal Ward/Village Council Area and EA codes in the data files,
(d) creation of address label files,
(e) tabulation of census data;
(ii) Data capture;
(iii) Data validation and updating of data files;
(iv) Checking for inconsistent EA data file names;
(v) Checking for duplicate and missing EAs;
(vi) Production of address labels from Housing Census files and printing of labels;
(vii) Consolidation of EA data files;
(viii) Tabulation.
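Step (v), checking for duplicate and missing EAs, amounts to comparing the EA codes found in the captured data files with the master list of EAs. A minimal sketch, assuming both are available as lists of EA code strings (the codes shown are hypothetical):

```python
from collections import Counter

def duplicate_and_missing_eas(master_eas, captured_eas):
    """Compare EA codes found in captured data files with the master list of EAs.

    Returns (duplicates, missing, unexpected) as sorted lists.
    """
    counts = Counter(captured_eas)
    captured = set(counts)
    master = set(master_eas)
    duplicates = sorted(ea for ea, n in counts.items() if n > 1)
    missing = sorted(master - captured)
    unexpected = sorted(captured - master)  # usually a sign of a wrongly keyed code
    return duplicates, missing, unexpected

dups, missing, unexpected = duplicate_and_missing_eas(
    ["0101", "0102", "0103"], ["0101", "0102", "0102", "0104"]
)
# dups == ["0102"], missing == ["0103"], unexpected == ["0104"]
```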
Checking EA data file names against the geographical codes recorded in the data files served two purposes: to confirm that the correct geographical codes had been entered for each EA batch, and to guard against the creation of EA data files with similar names. An error at this stage would have seriously compromised the control of captured EAs, since missing EA files and EA data files with similar names would then have had to be resolved.
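The file name check can be sketched as below. The naming convention (the island, district, ward/village council area and EA codes joined without separators) is a hypothetical assumption; the text does not state how EA data files were named.

```python
from pathlib import Path

def expected_stem(island, district, ward, ea):
    """Hypothetical naming convention: the four geographical codes joined without separators."""
    return f"{island}{district}{ward}{ea}"

def check_file_name(path, records):
    """Check one EA data file.

    records: (island, district, ward, ea) code tuples read from the file.
    Returns a list of problems; an empty list means the name matches the codes.
    """
    problems = []
    codes = sorted({tuple(r) for r in records})
    # A file should hold exactly one EA, so exactly one set of codes.
    if len(codes) != 1:
        problems.append(f"{path}: {len(codes)} distinct sets of geographical codes in file")
    stem = Path(path).stem
    for c in codes:
        if stem != expected_stem(*c):
            problems.append(f"{path}: file name does not match codes {c}")
    return problems
```

A file named `1031204.dat` whose records all carry the codes `("1", "03", "12", "04")` passes; a single mistyped EA code in either the name or the records is reported.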
Data Capture
Data entry of the Housing and Population Census forms, with 100% verification (rekeying), was done by operators of the Central Information Systems Division of the Ministry of Information Technology and Telecommunications. The data entry staff comprised at most 28 operators and 5 supervisors, normally working one shift a day, five days a week. Other applications ran concurrently with the census, and whenever the need arose, data entry officers were moved to applications with higher priority. The Division had to rely on extensive after-office work to complete the data entry exercise within reasonable time limits.
The data entry exercise for the Housing Census started in March 2000 and was completed in May 2000, by which time data for about 310,000 forms had been keyed in. The number of keystrokes involved in data entry and verification was estimated at around 70 million, and the average number of keystrokes per operator per hour at around 7,500, with a range of 5,000 to 9,000.
Data for Section VI (Commercial, industrial establishments, hotels and boarding houses) and Section VII (Fruit-trees on premises) of the Housing Census questionnaires were captured in July and August 2000.
Data capture for the Population Census started in September 2000 and was completed in July 2001. Data for around 300,000 Population Census forms containing about 1,200,000 records were keyed in during that period. The total number of keystrokes was estimated at around 185 million, and the average speed of an operator at around 12,000 keystrokes per hour, with a range of 8,000 to 16,000.
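The keystroke figures for the two censuses can be compared with a little arithmetic. The sketch below assumes that both keystroke totals include rekeying (the text states this explicitly only for the Housing Census) and spreads the work over the maximum of 28 operators. Since operators were at times moved to other applications, the actual hours per operator were probably higher.

```python
def keying_summary(forms, keystrokes, keystrokes_per_hour, operators=28):
    """Keystrokes per form, total operator-hours, and hours per operator at full staffing."""
    operator_hours = keystrokes / keystrokes_per_hour
    return {
        "keystrokes_per_form": keystrokes / forms,
        "operator_hours": operator_hours,
        "hours_per_operator": operator_hours / operators,
    }

housing = keying_summary(310_000, 70_000_000, 7_500)
population = keying_summary(300_000, 185_000_000, 12_000)
for name, s in (("Housing", housing), ("Population", population)):
    print(f"{name}: {s['keystrokes_per_form']:.0f} keystrokes/form, "
          f"{s['operator_hours']:,.0f} operator-hours, "
          f"{s['hours_per_operator']:.0f} h per operator")
```

A Population Census form required nearly three times as many keystrokes as a Housing Census form, although operators keyed Population Census data faster on average.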
Production of Address Labels
As mentioned earlier, the names and addresses of heads of households obtained at the Housing Census were used as the frame for the Population Census. These names and addresses, together with the geographical identification codes needed to identify households, were extracted from the data files and printed on labels, which were then affixed to the Population Census forms.
A programme was run regularly to assess the completeness of the work before the required information was extracted. The programme flagged all validated EA files on a master list of EAs by Supervisor. As soon as all data files for a given Supervisor had been validated, they were consolidated, the required information was retrieved and the address label files were created. Address labels were printed Supervisor by Supervisor, in EA order.
A total of about 310,000 address labels were printed in May 2000, using two line printers with a speed of 200 lines per minute.
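The completeness check and label ordering can be sketched as two small functions. The data structures and the household keys (`supervisor`, `ea`, `building`, `unit`, `head`, `address`) are hypothetical, as is the label layout; the sketch only shows the logic of releasing a Supervisor's EAs once all of them are validated and printing labels in EA order.

```python
from collections import defaultdict

def supervisors_ready(master_list, validated_eas):
    """Return supervisors all of whose EAs are flagged as validated.

    master_list: iterable of (supervisor, ea_code) pairs (the master list of EAs by Supervisor).
    validated_eas: set of EA codes whose data files have passed validation.
    """
    eas_by_supervisor = defaultdict(set)
    for supervisor, ea in master_list:
        eas_by_supervisor[supervisor].add(ea)
    return sorted(s for s, eas in eas_by_supervisor.items() if eas <= validated_eas)

def address_labels(households):
    """Format one label per household, ordered by supervisor, then EA, building and unit."""
    ordered = sorted(households, key=lambda h: (h["supervisor"], h["ea"], h["building"], h["unit"]))
    return [
        f"EA {h['ea']} / Bldg {h['building']} / HU {h['unit']}\n{h['head']}\n{h['address']}"
        for h in ordered
    ]
```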
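The printing load can be estimated from the printer speed. The number of lines per label is not given in the text, so the sketch treats it as a hypothetical parameter and uses the rated speed, which ignores paper handling and setup time.

```python
def printing_hours(labels, lines_per_label, printers=2, lines_per_minute=200):
    """Hours of printing at the printers' rated speed, ignoring paper handling and setup."""
    minutes = labels * lines_per_label / (printers * lines_per_minute)
    return minutes / 60

for lines in (4, 6, 8):  # hypothetical label heights, including spacing lines
    print(f"{lines} lines/label: about {printing_hours(310_000, lines):.1f} hours")
```

Even on generous assumptions, the pure printing time is on the order of a few working weeks, which is consistent with the labels being completed within May 2000.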
Data Validation
A validation programme that checked field consistencies was run to identify records with errors. Listings of these records were produced, the relevant census forms were retrieved, corrections were made accordingly and the data files were updated. Validation of the Housing Census files was done by a team of five officers in April and May 2000, in parallel with the editing and coding exercise. Because of a staff shortage, validation of the Population Census data files was carried out only after editing and coding had been completed, i.e. in July and August 2001.
Once validation was complete, the EA data files were concatenated into a single country-level file. The concatenated data file was about 56 MB for the Housing Census and about 134 MB for the Population Census.
A preliminary set of publication tables was produced from the country data file. Analysis of these tables showed that no additional editing was needed for the Housing Census data. For the Population Census data, the tabulated counts of households and population were consistent with the Housing Census figures, but some tables contained inconsistencies. A list of relevant edit specifications was drawn up and incorporated in a CONCOR programme to remove these inconsistencies from the tabulated data. It should be noted that the edits in the CONCOR correction programmes were not exhaustive, so the tables still contain slight inaccuracies that would have been too costly and time-consuming to identify and correct.
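The validation cycle (run the checks, list failing records, correct them from the forms, run the checks again) can be sketched as below. The rules, record identifiers and field names are hypothetical; the actual validation programme was written with the IMPS tools.

```python
def validate(records, rules):
    """Return the error listing: (record_id, rule_name) for every rule a record fails."""
    return [
        (record_id, rule_name)
        for record_id, record in records.items()
        for rule_name, passes in rules.items()
        if not passes(record)
    ]

def apply_corrections(records, corrections):
    """Update records in place with field values transcribed from the retrieved forms."""
    for record_id, fields in corrections.items():
        records[record_id].update(fields)

# Hypothetical rules and field names, for illustration only.
RULES = {
    "rooms_in_range": lambda r: 1 <= r["rooms"] <= 20,
    "tenure_code_valid": lambda r: r["tenure"] in {1, 2, 3},
}

records = {
    "EA0412-001": {"rooms": 4, "tenure": 1},
    "EA0412-002": {"rooms": 0, "tenure": 2},
}
listing = validate(records, RULES)        # [("EA0412-002", "rooms_in_range")]
apply_corrections(records, {"EA0412-002": {"rooms": 3}})
remaining = validate(records, RULES)      # [] once all listed records are corrected
```

The cycle is repeated until the listing is empty for every data file.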
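Concatenation to the country level can be sketched as a byte-level append of the EA files. This assumes plain record files with no per-file header and a hypothetical `*.dat` naming pattern; the sketch does not reproduce how IMPS itself stores its data files.

```python
import shutil
from pathlib import Path

def concatenate_ea_files(ea_dir, country_file, pattern="*.dat"):
    """Append every EA data file in ea_dir, in file-name order, to one country-level file.

    country_file should lie outside ea_dir so it is never picked up as an input.
    Returns the number of EA files concatenated.
    """
    paths = sorted(Path(ea_dir).glob(pattern))
    with open(country_file, "wb") as out:
        for path in paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return len(paths)
```

Sorting by file name keeps the country file in geographical order, provided the file names encode the geographical codes.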
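The idea of a correction programme built from edit specifications can be illustrated in Python. This is not CONCOR syntax: the `Edit` structure, field names and thresholds are hypothetical, and each edit applies a deterministic fix rather than any particular imputation method used in the census.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Edit:
    name: str
    fails: Callable[[dict], bool]   # True when the record violates the edit
    fix: Callable[[dict], None]     # deterministic correction applied in place

def apply_edits(records, edits):
    """Apply each edit to each record in order and count the corrections made."""
    counts = {e.name: 0 for e in edits}
    for record in records:
        for e in edits:
            if e.fails(record):
                e.fix(record)
                counts[e.name] += 1
    return counts

# Hypothetical edit specifications; thresholds and field names are illustrative only.
EDITS = [
    Edit("young_ever_married",
         lambda r: r["age"] < 15 and r["marital"] != "never_married",
         lambda r: r.update(marital="never_married")),
    Edit("education_exceeds_age",
         lambda r: r["educ_years"] > max(0, r["age"] - 5),
         lambda r: r.update(educ_years=max(0, r["age"] - 5))),
]

people = [
    {"age": 12, "marital": "married", "educ_years": 9},
    {"age": 30, "marital": "married", "educ_years": 12},
]
counts = apply_edits(people, EDITS)
```

Here age is trusted and the other item is changed; choosing which item to correct is a design decision each edit specification has to make explicitly. Because the edit list is finite, any inconsistency it does not cover passes through to the tables unchanged.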