PART 7 Using census reports

7.1 Comparability

Comparability between successive censuses is clearly desirable. Nevertheless, comparability presents dilemmas. In the first place, no census is perfect and to correct the mistakes of one census at the next produces an improvement but also produces a loss of comparability. Where the improvement made is great then the loss of comparability is a price worth paying: where the improvement is small then the balance between gain and loss becomes much finer.

Then there are of course fundamental causes of lack of comparability over the whole period of census taking due to changes in the form in which information has been obtained and from whom; changes in definition; changes in classification, particularly in the case of occupations; and the introduction of new topics.

Another major reason for lack of comparability is the continuous process of movement, which can take one of two forms. Either there has been a change in the boundary of an area, so that while the people themselves may not have moved or changed their characteristics the land area is smaller or greater than before; or there has been a growth of the urbanised area. As a town grows it spreads beyond the boundary of the administrative area, and while the figures from successive censuses in this instance still relate to the same area on the ground they also refer to a falling proportion of the population of the town; such figures will not be comparable and will become steadily divorced from reality.

Finally, changes in the response to questions and hence comparability between figures produced can be influenced by the wording or arrangement of questions which are otherwise identical. Changes can be made between censuses in the design of the census schedule which aim at making it easier to complete. To the extent that such an aim is fulfilled, census statistics will become less comparable. However, such changes can be difficult to predict. The census schedule is completed by millions of individuals whose reactions vary. In the design of questions, certain features must be picked out for emphasis. To the extent that the selection of emphasised points changes between censuses, the response to the question is likely to be altered.

A large number of users of census data wish to make comparisons between the results of successive censuses. For such comparisons to be completely valid it would be necessary for the questions, explanatory notes, processing conventions, definitions used and the classifications used in the tables to be comparable. For very simple topics, such as sex and age, these conditions are usually met, but for the more complicated topics great care should be exercised by the user of census statistics to ensure that figures that appear comparable are in fact so.

Bearing all the above considerations in mind there is often advantage, when compiling historical series of statistics, in looking through later Reports before extracting data from the earlier ones.

7.2 Sampling errors 1961 - 66

1961 Conventional sampling errors

Those census figures which have been derived from the ten per cent data are subject to sampling error which means that they will usually differ to some extent from the unknown true value that would have been obtained from a full count. This variability is inherent in sample based figures and should be distinguished from the element of error due to bias which is discussed in the next section. The great majority of figures published from the census fall into two groups, totals and proportions, though small numbers of figures of other types such as ratios of rates and proportions also appear.

Totals

For any sample total which is a small fraction (less than one-quarter) of the whole sample population, the statistical quantity known as the 'standard error' of this sample based figure may be approximately estimated by the square root of the sample total concerned. To allow for the fact that sampling was on a ten per cent basis and was without replacement, this square root should be multiplied by the factor √0.9. Given this estimate of the standard error the probability is approximately

0.68 that the true value is within one standard error of the estimated value

0.95 that the true value is within two standard errors of the estimated value.

This method of estimating sampling errors assumes that the sample in the 1961 Census was equivalent to a random sample of persons. This further implies an assumption that the tendency for the sampling error to be increased because of the clustering which follows from the use of a sample of households rather than one of persons was offset by the stratification involved in the use of a systematic sample which ensured that 1 in 10 households was selected evenly throughout the country.
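The rule for totals can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the original Report: the function names and the sample total of 400 are hypothetical, and the calculation rests on the same simple random sample assumption described above.

```python
import math


def total_standard_error(sample_total: int, sampling_fraction: float = 0.1) -> float:
    """Approximate standard error of a small sample total.

    Takes the square root of the sample total and multiplies it by
    sqrt(1 - sampling_fraction), which is sqrt(0.9) for a ten per cent sample.
    Only intended for totals below about one-quarter of the whole sample.
    """
    if sample_total < 0:
        raise ValueError("sample_total must be non-negative")
    return math.sqrt(sample_total) * math.sqrt(1.0 - sampling_fraction)


def approximate_interval(sample_total: int, n_standard_errors: int = 2) -> tuple[float, float]:
    """Range within n standard errors of the sample total.

    One standard error corresponds to a probability of about 0.68,
    two standard errors to about 0.95.
    """
    se = total_standard_error(sample_total)
    return (sample_total - n_standard_errors * se,
            sample_total + n_standard_errors * se)


# Hypothetical sample total of 400 (in sample units, not yet raised to the population).
se = total_standard_error(400)         # sqrt(400) * sqrt(0.9), roughly 19
low, high = approximate_interval(400)  # roughly 362 to 438
```

The interval is expressed in sample units; raising it to a population estimate would mean multiplying both ends by ten, which is an assumption of this sketch rather than a step stated in the text.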

Proportions

Although the great majority of figures published from the 1961 Census are numbers whose sampling error can therefore be conventionally estimated as described above, a number of tables contain proportions or rates where the sampling error cannot be estimated simply from the published figure. It was therefore decided to adopt certain conventions in the published tables which would warn the users of these tables when the sampling error to be attached to a proportion or rate reached particular levels. The levels adopted were as follows:

Figures in italic type or accompanied by the symbol *

Standard error between 10 per cent and 25 per cent of the proportion or rate. This means that there is a chance of about 1 in 20 that the true proportion could differ from the published figure by something between one-fifth and one-half of that figure.

Figures in brackets in italic type or accompanied by the symbol ‡

This implied that the standard error was 25 per cent or more of the proportion or rate ie that there was a chance of about 1 in 20 that the true proportion could differ from the published figure by as much as half or more of that figure. This symbol also accompanied all zero entries in tables of proportions. Here it cannot be interpreted in the numerical terms defined above but is an indication that because of sampling error the true proportion may be a non-zero quantity.

The formula used to estimate the standard error of a rate or proportion was as follows. Each proportion can be written as a ratio (x/n). The denominator (n) may be, for example, the total number of people who stated their duration of residence while the numerator (x) may be, for example, those who stated their duration of residence as less than 15 years. In practice such ratios have often been multiplied by some constant K, for example, 100 or 1,000.

Thus writing the ratio as

p = x/n

the printed proportion is Kp.

If q = 1 - p

and S(Kp) = estimate of the sampling error of Kp

then the formula used was

S(Kp) = K √(pq/n)

and C = √(q/(np))

gives the sampling error as a fraction of the printed proportion. Thus, the figure was printed in italic type or accompanied by the symbol * where C was greater than or equal to 0.10 and less than 0.25, while the figure was printed in italic type in brackets or accompanied by the symbol ‡ whenever C was greater than or equal to 0.25.
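As an illustration of how these thresholds operate, the following Python sketch computes the sampling error S(Kp), the relative error C and the warning marker for one proportion. The function name, its return layout and the example counts are hypothetical; it assumes, as the formula does, that the sample is equivalent to a simple random sample.

```python
import math


def proportion_warning(x: int, n: int, k: float = 100.0):
    """Return (Kp, S(Kp), C, marker) for the proportion x/n printed as K * x/n.

    marker is "*" when 0.10 <= C < 0.25, the double dagger when C >= 0.25
    or when the entry is zero, and an empty string otherwise.
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    p = x / n
    q = 1.0 - p
    s_kp = k * math.sqrt(p * q / n)  # S(Kp) = K * sqrt(pq/n)
    if x == 0:
        # Zero entries always carry the double dagger; C is undefined here.
        return (0.0, s_kp, None, "‡")
    c = math.sqrt(q / (n * p))       # C = sqrt(q/(np)) = S(Kp) / Kp
    if c >= 0.25:
        marker = "‡"
    elif c >= 0.10:
        marker = "*"
    else:
        marker = ""
    return (k * p, s_kp, c, marker)


# Hypothetical counts: 50 out of 1,000 gives C of about 0.14, so the marker is "*".
kp, s_kp, c, marker = proportion_warning(50, 1000)
```

With the same hypothetical denominator, 10 out of 1,000 gives C of about 0.31 and so the marker ‡, while 400 out of 1,000 gives C of about 0.04 and no marker.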

If two independent proportions Kp₁ and Kp₂ (K having the same value) are to be compared, then the sampling error of their difference may be taken as

S(Kp₁ - Kp₂) = K √((p₁q₁/n₁) + (p₂q₂/n₂))
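A minimal Python sketch of this comparison follows. The function name and the counts are hypothetical, and the two proportions are assumed to be independent, as the formula requires.

```python
import math


def difference_sampling_error(x1: int, n1: int, x2: int, n2: int, k: float = 100.0) -> float:
    """Sampling error of K*p1 - K*p2 for two independent proportions.

    Implements K * sqrt(p1*q1/n1 + p2*q2/n2) with p = x/n and q = 1 - p.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("denominators must be positive")
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1.0 - p1, 1.0 - p2
    return k * math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


# Hypothetical example: 300 of 1,000 against 250 of 1,000, printed as percentages.
difference = 100.0 * (300 / 1000 - 250 / 1000)             # 5 percentage points
error = difference_sampling_error(300, 1000, 250, 1000)    # roughly 2 percentage points
```

In this hypothetical case the difference is about two and a half times its sampling error, so it lies a little outside the two standard error range described for totals.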

As mentioned in the paragraphs dealing with the sampling errors of totals, these formulae are based on the assumption that the ten per cent sample used in the 1961 Census could be taken as equivalent to a simple random sample of persons. The general point should also be remembered that the insertion of warning symbols in the tables takes no account of the fact that the estimate of standard error used to decide whether a significance indicator should be used is itself subject to sampling error.

As part of the statistical assessment programme of the 1961 Census, it was decided to check the validity of the assumption that the sample used in the census could legitimately be taken as being equivalent to a simple random sample. It was decided, therefore, to calculate the true sampling errors using a sub-sample, taking into account the aspects of clustering and stratification mentioned above. The true sampling error was compared with the estimate of the sampling error for a number of characteristics. Full details of these exercises are given in the 1961 General Report.

1966

The sample selected for this census was stratified by local authority area. The sample unit was either a cluster (ie a dwelling) containing at least one person, or one person (one of every ten) in a 'large' non-private establishment.

No sampling errors consistent with this sample design are available and therefore only a rough indication of sampling error can be calculated based on the assumption that the sample was a simple random sample of people, households or dwellings by using the same formula as for 1961.

7.3 Bias (1961 only)

Cause of bias

It has not been possible to obtain any objective evidence as to the basic cause or causes of the bias which has been found. It was confirmed that the bias arose at the enumeration stage itself and had not been introduced during the processing of census data. This was established by comparing, for a few areas, the sample as originally selected with the sample in the form in which it was finally processed. The comparison revealed no significant difference between the two. It seems clear, therefore, that this must have been an enumeration problem, though one further point should be mentioned. The ten per cent sample census schedule contained a section for absent members of the household whereas the 90 per cent schedule did not. It is therefore possible that there was a tendency for households with a member away on census night to record themselves as N person households on the E.10 form in the sample, with one person added in the absent member section, but (wrongly) as a household with (N + 1) persons present on an E.90 form. This phenomenon could help to account for the relative shortage of one-person households in the sample but does not contribute towards the similar shortage of households with large numbers of persons. There is no evidence that this phenomenon occurred at all widely, but even if it had it could not have provided anything like the full explanation.

It has already been noted that there is considerable evidence that some enumerators departed from the strict sampling scheme in a number of ways. It is easy to imagine ways in which such departures could be statistically biased, particularly if the enumerator either feared resistance to the larger and more complicated sample schedule or thought that certain types of person, such as the elderly or immigrants, might have difficulty in completing the more complex form. Some enumerators may have departed from the correct sample in an attempt to make the sample 'representative' of their enumeration district. This would lead them to omit from the sample an unusual household which should have been included. Such features would contribute to the shortfall of certain groups born outside England and Wales (noted above) and also the shortage of old people. The great extent of the bias in rural districts could follow from the greater variation between households within an enumeration district in some rural areas compared with the greater uniformity found in many (though by no means all) urban enumeration districts. In some rural enumeration districts the enumerator would have a good idea of the type of person in a building before he reached it. This would be less true in many urban areas where the type of housing might be uniform throughout a complete enumeration district. The greater variability in rural areas would give more incentive and opportunity to 'switch' a sample schedule than would be present in many urban areas.

Action taken on bias

The discovery of the bias in the ten per cent sample raised the difficult problem of deciding how, or if at all, the ten per cent sample tabulations should be amended or adjusted to attempt to correct the bias. The decision was taken not to alter the actual numbers obtained from the sample in the published tables. Even if the full information necessary to make such adjustments had been available it would have been a vast undertaking which, even with a large computer, would have produced an unacceptable delay in the production of the statistics. In fact, the information available on the true nature and size of the bias was very restricted and was quite insufficient to undertake a full correction programme. Instead of modifying the actual numbers produced it was decided to produce certain correcting factors which users could apply to the tables derived from the ten per cent sample. It was not a practical proposition to calculate such factors for every entry in the tables or even for all tables. Instead correction factors were obtained for certain of the more important marginal totals. To take one example: a bias factor was worked out for each of the occupation orders and each of the industry orders. The intention was that these bias factors should be used by multiplying the sample figure by the appropriate bias factor to give a new figure partially corrected for bias. Thus, a bias factor of 0.98000 denoted that the published census estimate was too high by two per cent.
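The intended use of a bias factor amounts to a single multiplication, sketched below in Python. The function name and the sample figure of 2,500 are hypothetical; only the factor 0.98000 is taken from the text.

```python
def apply_bias_factor(sample_figure: float, bias_factor: float) -> float:
    """Return the sample figure partially corrected for bias.

    The published sample figure is multiplied by the bias factor
    for the marginal total it belongs to.
    """
    if bias_factor <= 0:
        raise ValueError("bias_factor must be positive")
    return sample_figure * bias_factor


# Hypothetical published sample figure of 2,500 with a bias factor of 0.98000:
# the figure is treated as two per cent too high and is reduced to about 2,450.
corrected = apply_bias_factor(2500, 0.98)
```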

It is very important that the bias factors computed should be correctly interpreted. They can remove only that element of bias associated with the classification of households by numbers of persons, by numbers of rooms, by sharing status, by area and any effect due to the country of birth of the person concerned, though this last factor was only taken account of in a very summary fashion. They cannot remove other elements of bias which may exist and which may be fundamentally associated with other characteristics, such as occupation, socio-economic group, etc. It should also be remembered, as pointed out earlier, that these factors have been calculated in relation to the population enumerated in private households. No specific account was taken therefore in working these factors of biases found in that part of the population which was enumerated outside private households.

Calculation of bias factors

A brief description of the method of calculating the bias factors is as follows:

For each of three birthplace groups (those born in the British Isles; those born in the West Indies and Caribbean territories, Pakistan, Ceylon, or Cyprus; and those born elsewhere) the one hundred per cent count of private households and the ten per cent sample count of private households were each (separately) distributed over a four-way table whose axes were:

six categories of the number of persons in the household,
(1, 2, 3, 4, 5, 6 or more),
six categories of the number of rooms occupied,
(1, 2, 3-4, 5, 6-7, 8 or more),
three categories of sharing status, namely,
- non-sharing,
- sharing with exclusive use of stove and sink,
- sharing without exclusive use of stove and sink,
various geographical areas, namely,
1. England and Wales.
2. All Standard Regions separately.
3. All Conurbations separately.
4. Remainders of Standard Regions after subtracting Conurbations.
5. All Conurbations combined.
6. Urban areas outside Conurbations with populations of 100,000 or more, combined.
7. Urban areas outside Conurbations with populations of 50,000 and less than 100,000, combined.
8. Urban areas outside Conurbations with populations of less than 50,000, combined.
9. Rural Districts outside Conurbations, combined.

Then if we let X_ijkl = 100% count of private households in the

ith persons category (i = one of the six persons groups from 1 to 6 or more)

jth rooms category (j = one of the six rooms groups from 1 to 8 or more)

kth sharing category (k = one of the three sharing categories)

lth area

and let x_ijkl = corresponding 10% count of private households.

A 'raising factor' for each cell of this table was calculated as

R_ijkl = X_ijkl / x_ijkl
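The cell-by-cell calculation of raising factors can be sketched in Python as follows. The function name, the dictionary layout keyed by (i, j, k, l) and the counts are all hypothetical. Returning None for a cell with no sample households is an assumption of this sketch, since the text does not say how such cells were treated.

```python
def raising_factors(full_counts: dict, sample_counts: dict) -> dict:
    """Compute R = X / x for every cell of the four-way table.

    full_counts and sample_counts map a cell key (i, j, k, l) to the
    100% count X and the 10% sample count x of private households.
    Cells with no sample households get None (an assumption of this sketch).
    """
    factors = {}
    for cell, full in full_counts.items():
        sample = sample_counts.get(cell, 0)
        factors[cell] = full / sample if sample else None
    return factors


# Hypothetical counts for two cells in one area.
full = {(1, 1, 1, 1): 1000, (2, 3, 1, 1): 5000}
sample = {(1, 1, 1, 1): 95, (2, 3, 1, 1): 500}
factors = raising_factors(full, sample)
# The first cell has a factor of about 10.5, the second exactly 10.0.
```

An exactly proportionate ten per cent sample would give a raising factor of 10 in every cell, so a factor above 10, as in the first hypothetical cell, indicates a cell that is under-represented in the sample.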

As an example of the calculation of one bias factor, consider the males in Occupation Order I. This group of males can be distributed over the four-way table of private households from the ten per cent sample, according to the households in which they were enumerated. Let y_ijkl be the number of males in this group who were enumerated in the x_ijkl households in any one cell of this table.

Then the bias factor for males in Occupation Order I who live in the lth area is defined as

[Very complex formula which needs to be included as a graphic]

Similar bias factors were calculated in exactly the same way for females in this particular occupation order, for males and females in the remaining occupation orders, and for the following ten per cent characteristics:

Industry Orders
Socio-economic Groups
Length of stay (groups as for Migration Table 1)
Type of move (groups as for Migration Table 16)
Terminal Education Age (as for Education Table 2)
Place of work (within/outside area of usual residence).

7.4 Presentation conventions 1951 - 66

1951

In all tables derived from the one per cent sample the sample figures have not been multiplied by 100. Instead the comma for entries with two or more digits has been inserted one digit from the right. For example a sample figure of 15 is presented as 1,5 to remind users that the population estimate is 1,500.

Care should be taken to read sample figures of one digit correctly.

6 in the sample means an estimate of 600 in the population.

In the Fertility tables special conventions and significance indicators apply and when using these tables reference should be made to page x of the report.

1961 - 66

To emphasise that the figures shown in 1961 Sample reports and all 1966 reports are derived from a ten per cent sample they have not been multiplied by ten. Instead the comma for entries with three or more digits has been inserted two digits from the right. For example a sample figure of 125 is presented as 1,25 to remind users that the population estimate is 1,250.

Care should be taken to read sample figures of one and two digits correctly. For example,

13 in the sample means an estimate of 130 in the population and 6 in the sample means an estimate of 60 in the population.

A blank means that a particular cell in the table is impossible.

The symbol '-' means zero in the sample.

Where rates are calculated the figure 0 means that the actual rate is less than ½ but is not, in fact, zero.

In the 1961 Fertility tables special significance indicators apply and when using these tables reference should be made to page xvi of the report.
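The comma, dash and blank conventions described in this section can be read mechanically. The Python sketch below converts a printed cell to a population estimate; the function and parameter names are hypothetical. It assumes a multiplier of 10 for the 1961 Sample reports and 1966 reports and 100 for the 1951 one per cent tables. The blank and '-' conventions are stated above only for 1961 - 66, and the sketch does not cover rates or the Fertility tables.

```python
from typing import Optional


def population_estimate(printed: str, multiplier: int = 10) -> Optional[int]:
    """Convert a printed sample cell to a population estimate.

    A blank cell is impossible and returns None; '-' means zero in the sample.
    Otherwise the reminder comma is dropped to recover the sample figure,
    which is then multiplied by 10 (ten per cent sample) or 100 (one per cent).
    """
    text = printed.strip()
    if text == "":
        return None
    if text == "-":
        return 0
    sample_figure = int(text.replace(",", ""))
    return sample_figure * multiplier


population_estimate("1,25")       # sample figure 125, estimate 1250
population_estimate("13")         # estimate 130
population_estimate("1,5", 100)   # 1951 convention: sample figure 15, estimate 1500
```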

Office of Population Censuses and Surveys/General Register Office, Guide to Census Reports: Great Britain 1801-1966 (London: HMSO, 1977) Crown Copyright. The Office of National Statistics has granted the Great Britain Historical GIS Project permission to computerise this publication and include it in this web site. All other rights reserved.
