Tuesday, October 19, 2010

Census saga continues

The link below offers Statistics Canada’s latest position on the census controversy, which contradicts numerous erroneous assertions made by the Industry Minister, Mr. Tony Clement.

Statistics Canada has now formally stated what statisticians have been warning about all along: the agency does not know the extent of the adverse impact on the quality of the data to be collected under the voluntary National Household Survey. Furthermore, Statistics Canada does not vouch for the quality of the voluntary survey's data and states explicitly that it will be inferior to what the mandatory long-form census would have delivered. In addition, Statistics Canada warns that the change will affect the comparability of data over time, allowing only limited comparisons, if any, between the 2011 data and earlier census data. The agency further expresses, albeit in muted terms, its displeasure with how the revised methodology was implemented: it was “introduced relatively rapidly with limited testing.”

Minister Clement has suggested on many occasions that the voluntary survey will yield a richer database because it is being sent to one in three households instead of the one-in-five (20%) sample used before. Statistics Canada, however, estimates that despite the larger sample, only about 16% of the Canadian population is likely to complete the voluntary survey (because of a response rate estimated at 50%), compared with the 19% who were expected to complete the mandatory census long form.

Statistics Canada also released simulation results for the bias expected from the shift from the mandatory census to the voluntary survey. For most socio-demographic variables, the estimated bias (before Statistics Canada applies strategies to mitigate non-response bias) was much larger than the margin of error at the 95% confidence level. Under the voluntary survey, for example, the Chinese population in Toronto is likely to be overestimated by 18% and the Black population underestimated by 13%. The simulations further showed that in Winnipeg the voluntary survey may underestimate the registered Indian population by 13%.

These biases become even more dramatic for smaller populations. In Bathurst, New Brunswick, for instance, Statistics Canada's simulations suggest that the voluntary survey will underestimate the visible minority population by 43%. Such biases are unlikely to be uniform across Canada or even within an urban centre (the Chinese population, for example, is overestimated by 18% in Toronto but underestimated by 4% in Winnipeg), which makes the voluntary survey data of little use for scientific inquiry.

Statistics Canada has planned mitigation strategies to offset non-response bias, but it describes them only in general terms and cautions against reading too much into them, since their effectiveness in offsetting “non-response bias and other quality limiting effects is largely unknown.”

The census saga is just one example of poor governance in Ottawa. While Parliament is the supreme institution of Canadian democracy, it is unwise for any government to undermine the judiciary or the civil service. By refusing to comply with rulings of the Supreme Court (e.g., in the Khadr case) or by ignoring the advice of the government’s own experts (e.g., the chief statistician), Prime Minister Harper’s government resembles a dictatorship more than a democracy.

The frustration of the 23,000 scientists and technicians working for the federal government has already spilled onto the Internet: the scientists have launched the website www.publicscience.ca to express their dismay at the Tory government’s contempt for (and refusal to listen to) expert advice.

I wonder whether (social) scientists, engineers, academics, and civil society in Canada can stay on the sidelines and watch the Conservatives dismantle our democracy one institution at a time. Or is it time to mount an opposition to the foolhardy governance in Ottawa? I am afraid we can no longer rely on the divided political opposition, which will end up splitting the opposition vote and thus allow the Tories to wreak havoc for another four years.

The time to act is now.

---

http://www.statcan.gc.ca/survey-enquete/household-menages/nhs-enm-eng.htm

National Household Survey: data quality

The National Household Survey (NHS) contains all of the questions that Statistics Canada contemplated for inclusion in a 2011 Census long-form. The NHS is therefore identical in content to what would have been collected in a 2011 Census long-form.

Data quality

Response rates

In its initial planning, Statistics Canada assumed a response rate for a mandatory 2011 Census long-form of 94%, identical to that achieved for the 2006 Census.

Statistics Canada has assumed a response rate of 50% for the voluntary National Household Survey.

Sample size

In its initial planning, Statistics Canada assumed a sample of one in five households for a mandatory 2011 census long-form, identical to that for the 2006 Census.

Statistics Canada, in consultation with the Minister, has fixed the sampling rate for the National Household Survey at one in three households, a 65% increase relative to the initial plan.

Sampling error

Like the previous long-form census, the objective of the National Household Survey is to produce accurate estimates from the questions asked for a wide variety of geographic areas ranging from very large (such as provinces and census metropolitan areas) to very small (such as neighbourhoods and municipalities) and for various population subgroups such as aboriginal peoples and immigrants. Such population subgroups will also range in size, in particular when cross-classified by geographic areas. These groupings are generally referred to as “domains of interest”.

For any given domain of interest, assuming random sampling, the sampling error is driven by three factors: the size of the population, the number of survey respondents and the variability of the variables being measured. Amongst these, only the number of survey respondents can be influenced by the survey process. People are familiar with the notion of sampling error through statements in opinion polls about results being “accurate within plus or minus x%, 19 times out of 20”. The larger the number of respondents, the smaller the value of x will be and therefore the more accurate the survey estimates will be.
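The three factors named above can be made concrete with the textbook formula for the 95% margin of error of an estimated proportion. The sketch below is only an illustration, assuming simple random sampling and a normal approximation; Statistics Canada's variance estimation for a complex household survey is more involved, and the function name `margin_of_error` is hypothetical. The variability of the variable enters through p(1 - p), the number of respondents through n, and the population size through the optional finite population correction.

```python
import math
from typing import Optional


def margin_of_error(p: float, n: int, population: Optional[int] = None, z: float = 1.96) -> float:
    """Approximate 95% margin of error for an estimated proportion p.

    p          -- estimated proportion (drives the variability term p * (1 - p))
    n          -- number of survey respondents
    population -- population size N; if given, apply the finite population correction
    z          -- critical value; 1.96 corresponds to "19 times out of 20"
    """
    if n <= 0:
        raise ValueError("n must be positive")
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        # Finite population correction: surveying a large share of N shrinks the error.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


# More respondents, smaller "x" in "plus or minus x%".
for n in (1_000, 4_000, 16_000):
    print(n, f"+/- {100 * margin_of_error(0.5, n):.2f}%")
```

Quadrupling the number of respondents halves the margin of error, which is why the number of respondents, rather than the sampling rate alone, is what the survey process can influence.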

With a sampling rate of 1 in 3 and an anticipated response rate of 50%, approximately 16% of the Canadian population will complete the National Household Survey, compared with 19% under a mandatory census long form (i.e., sampling rate of 1 in 5 and a 94% response rate). Given its anticipated lower overall number of respondents, the National Household Survey will, in general over all domains of interest, have a sampling error that is slightly higher (worse) than would have been achieved from a mandatory long-form census. Furthermore, it is expected that the quality of estimates across domains will present more variability, with some areas potentially achieving lower sampling errors than would have been achieved through a mandatory long-form census (because of the higher sampling rate of 33%), while other areas may see substantially higher sampling errors (because of unusually low response rates on the voluntary survey). Smaller domains of interest are particularly at risk of such fluctuations.
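The respondent shares quoted above follow from multiplying the sampling rate by the expected response rate. A minimal check, using only the rates given in the note (`respondent_share` is an illustrative helper name):

```python
def respondent_share(sampling_rate: float, response_rate: float) -> float:
    """Expected share of the population that ends up as respondents."""
    return sampling_rate * response_rate


nhs = respondent_share(1 / 3, 0.50)     # voluntary NHS: 1 in 3 sampled, 50% respond
census = respondent_share(1 / 5, 0.94)  # mandatory long form: 1 in 5 sampled, 94% respond

print(f"NHS: {nhs:.1%}, long form: {census:.1%}")
print(f"NHS respondents relative to long form: {nhs / census:.2f}")
```

The simple product gives about 16.7% and 18.8%, close to the approximate figures of 16% and 19% in the note; either way, the larger NHS sample is expected to yield roughly 11% fewer respondents than the mandatory long form.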

The annex to this note provides actual confidence intervals (i.e. plus or minus x%) from the 2006 Census for various variables for the Toronto Census Metropolitan Area, the Winnipeg Census Metropolitan Area and the Bathurst Census Agglomeration (New Brunswick). Provided for comparison are simulated estimates and their corresponding confidence intervals for the National Household Survey based on a 50% response rate.

Non-sampling error

Besides sampling, there are many factors that can introduce errors in survey results. Examples include respondent mistakes, interviewer effects, data collection methodology as well as data capture and processing errors. The move to a voluntary National Household Survey will have little impact on some of these factors (such as data capture and processing errors) but the effect on the other error sources is unknown and impossible to quantify. 

However, it is believed that the most significant source of non-sampling error for the National Household Survey will be non-response bias. All surveys are subject to non-response bias, even a census with a 98% response rate. The risk of non-response bias increases quickly as the response rate declines, because non-respondents generally tend to have characteristics that differ from those of respondents, so the results are not representative of the true population. Given that the National Household Survey is anticipated to achieve a response rate of only 50%, there is a substantial risk of non-response bias.
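The mechanism can be illustrated with a small calculation. The sketch assumes a hypothetical population made up of two groups that respond at different rates; the group shares and response rates are invented for illustration and are not taken from Statistics Canada. Even with an overall response rate close to 50%, the group that responds less often ends up markedly underrepresented among respondents.

```python
def respondent_composition(shares, response_rates):
    """Share of each group among respondents, plus the overall response rate."""
    responding = [s * r for s, r in zip(shares, response_rates)]
    total = sum(responding)
    return [x / total for x in responding], total


# Hypothetical population: a small group (10%) that responds less often (30%)
# than everyone else (90% of the population, responding at 52%).
shares = [0.10, 0.90]
rates = [0.30, 0.52]

composition, overall_rate = respondent_composition(shares, rates)
relative_bias = composition[0] / shares[0] - 1

print(f"Overall response rate: {overall_rate:.1%}")
print(f"Small group among respondents: {composition[0]:.1%} (true share {shares[0]:.0%})")
print(f"Relative bias if respondents are scaled up uniformly: {relative_bias:.1%}")
```

Scaling every respondent by the same factor (here about 1 / 0.498) restores the overall total but not the composition, so the small group is underestimated by roughly 40% in this made-up example.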

Statistics Canada is very much aware of these risks and their associated adverse effects on data quality. The Agency is currently adapting its data collection and other procedures to mitigate these risks as much as possible. In particular, we will be using data on response patterns from the 2006 Census and information generated during data collection in 2011 to guide our field follow-up effort to minimize non-response bias. As well, where possible, 2011 Census data will be used as auxiliary information in National Household Survey estimation procedures to partially offset some of the remaining biases. However, there is certain to be some significant residual bias that will be impossible to measure and correct.
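One standard way to use auxiliary counts such as census totals is weighting-class (post-stratification) adjustment: respondents in each cell are weighted up to the cell's known population total. The note does not detail how the 2011 Census data will enter the estimation procedures, so this is only a generic sketch of that technique; the cell names and all figures are hypothetical. It removes bias caused by differing response rates between cells, but not bias caused by respondents differing from non-respondents within a cell, which is the kind of residual bias described above.

```python
# Hypothetical weighting cells (e.g. age-sex groups) whose population totals are
# assumed known from an auxiliary source such as census counts.
# Each entry: (population total, number of respondents,
#              share of respondents with the characteristic of interest)
cells = {
    "cell_a": (60_000, 36_000, 0.20),  # 60% of this cell responds
    "cell_b": (40_000, 12_000, 0.50),  # only 30% of this cell responds
}


def naive_estimate(cells):
    """Pool all respondents, ignoring that response rates differ between cells."""
    with_trait = sum(resp * share for _, resp, share in cells.values())
    respondents = sum(resp for _, resp, _ in cells.values())
    return with_trait / respondents


def post_stratified_estimate(cells):
    """Give each respondent the weight N_cell / respondents_cell, so every cell
    is scaled up to its known population total."""
    total_population = sum(pop for pop, _, _ in cells.values())
    weighted = 0.0
    for pop, resp, share in cells.values():
        weight = pop / resp
        weighted += weight * resp * share
    return weighted / total_population


print(f"Naive estimate:           {naive_estimate(cells):.3f}")
print(f"Post-stratified estimate: {post_stratified_estimate(cells):.3f}")
```

If respondents within each cell resemble that cell's non-respondents, the post-stratified figure (0.320) is unbiased while the naive one (0.275) is not; if they do not, some bias remains, and the respondent data alone cannot reveal how much.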

To give some appreciation of the potential for non-response bias prior to the implementation of any mitigating strategies, a simulation has been conducted for three geographic areas using the 2006 Census. For selected variables in the Toronto Census Metropolitan Area, the Winnipeg Census Metropolitan Area and the Bathurst Census Agglomeration (New Brunswick), the simulation compares actual 2006 Census long-form data (see note 1) with estimates based on the assumption that 16% rather than 19% of the population responded. Using this, and similar, information, Statistics Canada will plan its field operations to minimize, to the extent possible, the potential for non-response bias.

Comparability of data over time

Any significant change in the methods of a survey can affect the comparability of data over time. There is a real risk that this will be the case for the National Household Survey. There will always and inevitably be an element of uncertainty as to whether and to what extent a change in a variable reflects real change or an artefact arising from the change in methodology from the mandatory long-form census to the voluntary National Household Survey.

Change in survey processes, however, is inevitable and has precedents even in the Census of Population. In 1971, for example, two major changes were introduced: self-enumeration in place of interviewer enumeration, and asking some questions of a subsample (then 1/3) of the population rather than the entire population (there had been some sampling in previous censuses, beginning in 1941, on a much more limited scale).

Conclusion

We have never previously conducted a survey on the scale of the voluntary National Household Survey, nor are we aware of any other country that has. The new methodology has been introduced relatively rapidly with limited testing. The effectiveness of our mitigation strategies to offset non-response bias and other quality limiting effects is largely unknown. For these reasons, it is difficult to anticipate the quality level of the final outcome.

The significance of any quality shortcomings depends, to some extent, on the intended use of the data. Given that, and our mitigation strategies, we are confident that the National Household Survey will produce usable and useful data that will meet the needs of many users. It will not, however, provide a level of quality that would have been achieved through a mandatory long-form census.

Annex

The following tables are intended to assist readers in understanding quality issues around the National Household Survey by providing some quantitative indicators developed from 2006 Census data.

The following provides a guide to reading the first line of the Toronto Census Metropolitan Area table. Other lines are read analogously for all three tables.

The variable of interest in this line is the total income in 2005 of the population of the Toronto CMA aged 15 years and over. More specifically, the first line looks at the number of persons in this age group with incomes under $1,000 or with no income. The estimated number of such persons from the 2006 Census was 435,580. Based on the actual data, the 95% confidence interval around this estimate (since the long-form census was a sample survey) was plus or minus 0.4%. Assuming that the actual response rate had been 50%, which is the working assumption for the National Household Survey, the 95% confidence interval around the corresponding simulated NHS estimate would be plus or minus 0.5%.
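As a rough plausibility check (not Statistics Canada's method), if the relative margin of error scales with one over the square root of the number of respondents, the first Toronto line can be approximately reproduced from the respondent counts in the Toronto table heading. This crude approximation ignores the finite population correction, the survey design, and the change in the estimate itself, so it should not be expected to hold closely for other lines or for small domains such as those in the Bathurst table. The helper `scaled_margin` is hypothetical.

```python
import math


def scaled_margin(moe_pct: float, n_old: int, n_new: int) -> float:
    """Rescale a relative margin of error when the respondent count changes.

    Crude approximation: relative standard error proportional to 1 / sqrt(n).
    """
    return moe_pct * math.sqrt(n_old / n_new)


# Toronto CMA, first line: +/- 0.4% with 974,435 long-form respondents,
# versus an estimated 728,340 NHS respondents.
nhs_moe = scaled_margin(0.4, 974_435, 728_340)
print(f"Approximate NHS margin: +/- {nhs_moe:.2f}% (table: +/- 0.5%)")
```

The result, about 0.46%, rounds to the 0.5% reported in the table.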

Continuing with the first line of the Toronto CMA table, the final column reports results from the simulation of non-response bias for this income group in the absence of mitigation strategies. For this income class, this simulation indicates that the size of population would be underestimated by 4.4% relative to the 2006 Census estimate.

In some instances in the tables, the estimated bias is smaller than the error of estimate at the 95% level of confidence for the 2006 Census. In these instances, one cannot conclude with confidence that the bias exists.
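The bias column and the comparison described above can be reproduced directly from the published figures. In this sketch the bias is the percentage difference between the simulated NHS estimate and the 2006 Census estimate, and it is treated as conclusive only when its magnitude exceeds the census margin of error; the helper names are illustrative, while the numbers are taken from the tables.

```python
def estimated_bias_pct(census_estimate: float, nhs_estimate: float) -> float:
    """Percentage difference of the simulated NHS estimate from the 2006 Census estimate."""
    return 100 * (nhs_estimate - census_estimate) / census_estimate


def bias_is_conclusive(bias_pct: float, census_moe_pct: float) -> bool:
    """True when the bias exceeds the census 95% margin of error."""
    return abs(bias_pct) > census_moe_pct


# (label, 2006 Census estimate, census +/- %, simulated NHS estimate)
rows = [
    ("Toronto: under $1,000 or without income", 435_580, 0.4, 416_415),
    ("Toronto: management occupations", 320_600, 0.8, 320_305),
    ("Toronto: Chinese", 486_330, 1.1, 572_040),
    ("Bathurst: construction", 795, 16.9, 805),
]

for label, census, moe, nhs in rows:
    bias = estimated_bias_pct(census, nhs)
    print(f"{label}: bias {bias:+.1f}%, conclusive: {bias_is_conclusive(bias, moe)}")
```

The first and third rows reproduce the -4.4% and 17.6% shown in the Toronto table and exceed the census margin of error; the other two fall within it, so no bias can be concluded for them.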

2006 Census (long-form) compared to 2006 simulated NHS: Toronto CMA

Estimated total population: 5,061,815
Number of census respondents (long-form): 974,435
Estimated number of NHS respondents: 728,340

| Variable | 2006 Census estimate | 2006 Census +/- % | Simulated 2006 NHS estimate | Simulated NHS +/- % | Estimated NHS bias (%) |
|---|---|---|---|---|---|
| Total income in 2005 of population 15 years and over | | | | | |
| Under $1,000 or without income | 435,580 | 0.40% | 416,415 | 0.50% | -4.40% |
| $50,000 and over | 966,405 | 0.40% | 1,015,780 | 0.50% | 5.10% |
| Total population 25 to 64 years by highest certificate, diploma or degree | | | | | |
| High school or less | 982,800 | 0.40% | 945,150 | 0.50% | -3.80% |
| College/Cegep | 534,020 | 0.60% | 529,140 | 0.70% | -0.90% |
| University certificate, diploma or degree: Bachelor and above | 962,175 | 0.40% | 1,002,620 | 0.50% | 4.20% |
| Total labour force 15 years and over | 2,815,845 | 0.20% | 2,821,480 | 0.20% | 0.20% |
| Total labour force 15 years and over by industry | | | | | |
| 23 Construction | 148,895 | 1.20% | 134,960 | 1.50% | -9.40% |
| 91 Public administration | 94,195 | 1.60% | 101,295 | 1.80% | 7.50% |
| Total labour force 15 years and over by occupation | | | | | |
| A Management occupations | 320,600 | 0.80% | 320,305 | 0.90% | -0.10% |
| B Business, finance and administration occupations | 590,605 | 0.60% | 614,430 | 0.60% | 4.00% |
| D Health occupations | 124,080 | 1.30% | 123,300 | 1.50% | -0.60% |
| G Sales and service occupations | 611,410 | 0.50% | 594,170 | 0.70% | -2.80% |
| H Trades, transport and equipment operators and related occupations | 327,850 | 0.80% | 302,840 | 0.90% | -7.60% |
| Total visible minority population | 2,174,065 | 0.40% | 2,131,405 | 0.50% | -2.00% |
| Chinese | 486,330 | 1.10% | 572,040 | 1.20% | 17.60% |
| Black | 352,220 | 1.30% | 305,895 | 1.60% | -13.20% |
| Total population by citizenship | | | | | |
| Citizenship other than Canadian | 642,130 | 0.80% | 606,050 | 0.90% | -5.60% |
| Population by immigrant status | | | | | |
| Immigrants | 2,320,165 | 0.20% | 2,274,450 | 0.30% | -2.00% |
| Total population by Aboriginal and non-Aboriginal identity | | | | | |
| Total Aboriginal identity population | 26,575 | 3.40% | 25,000 | 4.10% | -5.90% |
| Registered Indian status | | | | | |
| Registered Indian | 9,950 | 3.90% | 8,790 | 4.90% | -11.70% |
| Mobility 1 year | | | | | |
| Moved | 612,130 | 0.90% | 575,780 | 1.10% | -5.90% |
| Not moved | 4,459,945 | 0.10% | 4,496,295 | 0.10% | 0.80% |

 

2006 Census (long-form) compared to 2006 simulated NHS: Winnipeg CMA

Estimated total population: 681,815
Number of census respondents (long-form): 132,155
Estimated number of NHS respondents: 96,735

| Variable | 2006 Census estimate | 2006 Census +/- % | Simulated 2006 NHS estimate | Simulated NHS +/- % | Estimated NHS bias (%) |
|---|---|---|---|---|---|
| Total income in 2005 of population 15 years and over | | | | | |
| Under $1,000 or without income | 41,590 | 1.40% | 39,715 | 1.60% | -4.50% |
| $50,000 and over | 104,420 | 1.30% | 107,995 | 1.50% | 3.40% |
| Total population 25 to 64 years by highest certificate, diploma or degree | | | | | |
| High school or less | 152,670 | 1.00% | 149,180 | 1.20% | -2.30% |
| College/Cegep | 73,235 | 1.50% | 73,435 | 1.80% | 0.30% |
| University certificate, diploma or degree: Bachelor and above | 90,535 | 1.40% | 92,840 | 1.60% | 2.50% |
| Total labour force 15 years and over | 385,870 | 0.50% | 385,360 | 0.60% | -0.10% |
| Total labour force 15 years and over by industry | | | | | |
| 23 Construction | 18,780 | 3.50% | 17,070 | 4.30% | -9.10% |
| 91 Public administration | 27,105 | 2.90% | 27,830 | 3.30% | 2.70% |
| Total labour force 15 years and over by occupation | | | | | |
| A Management occupations | 35,480 | 2.40% | 34,810 | 2.80% | -1.90% |
| B Business, finance and administration occupations | 76,155 | 1.60% | 79,225 | 1.80% | 4.00% |
| D Health occupations | 25,885 | 2.80% | 26,475 | 3.30% | 2.30% |
| G Sales and service occupations | 95,180 | 1.40% | 93,505 | 1.60% | -1.80% |
| H Trades, transport and equipment operators and related occupations | 51,715 | 1.90% | 49,105 | 2.40% | -5.00% |
| Total visible minority population | 102,945 | 2.30% | 99,340 | 2.80% | -3.50% |
| Chinese | 12,810 | 7.00% | 12,245 | 8.50% | -4.40% |
| Black | 14,470 | 6.60% | 13,845 | 8.00% | -4.30% |
| Total population by citizenship | | | | | |
| Citizenship other than Canadian | 37,545 | 3.30% | 35,770 | 4.00% | -4.70% |
| Population by immigrant status | | | | | |
| Immigrants | 121,255 | 1.30% | 117,870 | 1.60% | -2.80% |
| Total population by Aboriginal and non-Aboriginal identity | | | | | |
| Total Aboriginal identity population | 68,385 | 2.00% | 63,845 | 2.50% | -6.60% |
| Registered Indian status | | | | | |
| Registered Indian | 26,610 | 2.40% | 23,225 | 3.00% | -12.70% |
| Mobility 1 year | | | | | |
| Moved | 91,060 | 2.20% | 85,395 | 2.80% | -6.20% |
| Not moved | 594,975 | 0.30% | 600,640 | 0.30% | 1.00% |

 

2006 Census (long-form) compared to 2006 simulated NHS: Bathurst

Estimated total population: 30,750
Number of census respondents (long-form): 5,910
Estimated number of NHS respondents: 4,280

| Variable | 2006 Census estimate | 2006 Census +/- % | Simulated 2006 NHS estimate | Simulated NHS +/- % | Estimated NHS bias (%) |
|---|---|---|---|---|---|
| Total income in 2005 of population 15 years and over | | | | | |
| Under $1,000 or without income | 2,105 | 6.00% | 1,995 | 7.40% | -5.20% |
| $50,000 and over | 3,805 | 6.70% | 4,040 | 7.70% | 6.20% |
| Total population 25 to 64 years by highest certificate, diploma or degree | | | | | |
| High school or less | 8,425 | 4.10% | 8,110 | 5.00% | -3.70% |
| College/Cegep | 4,075 | 6.40% | 4,205 | 7.50% | 3.20% |
| University certificate, diploma or degree: Bachelor and above | 2,360 | 8.70% | 2,450 | 10.20% | 3.80% |
| Total labour force 15 years and over | 15,830 | 2.60% | 15,625 | 3.20% | -1.30% |
| Total labour force 15 years and over by industry | | | | | |
| 23 Construction | 795 | 16.90% | 805 | 20.10% | 1.30% |
| 91 Public administration | 1,465 | 12.30% | 1,535 | 14.40% | 4.80% |
| Total labour force 15 years and over by occupation | | | | | |
| A Management occupations | 1,145 | 13.30% | 975 | 17.30% | -14.80% |
| B Business, finance and administration occupations | 2,680 | 8.50% | 2,695 | 10.10% | 0.60% |
| D Health occupations | 1,410 | 11.90% | 1,610 | 13.30% | 14.20% |
| G Sales and service occupations | 4,310 | 6.50% | 4,445 | 7.60% | 3.10% |
| H Trades, transport and equipment operators and related occupations | 2,635 | 8.50% | 2,525 | 10.50% | -4.20% |
| Total visible minority population | 300 | 45.90% | 170 | 73.20% | -43.30% |
| Chinese | 40 | 126.40% | 10 | 302.50% | -75.00% |
| Black | 115 | 74.40% | 60 | 123.40% | -47.80% |
| Total population by citizenship | | | | | |
| Citizenship other than Canadian | 100 | 65.40% | 130 | 68.60% | 30.00% |
| Population by immigrant status | | | | | |
| Immigrants | 475 | 23.20% | 475 | 27.80% | 0.00% |
| Total population by Aboriginal and non-Aboriginal identity | | | | | |
| Total Aboriginal identity population | 440 | 26.20% | 505 | 29.20% | 14.80% |
| Registered Indian status | | | | | |
| Registered Indian | 185 | 28.70% | 160 | 37.00% | -13.50% |
| Mobility 1 year | | | | | |
| Moved | 3,235 | 12.10% | 2,630 | 16.30% | -18.70% |
| Not moved | 27,695 | 1.20% | 28,300 | 1.20% | 2.20% |

 

Note:

  1. For the purposes of the simulation, the 2006 Census estimates have been assumed to be the “true” population values.  It should be noted, however, that the 2006 Census estimates are themselves subject both to sampling error and to response bias as they are based on a sample of 1 in 5 households.