Although SDC does not appear in the top 5 user output priorities, it is an important area of concern for many users. Some specific areas of interest are how SDC will be implemented for 2011, the scope and depth of the effect that this might have on the data, and any resulting considerations for the use and interpretation of the data.
The ONS website contains a summary of
Knowledge of SDC

Question: How do you rate your knowledge of SDC methods that could be applied to Census tables and their effects? (Tick one only)
At first glance this question seems to have produced some surprising results. A high proportion of respondents (23.5 per cent) indicated that they were not previously aware that SDC methodology of any kind was applied to Census data. A further 35 per cent indicated that, although they are aware that SDC is applied, and that it can be applied pre or post tabulation, or both, they do not have an understanding of the different types of SDC methodologies and the different effects each might have. Only 15 per cent say that they have both a good understanding of SDC methods and the ability to take the effects of SDC into account in any analyses they conduct (given sufficient supplementary information). We might expect individual user responses to be influenced by the user sector they are from, so if we look deeper, there may be some user sector bias influencing results and unbalancing the representation of the overall level of user SDC knowledge.
* Third sector - Community Group / Voluntary Sector and Charities
As we might expect, members of the public (no organisation) and users in the third sector are the most likely to say that they were not previously aware that SDC was applied to Census tables (50 per cent and 44 per cent respectively). Commercial, academic, central and regional government users are more confident, not only that they understand SDC methods, but that they can take SDC effects into account in their analyses. Generally, however, there appears to be as much variation in the levels of SDC knowledge within many of the user groups as across them.

SDC methodology

The questionnaire sought to gather user opinions of SDC policy by asking users to rank a number of qualitative SDC characteristics in terms of their potential effects on output. Users were asked to ….

Question: Rank the following SDC features in order of importance, 1 to 7. (If additivity is the most important feature to you, rank it as 1. For the second most important, rank it as 2, and so on.)
The SDC feature that received the highest average user ranking was the need for the data to remain as accurate as possible. The two next most important features were table additivity (that tables add up internally to column and row totals) and consistency (that cell counts are consistent across different tables). When broken down by user type we do not see a great deal of variation in the results. Accuracy is consistently ranked as the most important feature by respondents from across all the different user communities. Consistency and additivity are the second and third most important features for academic, local government and commercial users. An SDC method that is easy to understand is ranked highly by members of the public and users from the third sector.

Any room for compromise?

Question: Should the Census Offices aim to produce counts that are as near as possible to true counts (within the constraints of legal and policy confidentiality requirements) or would you be prepared to accept less accurate data to ensure particular characteristics (e.g. additive tables)?
i) I want most accurate data possible regardless of other considerations
ii) I would be prepared to sacrifice some accuracy to have particular characteristics (detailed below)

Given the strong ranking for accuracy in results from the previous question, it is not surprising that overall only 27 per cent of users who responded are prepared to sacrifice a degree of accuracy. Of those who are willing to make a trade-off, the characteristics that are most preferred are additivity and consistency. When broken down by user sector, the single type of user most likely to accept any trade-off, those in the commercial sector, also agrees with the overall user preference for additivity and consistency.
SDC is a very important area, and we will return to this topic in future blog posts to address other questions, for example relating to SDC and licensing. The questionnaire sought only to capture qualitative responses and rankings, with minimal scope for open form answers. We are always happy to hear more detailed user thoughts about SDC principles, applications, and implications, and encourage further open discussion here or in the forum. As the SDC research and selection process for 2011 is developed and finalised, we will be reporting on the decisions made and outlining the potential effects on output. We will also seek user thoughts and feedback to help us provide the best possible information, so that users understand what SDC means for the data and their use of it, and how the effects can be managed.
Online output questionnaire results VIII: Statistical Disclosure Control (SDC)
Originally posted by BLine 3 Mar 2009.
We try to use the detailed OA to OA origin-destination tables to estimate 'self containment' patterns for migration and travel to work, to help in identifying functional housing market areas. At this scale pretty well every cell is '3', which potentially causes a lot of distortion. Common sense would suggest that longer distance moves/commutes are likely to be smaller numbers.
But if changing the disclosure rules looks like an unwinnable battle, what would help is some work to show how much distortion it actually does cause. I've experimented by changing all '3' cells to 1 or 2, to see the effect, and it is noticeable. But a check against the real numbers at different distances could perhaps give an average variation that could be applied.
The small cell adjustment (SCA) method was used to protect the confidentiality of individual respondents in 2001 Census data and does have significant effects on the utility of some tables.
The origin destination tables, especially at low geographies, will be greatly affected.
The method itself should be unbiased, so that a true total of 50 people travelling from OA1 to OA2 would have an expected published value of 50. However, the published total, which is a sum of protected counts, can vary significantly above or below the unperturbed total.
The variation can be seen in OA to OA counts broken down by different variables. For example, a breakdown by age and sex could give a total of 25, while a breakdown by economic activity could give a total of 33. Averaging the totals from all the different published breakdowns gives an estimate that is still unbiased and that varies far less around the unprotected count.
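As a rough illustration of the averaging idea, the small Python simulation below compares the total from a single protected breakdown with the average over several breakdowns. It is only a sketch under stated assumptions, not a description of the actual 2001 method: it assumes cell counts of 1 and 2 are changed to 0 or 3 with probabilities that preserve the expected value, that each breakdown is perturbed independently, and that people fall into categories at random. All function names and parameters are hypothetical.

```python
import random
import statistics


def small_cell_adjust(count, rng):
    """ASSUMED rule: a 1 or 2 becomes 3 with probability count/3, else 0.

    This keeps the expected value equal to the true count. Other counts
    are left unchanged.
    """
    if count in (1, 2):
        return 3 if rng.random() < count / 3 else 0
    return count


def random_breakdown(total, n_cells, rng):
    """Spread `total` people at random over `n_cells` categories (hypothetical)."""
    cells = [0] * n_cells
    for _ in range(total):
        cells[rng.randrange(n_cells)] += 1
    return cells


def published_total(cells, rng):
    """Sum of the protected cell counts, as a user would add them up."""
    return sum(small_cell_adjust(c, rng) for c in cells)


def simulate(true_total=50, n_breakdowns=5, n_cells=30, trials=2000, seed=0):
    """Compare one breakdown's total with the average over several breakdowns."""
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(trials):
        totals = [
            published_total(random_breakdown(true_total, n_cells, rng), rng)
            for _ in range(n_breakdowns)
        ]
        single.append(totals[0])
        averaged.append(statistics.mean(totals))
    return {
        "mean_single": statistics.mean(single),
        "sd_single": statistics.stdev(single),
        "mean_averaged": statistics.mean(averaged),
        "sd_averaged": statistics.stdev(averaged),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions both estimates centre on the true total, but the averaged one has a visibly smaller spread. If the perturbations of different tables were not independent, the gain from averaging would be smaller.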
For the 2011 Census, we have already proposed that origin destination tables at low geographies are protected not by SDC methods, but by licensing and access arrangements. The precise details of this approach have not yet been finalised.
I would also add that, whilst the OA to OA data has issues when used with very small counts, it does have uses. It is especially valuable when aggregated to higher levels of geography, where the disturbance is minimised.
This is worth remembering for 2011, whatever arrangements are put in place; otherwise a useful dataset may lose its value if access is unduly restricted.
Taking all the different published breakdowns and averaging the totals sounds like a lot of work to get a more reliable result. But the proposed licensing and access arrangements should improve things considerably. I'm hoping to further develop a bit of MapBasic software to batch generate self containment convex hulls, derived from 'Range Manager' http://solutionsgroup.tripod.com/rangeman.htm but adapted, and maybe renamed as 'Migrations Mapper' or something.