Online output questionnaire results VIII: Statistical Disclosure Control (SDC)


Although SDC does not appear in the top 5 user output priorities, it is an important area of concern for many users. Some specific areas of interest are how SDC will be implemented for 2011, the scope and depth of the effect that this might have on the data, and any resulting considerations for the use and interpretation of the data.

The ONS website contains a summary of

  • the reviews made of 2001 Census SDC policy
  • the completed initial stages of user consultation and research for SDC plans for the 2011 Census
  • a timetable and plans for the ongoing process of research, consultation, reporting and communication, and selection of final SDC policy.

Knowledge of SDC
The online output questionnaire asked several questions relating to SDC. We start by considering what users understand SDC to be. The questionnaire asked ….

Question How do you rate your knowledge of SDC methods that could be applied to Census tables and their effects? (Tick one only)

Question responses: 497
Not previously aware that SDC was applied to Census tables 117 (23.5%)
Aware that SDC is applied to tables, can be applied pre-tabular or post-tabular, but do not understand the effects of different methods 175 (35.2%)
Have some understanding of both pre-tabular and post-tabular SDC methods and their effects 133 (26.7%)
Understand both pre-tabular and post-tabular methods and their effects, and with sufficient information would be able to take these into account in any analyses conducted 72 (14.5%)

At first glance, this question seems to have produced some surprising results. A high proportion of respondents (23.5 per cent) indicated that they were not previously aware that SDC methodology of any kind was applied to Census data. A further 35 per cent indicated that, although they were aware that SDC is applied, and that it can be applied pre- or post-tabulation, or both, they do not understand the different types of SDC methodology or the different effects each might have.

Only 15 per cent say that they have both a good understanding of SDC methods and the ability to take the effects of SDC into account in any analyses they conduct (given sufficient supplementary information).

We might expect individual responses to be influenced by the user sector the respondent is from. If so, the mix of sectors among respondents may skew the overall picture of user SDC knowledge, so it is worth looking at the results sector by sector.

Question responses: 497
Percentage of respondents in each user sector
Column key:
  A: Not previously aware that SDC was applied to Census tables
  B: Aware that SDC is applied to tables, pre-tabular or post-tabular, but do not understand the different methods
  C: Have some understanding of both pre-tabular and post-tabular SDC methods and their effects
  D: Understand both methods and their effects, and would be able to take these into account in analyses conducted

User sector             A      B      C      D
No organisation        50.0   43.7    0.0    6.3
Central Government     21.1   26.3   31.5   21.1
Regional Government    14.3   42.9   21.4   21.4
Local Government       19.7   36.0   30.0   14.3
Academic               23.2   26.8   28.6   21.4
Health                 20.0   40.0   28.9   11.1
Commercial             14.3   36.7   24.5   24.5
Third Sector*          44.2   37.2   16.3    2.3

* Third sector - Community Group / Voluntary Sector and Charities


sdc1.jpg


As we might expect, members of the public (no organisation) and users in the third sector are the most likely to say that they were not previously aware that SDC was applied to Census tables (50 per cent and 44 per cent respectively).

Commercial, academic, central and regional government users are more confident, not only that they understand SDC methods, but that they can take SDC effects into account in their analyses. Generally, however, there appears to be as much variation in the levels of SDC knowledge within many of the user groups as across them.



SDC methodology
SDC methods are applied in order to ensure an acceptable level of data protection to meet either legal or policy requirements, and to foster public confidence by ensuring a degree of personal data confidentiality. However, SDC methods differ in the methodology and the way they are applied - each can vary greatly in terms of complexity, visibility, and effects on table characteristics and data quality.

The questionnaire sought to gather user opinions of SDC policy by asking users to rank a number of qualitative SDC characteristics in terms of their potential effects on output. Users were asked to ….

Question Rank the following SDC features in order of importance, 1 to 7 (If additivity is the most important feature to you rank it as 1. For the second most important, rank it as 2 and so on)

Question responses: 441
Percentage, by rank given (1 = most important, 7 = least important)
Feature 1 2 3 4 5 6 7
Additivity (tables that add up internally to column and row totals) 16.3 14.0 47.3 11.5 7.0 3.9 3.4
Consistency (consistent cell counts across tables) 20.9 29.0 16.7 11.7 10.6 8.8 5.8
Accuracy (counts that are as near as possible to true counts) 20.2 23.7 16.7 10.7 14.2 9.9 7.1
An SDC method that is simple to understand 11.1 7.1 7.8 22.2 16.8 19.0 16.4
An SDC method that is visible, in other words looking at the tables you can see that the data has been protected 9.0 9.0 4.8 14.6 19.4 23.1 18.0
Being provided with the information on how SDC methods may impact on different analysis and how this impact can be taken into account 10.9 12.1 3.3 13.1 15.8 21.8 20.1
SDC method applied which can be taken into account in analysis 11.6 5.0 3.3 16.2 16.3 13.5 29.1

sdc2.jpg


The SDC feature that received the highest average user ranking was the need for the data to remain as accurate as possible.

The two next most important features were table additivity (that tables add up internally to column and row totals) and consistency (that cell counts are consistent across different tables).

When broken down by user type we do not see a great deal of variation in the results. Accuracy is consistently ranked as the most important feature by respondents from across all the different user communities.

Consistency and additivity are the second and third most important features for academic, local government and commercial users. An SDC method that is easy to understand is ranked highly for members of the public and users from the third sector.



Any room for compromise?
In aggregate, then, the most important characteristic for users was that an SDC methodology be as accurate as possible. But how far are individual users prepared to sacrifice accuracy in order to secure or improve another favoured characteristic? To gauge the level of trade-off that users might find acceptable, the questionnaire asked ….

Question Should the Census Offices aim to produce counts that are as near as possible to true counts (within the constraints of legal and policy confidentiality requirements) or would you be prepared to accept less accurate data to ensure particular characteristics (e.g. additive tables)?

i) I want most accurate data possible regardless of other considerations

ii) I would be prepared to sacrifice some accuracy to have particular characteristics (detailed below)


Given the strong ranking for accuracy in results from the previous question, it is not surprising that overall only 27 per cent of users who responded are prepared to sacrifice a degree of accuracy.

Of those who are willing to make a trade-off, the characteristics most preferred are additivity and consistency.

When broken down by user sector, commercial users are the most likely to accept a trade-off (41.7 per cent), and they share the overall user preference for additivity and consistency.

Question responses: 468
Percentage of respondents in each user sector
Column key:
  A: I want the most accurate data possible regardless of other considerations
  B: I would be prepared to sacrifice some accuracy to have particular characteristics

User sector             A      B
No organisation        92.9    7.1
Central Government     68.4   31.6
Regional Government    92.3    7.7
Local Government       75.6   24.4
Academic               74.0   26.0
Health Sector          81.0   19.0
Commercial Sector      58.3   41.7
Third Sector           67.6   32.4

sdc3.jpg




SDC is a very important area, and we will return to this topic in future blog posts to address other questions, for example relating to SDC and licensing.

The questionnaire sought only to capture qualitative responses and rankings, with minimal scope for open-form answers. We are always happy to hear more detailed user thoughts about SDC principles, applications, and implications, and encourage further open discussion here or in the forum.

As the SDC research and selection process for 2011 is developed and finalised, we will report on the decisions made and outline the potential effects on output. We will also seek user thoughts and feedback to help us provide the best possible information, so that users understand what the decisions mean for the data and their use of it, and how the effects can be managed.






Disclosure control
ONS_Brendan, 4 Mar 2009, 11:35 UTC

Originally posted by BLine 3 Mar 2009.

We try to use the detailed OA to OA origin-destination tables to estimate 'self containment' patterns for migrations and travel to work, to help in identifying functional housing market areas. At this scale pretty well every cell is '3', which potentially causes a lot of distortion. Common sense would suggest that longer distance moves/commutes are likely to be smaller numbers.

But if changing the disclosure rules looks like an unwinnable battle, what would help is some work to show how much distortion it actually does cause. I've experimented by changing all '3' cells to 1 or 2, to see the effect, and it is noticeable. But a check against the real numbers at different distances could perhaps give an average variation that could be applied.

Last edited on 4 Mar 2009, 11:35 UTC by ONS_Brendan

Re: Disclosure control
ONS_Brendan, 4 Mar 2009, 11:42 UTC

The small cell adjustment (SCA) method was used to protect the confidentiality of individual respondents in 2001 Census data and does have significant effects on the utility of some tables.

The origin destination tables, especially at low geographies, will be greatly affected.

The method itself should be unbiased, so that a total of 50 people travelling from OA1 to OA2 would have an expected value of 50. However, the published total (which is a sum of protected counts) can vary significantly above or below the unperturbed total.

The variation can be seen for OA to OA counts broken down by different variables. For example, a breakdown by age and sex could give a total of 25, while a breakdown by economic activity could give a total of 33. By taking all the different published breakdowns and averaging the totals, one gets an estimate that is still unbiased but varies far less around the unprotected count.
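
As a rough illustration of the averaging idea above, here is a minimal Python sketch. It is not the actual ONS procedure: it assumes a simple unbiased adjustment in which true counts of 1 and 2 are published as 0 or 3, with probabilities chosen to preserve the expected value, and it assumes each breakdown is adjusted independently. The function names and the four hypothetical breakdowns of a flow of 50 people are invented for the example.

```python
import random
import statistics


def small_cell_adjust(count, rng):
    """Assumed rule (not the documented ONS method): 1 and 2 become 0 or 3."""
    if count in (1, 2):
        # P(publish 3) = count / 3, so the expected published value equals count.
        return 3 if rng.random() < count / 3 else 0
    return count


def published_total(cells, rng):
    """Sum of the protected cell counts for one breakdown of the flow."""
    return sum(small_cell_adjust(c, rng) for c in cells)


def simulate(breakdowns, trials=2000, seed=1):
    """Return (totals from one breakdown, totals averaged over all breakdowns)."""
    rng = random.Random(seed)
    single, averaged = [], []
    for _ in range(trials):
        totals = [published_total(cells, rng) for cells in breakdowns]
        single.append(totals[0])
        averaged.append(statistics.mean(totals))
    return single, averaged


# Hypothetical breakdowns of the same true flow of 50 people into small cells.
breakdowns = [
    [1] * 10 + [2] * 20,  # e.g. by age and sex
    [2] * 25,             # e.g. by economic activity
    [1] * 50,
    [1] * 20 + [2] * 15,
]

single, averaged = simulate(breakdowns)
print("one breakdown:  mean %.1f, sd %.1f"
      % (statistics.mean(single), statistics.stdev(single)))
print("averaged total: mean %.1f, sd %.1f"
      % (statistics.mean(averaged), statistics.stdev(averaged)))
```

Under these assumptions both estimates centre on 50, but the averaged total has a smaller spread than a single published total.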

For the 2011 Census, we have already proposed that origin destination tables at low geographies are protected not by SDC methods, but by licensing and access arrangements. The precise details of this approach have not yet been finalised.

Re: Disclosure control
Richard Price, 5 Mar 2009, 15:38 UTC

I would also add that whilst the OA to OA data has issues when used with very small counts, it does have uses. It is especially valuable when aggregated to higher levels of geography, where this disturbance is minimised.

This is worth remembering when whatever arrangements are put in place for 2011; otherwise a useful dataset may lose its value if access is unduly restricted.

Re: Disclosure control
BLine, 15 Mar 2009, 11:56 UTC

Taking all the different published breakdowns and averaging the totals sounds like a lot of work to get a more unbiased result. But the proposed licensing and access arrangements should improve things considerably. I'm hoping to further develop a bit of MapBasic software to batch generate self containment convex hulls, derived from 'Range Manager' http://solutionsgroup.tripod.com/rangeman.htm but adapted, maybe renamed as 'Migrations Mapper' or something.



Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License