2009-09-08

High SNR Sentences: Identifying You in Data

It was found that 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}.
The sentence is from a CMU researcher working in 2000 on 1990 census data. Unfortunately the paper is behind a paywall, but a different, freely available paper with a decent methodology puts the number at 63%. I was not comforted by the lower estimate. Tip of the hat to Ars Technica for an interesting discussion on "anonymizing" data.
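
For anyone curious what "unique based only on {5-digit ZIP, gender, date of birth}" means in practice, here is a rough sketch of how one might measure it on a data set. This is not the method from either paper, just my guess at the basic idea: count how many records have a combination of those three fields that nobody else shares. The function name, field names, and toy records below are all made up for illustration.

```python
from collections import Counter

def unique_fraction(records, keys=("zip", "gender", "dob")):
    """Share of records whose combination of `keys` appears exactly once.

    `records` is a list of dicts; the key names are hypothetical.
    """
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    # A record is "unique" if no other record shares its combination.
    unique = sum(1 for c in combos if counts[c] == 1)
    return unique / len(combos) if combos else 0.0

# Toy, invented records (not real data).
people = [
    {"zip": "15213", "gender": "F", "dob": "1970-01-02"},
    {"zip": "15213", "gender": "F", "dob": "1970-01-02"},  # same combo as above
    {"zip": "15213", "gender": "M", "dob": "1970-01-02"},
    {"zip": "15217", "gender": "F", "dob": "1981-06-30"},
]

print(unique_fraction(people))  # 0.5: two of the four records are unique
```

The quoted 87% is the same kind of ratio computed over census data: 216 million / 248 million is roughly 0.87.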

1 comment:

Jenn said...

It's interesting to think how often I give up at least some of those pieces of information: random surveys often ask for your DOB and gender. It feels like that should be fairly anonymous, though I'm not sure if it's good to base that on a "feeling."

Anyway, rambling. Meeting went well today, with some interesting overtones. Prof definitely showed interest in working in my area. Sigh.