
I'm currently taking a class in data privacy, which is about assessing privacy risks using a mathematically rigorous formalization (differential privacy, see http://en.wikipedia.org/wiki/Differential_privacy ) and developing algorithms that allow minimal-risk release of private databases and prevent attacks like the one you describe above. Your post makes it seem like you might enjoy reading more about the field; you should check out this survey: http://www.cs.ucdavis.edu/~franklin/ecs289/2010/dwork_2008.p...
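For readers who don't follow the links, the formal definition (restated from the Dwork survey): a randomized mechanism M is ε-differentially private if, for all databases D and D' differing in a single row, and for all sets of outputs S,

    \Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]

Intuitively, the output distribution barely changes whether or not any one person's data is included, so the output can't reveal much about any individual.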



Those formulas assume the creator knows all the variables. I have data on things like the probability that a first name belongs to a given race, when names started to appear (for determining a maximum age), and search data for identifying people who were worried they had a disease.

Often a single piece of data will cause the intersection of two traits to be 80% accurate. You can't mathematically predict these things because you don't know how many factors I have.

There are zip codes where only 1 in 100 people are of a given race. There are diseases that only 1 in 100k people have searched for and that only 1 in 500k people actually have. You can't capture these kinds of things in a formula.


This is not true at all. Differential privacy works to counteract EXACTLY these types of attacks. There are some very basic proofs that the privacy guarantees of a differentially private release mechanism are not at all weakened by external data, such as what you've listed. Differential privacy minimizes the risk to any individual row in a database even when the attacker knows the values of every other row (and has whatever external data he wants).
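As a concrete illustration (my own sketch, not anything from the thread), here is the Laplace mechanism, the textbook ε-DP release for counting queries; the function name and parameter choices are mine:

    import numpy as np

    def private_count(true_count, epsilon, sensitivity=1.0):
        # Laplace mechanism: add noise drawn from Laplace(0, sensitivity/epsilon).
        # sensitivity is the most one person can change the true answer
        # (1 for a simple counting query).
        return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # e.g. releasing "how many people in the database searched for disease X":
    print(private_count(true_count=7, epsilon=0.1))

The point is that no amount of outside data weakens the guarantee: the ε bound is a property of the noise distribution alone, not of what the attacker happens to know.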



