
I'm currently taking a class in data privacy, which is about assessing privacy risks using a mathematically rigorous formalization (differential privacy, see http://en.wikipedia.org/wiki/Differential_privacy ) and developing algorithms that allow minimal-risk release of private databases and prevent attacks like the one you describe above. Your post makes it seem like you might enjoy reading more about the field; you should check out this survey: http://www.cs.ucdavis.edu/~franklin/ecs289/2010/dwork_2008.p...
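For readers who don't follow the links, the formal definition (restated from the Dwork survey): a randomized mechanism M is ε-differentially private if, for all databases D and D' differing in a single row, and for all sets of outputs S,

    \Pr[M(D) \in S] \le e^{\varepsilon} \cdot \Pr[M(D') \in S]

Intuitively, the output distribution barely changes whether or not any one person's data is included, so the output can't reveal much about any individual.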



Those formulas assume the creator knows all the variables. I have data on things like the probability that a first name belongs to a given race, when names started to appear (for determining a maximum age), and search data for identifying people who were worried they had a disease.

Often a single piece of data will cause the intersection of two traits to be 80% accurate. You can't mathematically predict these things because you don't know how many factors I have.

There are zip codes where only 1 in 100 people are of a given race. There are diseases that only 1 in 100k people have searched for and that only 1 in 500k people actually have. You can't capture these kinds of things in a formula.


This is not true at all. Differential privacy works to counteract EXACTLY these types of attacks. There are some very basic proofs that the privacy guarantees of a differentially private release mechanism are not at all weakened by external data, such as what you've listed. Differential privacy minimizes the risk to any individual row in a database even when the attacker knows the values of every other row (and has whatever external data he wants).
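As a concrete illustration (my own sketch, not anything from the thread), here is the Laplace mechanism, the textbook ε-DP release for counting queries; the function name and parameter choices are mine:

    import numpy as np

    def private_count(true_count, epsilon, sensitivity=1.0):
        # Laplace mechanism: add noise drawn from Laplace(0, sensitivity/epsilon).
        # sensitivity is the most one person can change the true answer
        # (1 for a simple counting query).
        return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # e.g. releasing "how many people in the database searched for disease X":
    print(private_count(true_count=7, epsilon=0.1))

The point is that no amount of outside data weakens the guarantee: the ε bound is a property of the noise distribution alone, not of what the attacker happens to know.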



