A June 1 article in The New York Times highlights some of the issues we are concerned with here at Big Data and the Law. The article is about what Kate Crawford, a researcher at Microsoft Research, calls the “Six Myths of Big Data.”
In this post we’re concerned about this one:
Myth 5: Big Data Is Anonymous
A study published in Nature last March looked at 1.5 million phone records that had personally identifying information removed. It found that just four data points of when and where a call was made could identify 95 percent of individuals. “With just two, you can identify 50 percent of them,” Ms. Crawford said. “With a fingerprint, you need 12 data points to identify somebody.” Likewise, smart grids can spot when your friends come over. Search engine queries can yield health data that would be protected if it came up in a doctor’s office.
So there are a couple of things here. First, what’s the truth about de-identifying, anonymizing and the like? Can we get a definitive answer about whether it’s possible to scrub information in the way we say we do in our privacy policies? We need some answers, people. Work with me on this.
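To make the "four data points" claim a little more concrete, here is a minimal sketch in Python of the general idea: given records with the names stripped out, count how many people are consistent with a handful of known where-and-when points. This is not the study's code or data. The toy `traces` table and the helper names `matching_users` and `unicity` are assumptions made up for illustration.

```python
import random

def matching_users(traces, points):
    """Return the users whose trace contains every one of the given points."""
    wanted = set(points)
    return [user for user, trace in traces.items() if wanted <= trace]

def unicity(traces, p, trials=200, seed=0):
    """Estimate the fraction of random p-point samples that match exactly one user."""
    rng = random.Random(seed)
    # Only users with at least p points can be sampled from.
    eligible = sorted(u for u, t in traces.items() if len(t) >= p)
    if not eligible:
        return 0.0
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        sample = rng.sample(sorted(traces[user]), p)
        if len(matching_users(traces, sample)) == 1:
            unique += 1
    return unique / trials

# Hypothetical toy data: pseudonym -> set of (antenna, hour) points.
# Names are already "removed"; only where-and-when pairs remain.
traces = {
    "u1": {("A", 8), ("B", 12), ("C", 18), ("A", 22)},
    "u2": {("A", 8), ("B", 12), ("D", 18), ("A", 22)},
    "u3": {("A", 8), ("E", 12), ("C", 18), ("F", 22)},
}

print(matching_users(traces, [("A", 8)]))             # ['u1', 'u2', 'u3']
print(matching_users(traces, [("B", 12), ("C", 18)]))  # ['u1']
print(unicity(traces, 4))                              # 1.0
```

In this toy table a single point leaves all three people as candidates, while two points can already narrow the field to one. Whether a real data set behaves this way depends on the data, which is exactly the question we need answered.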
Moving on, look at the last sentence most carefully. This is an issue that isn’t talked about nearly enough.
For quite a while regulators have been paying attention to how the personal information we share might be disclosed or used to the detriment of our privacy. In so doing, regulators identify certain information that needs to be protected. (Remember, we don’t call such information personally identifiable information here – because, as previously noted, that term doesn’t make sense.)
For example, let’s say that the Federal Trade Commission designates certain information about us (we’ll call it Data X) as personal information that should not be disclosed by those receiving it. Let’s further assume that you can’t even give permission for your Data X to be disclosed. Your Data X privacy concerns are now addressed when you disclose your Data X to someone.
But what about Data X that is discovered by someone rather than received from you?
If you take Ms. Crawford’s statement as true, disclosure is not the only way your Data X might get into the hands of other people. What if, to take our example further, Smith Data Company can discover your Data X through the analysis of information that is publicly available? Is it protected then? Protected from what? From use by Smith Data Company? From disclosure by Smith Data Company?
That’s a problem. Regulators need to start thinking about the discovery of Data X, as well as the use and disclosure of Data X by the discoverer. Smith Data Company has no business being in the possession of your Data X in the first place.
The article closes with this:
Before Big Data disappears into the background as another fact of life, Ms. Crawford said, “We need to think about how we will navigate these systems. Not just individually, but as a society.”
That’s an understatement. It’s also an apt description of our tendency to throw up our hands and say it’s all out of our control.