A Study in Cell Phones

A recent story in the New York Times, “More Demands on Cell Carriers in Surveillance,” describes the response by cellular service providers to an inquiry from Rep. Edward J. Markey, in which the service providers reported on the frequency of requests for subscriber information by law enforcement agencies. In 2011, carriers responded to more than 1.3 million requests for subscriber information (anything from billing address to geolocational information based on GPS and cell tower hits).

Law enforcement requests for information from telecommunications providers are certainly not new and are frequently an important tool in the effort to track and apprehend criminals. Mobile communications information is particularly valuable given that, as one detective quoted in the article points out, “At every crime scene, there’s some type of mobile device.” The ability to gather more information related to these mobile devices may yield more clues to help solve a crime. On the flip side, more of this kind of information (combined with more analytic power) may also increase the potential for privacy and civil liberties violations. Individuals may be improperly scrutinized, or investigators may draw detailed inferences about an individual’s private life that are irrelevant to the investigation.

Beyond this larger question, however, the article offers us great examples of several of the privacy and civil liberties issues that are common throughout the data analytics world.

“Sprint and other carriers called on Congress to set clearer legal standards for turning over location data, particularly to resolve contradictions in the law.”

As noted in an earlier post on this blog, the law often lags significantly behind the pace of technological development. As new means of communication develop and new information can be generated from those means (e.g., geolocational data), law enforcement is forced to try to figure out just how a law written in 1986 (the Electronic Communications Privacy Act) should apply to 2012 technology. The end result is usually confusion, with both law enforcement and the cellular carriers unsure of how to comply with the law. Explicit guidance is likely to come months or years later in the form of a court decision—if at all.

“When a police agency asks for a cell tower ‘dump’ for data on subscribers who were near a tower during a certain period of time, it may get back hundreds or even thousands of names.”

The imprecise handling of data often results in too much information being produced in response to a request (see, e.g., the overproduction of information in response to National Security Letters). This issue can often be addressed largely through technical means. Just because a high volume of data exists does not mean access to it cannot be controlled with precision. Complex searches can be conducted to return only the information directly relevant to a particular investigation. Personally identifiable information can be protected so that analysis can be conducted without revealing identifying information until a reasonable evidentiary threshold has been crossed. In short, it should now be possible to sift through data and pick out relevant threads of information without ever exposing the rest of the data set to human eyes. Ideally, this granular sifting of information would occur before the data is passed to law enforcement. However, accompanied by a credible and transparent review process and oversight regime, this data control could allow for post-sharing filtering that could reassure the public that information is being protected to the greatest extent possible.
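To make the idea concrete, here is a minimal sketch of what scoped querying with identity protection might look like. Everything in it is hypothetical—the record layout, the `scoped_query` and `pseudonymize` helpers, and the `reveal` flag standing in for a met evidentiary threshold are illustrative assumptions, not a description of any carrier's or vendor's actual system:

```python
import hashlib

# Hypothetical tower-dump records: (subscriber_id, phone, timestamp).
RECORDS = [
    ("S1001", "555-0101", 1000),
    ("S1002", "555-0102", 1500),
    ("S1003", "555-0103", 9000),
]

def pseudonymize(subscriber_id: str) -> str:
    """Replace an identifier with a one-way token so analysis can
    proceed without exposing the underlying identity."""
    return hashlib.sha256(subscriber_id.encode()).hexdigest()[:12]

def scoped_query(records, start, end, reveal=False):
    """Return only records inside the investigative time window.

    Identifiers stay pseudonymized unless `reveal` is True, which in
    practice would require crossing an evidentiary threshold subject
    to review and oversight."""
    results = []
    for subscriber_id, _phone, ts in records:
        if start <= ts <= end:
            ident = subscriber_id if reveal else pseudonymize(subscriber_id)
            results.append((ident, ts))
    return results
```

A query scoped to the relevant window returns only the matching records, and an analyst sees tokens rather than names until disclosure is separately justified—the rest of the data set is never exposed.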

“Because of incomplete record-keeping, the total number of law enforcement requests last year was almost certainly much higher than the 1.3 million the carriers reported….”

Information on how data is used is often lacking, but modern information systems increasingly generate “data about data” that should make it easier for data stewards to provide information about what they store, use, and share. Generating and analyzing this data can provide critical information that can not only be used to better protect privacy and civil liberties but can also lead to better, more efficient analysis. What information is being shared with whom? How often does a certain type of data get used? How often does data of a certain age get used? How successful has a particular type of analysis been? Answering these questions can lead to better data handling policy as well as help redirect analytic resources along more effective lines.
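The questions above lend themselves to simple aggregation once access events are logged. The sketch below is purely illustrative—the log schema and the metric functions are assumptions invented for this example—but it shows how "data about data" can directly answer who receives what, and how often data of a given type or age is actually used:

```python
from collections import Counter

# Hypothetical access-log entries: (data_type, record_age_days, recipient).
ACCESS_LOG = [
    ("geolocation", 30, "agency_a"),
    ("geolocation", 400, "agency_a"),
    ("billing", 10, "agency_b"),
    ("geolocation", 15, "agency_a"),
]

def usage_by_type(log):
    """How often does each type of data get used?"""
    return Counter(data_type for data_type, _, _ in log)

def usage_by_age(log, cutoff_days=365):
    """How often does data older than the cutoff actually get used?"""
    old = sum(1 for _, age, _ in log if age > cutoff_days)
    return {"older_than_cutoff": old, "newer": len(log) - old}

def sharing_partners(log):
    """What information is being shared with whom?"""
    return {(data_type, recipient) for data_type, _, recipient in log}
```

Metrics like these serve the double purpose described above: they document handling for oversight, and they show analysts which data sources are actually paying off.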

“Chris Calabrese, a lawyer for the A.C.L.U., said he was concerned… about the agencies then keeping those records indefinitely in internal databases.”

When is information no longer worth keeping? How valuable is a ten-year-old piece of data when there is a possibility that more sophisticated analysis might—just might—unearth the first link in the chain of evidence that could solve a serious crime or prevent a terrorist attack? Alternatively, how much is an individual damaged when personal but non-criminal information about him or her is held in a government database, potentially leading to significant stigmatization if the individual’s presence in that database is revealed? What looks like a simple cost-benefit analysis is made more complicated by uncertainty over the value of the information being retained. The kinds of metadata and metrics discussed above might go a long way toward providing meaningful quantitative information about how often information of a certain age from certain sources is actually used, thus contributing to an informed, reasonable retention policy.
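One way usage metrics could feed a retention policy is by asking: what is the smallest retention window that would have covered nearly all actual uses of the data? The function below is a hypothetical sketch of that calculation—the access-age data and the coverage-based rule are invented for illustration, not a recommended policy:

```python
# Hypothetical: ages (in days) of records at the moment they were
# actually accessed by an analyst.
ACCESS_AGES = [5, 12, 30, 45, 90, 200, 800]

def retention_window(access_ages, coverage=0.95):
    """Smallest age cutoff (in days) that would have preserved the
    given fraction of observed accesses.

    This is one quantitative input to a retention policy, not the
    policy itself--legal and civil liberties constraints still apply."""
    needed = max(1, round(coverage * len(access_ages)))
    return sorted(access_ages)[needed - 1]
```

Here, covering 85% of observed accesses would require retaining data for about 200 days, while covering every access would mean keeping records for over two years—exactly the kind of quantified trade-off that can inform the cost-benefit debate rather than leaving it to intuition.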

Volumes can be and have been written on each of these issues, and we will frequently return to them in the pages of this blog, including some looks at how Palantir can address these issues in a way that mitigates concerns about privacy and civil liberties. We believe that none of the issues described here is insurmountable. These challenges can be addressed through close engagement between policymakers and technologists, thus allowing law and policy to be informed by technical feasibility and new technology to be developed from an early stage with policy needs in mind.