I often ask candidates if they’re familiar with what we do at Palantir. Most people think they are. “Oh, you’re that data viz company,” or, worse, “You guys do data mining, right?” At least they’ve heard of us, and at least they’re on the right track, but I cringe anyway. We aren’t just a “data visualization” company, and we don’t do “data mining.” It’s almost impossible to convey the scope and complexity of what we do in a few short minutes, or to do so without taking the conversation to an eye-glazing level of abstraction.
The following is my attempt at describing what we do at a high level without oversimplifying. I hope that after reading this a candidate will ‘get’ what we’re about, or at least understand enough not to apply tiny labels to our expansive vision.
The problem: implementing analysis
At Palantir we specialize in analysis.
Yes, that’s painfully abstract; I’ll make it concrete in a second.
In real-world terms, we are building a software platform that enables people to take whatever data is relevant to them and understand it more easily and thoroughly than ever before, using concepts that they already understand. We are initially applying this vision to problems in the finance sector and the government intelligence community.
The first important thing to note is that we don’t actually do the analysis ourselves. We don’t devise winning trading strategies and we don’t catch terrorists. We write software that enables other people to pull off these feats. These people, experts in their respective fields, are called analysts.
So what exactly do analysts do? What is analysis?
Analysis is everything necessary to extract insight from information.
Let’s break that down a bit.
Information is easy: it’s data. It lives in relational databases or as files indexed on a hard drive, and you can easily run queries against it. It comes in two forms, structured and unstructured. And there is a lot of it in the modern world: too much, in fact, for current tools to make sense of.
Insight is trickier. Insight is something only a person can generate, and understanding this is critical for any organization that wants to do analysis right. Thus the challenge of data analysis is how to bring vast amounts of information into productive contact with human intelligence. In other words, the challenge is how to enable the analyst.
From the analyst’s perspective there are five essential features of an analysis platform:
- First, and most important, the analyst should be in control. In other words, the primary way of interacting with an analysis tool should be human-driven queries. While automated approaches can complement a human-driven approach, there simply is no substitute for human intelligence. Unless you put a person behind the wheel, the system can never be flexible or creative enough to uncover truly original insight. Artificial Intelligence just isn’t there yet.
- Ability to summarize large data sets. Some of this is what has traditionally been called data mining: the largely automated approach (using machine learning or other statistical techniques) of processing lots of data at once and extracting nuggets that capture something interesting about the data. Traditional approaches, unlike Palantir’s, have focused almost exclusively on this aspect of analysis.
- Ability to visualize large data sets. Here the analyst wants interesting and informative ways of viewing data graphically, to make it easier for him to digest. The analyst wants more than just a summary of the data; he wants a nuanced view of what’s going on inside these data sets: What’s the overall shape of the distribution? What are the outliers? What are important structures within the data?
- Ability to iterate rapidly. This means enabling the analyst to ask a question, get the answer, and then quickly ask either a variant on the initial question or a follow-up question that depends on the answer to the initial question. This rapid, iterative process allows the analyst to quickly test out hypotheses and develop theories about what’s going on in the data, and by extension to discover what’s going on in the world.
- Ability to collaborate with other analysts. Getting a handle on a terabyte of data, especially when it comprises multiple data types, is definitely more than a one-person job. Any organization that’s serious about understanding the world needs a team of analysts that can work together as more than the sum of its parts. This requires that each analyst be able to share the results of his analysis with his colleagues effortlessly.
The Palantir approach
That’s what analysis looks like to the analyst, or rather what it should look like in an ideal world. (Current tools fall far short of this vision.) So what do we do at Palantir in order to make analysis this smooth and easy?
You could say that we help summarize large data sets, in the sense that we have to provide the analyst with a rich library of techniques and algorithms. You could also say that we do visualization, in the sense that we have to provide the analyst with a set of interesting and informative ways of visualizing their data. We do both of these things, and we have to be creative and solve hard problems in order to add value in these areas. But we do a lot more than that.
Probably the hardest and most central problem we address in enabling the analyst is data modeling: figuring out which data types are relevant to a domain, defining what they represent in the world, and deciding how to represent them in the system. At Palantir we make sure our data model (ontology) is both flexible and dynamic, and that it mirrors the concepts people naturally use when reasoning about the domain. This is no small challenge, but we’re already making it a reality. In finance, our basic data types include financial instruments, dates, portfolios, indices, and strategies: the same things that financial researchers think about, talk about, and reason with. In the intelligence product, our basic data types include people, places, and events (all with associated properties), which is exactly how we all represent the world in our minds.
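To make the idea of a data model more concrete, here is a minimal Python sketch of an ontology in which object types (people, places, events) declare the properties they carry, new objects are checked against those types, and objects link to one another. Every name here (`Ontology`, `Obj`, `define_type`, `add_property`, `create`, `link`, `linked`) and all of the sample data are hypothetical illustrations, not Palantir’s actual data model or API. The point is only that the types mirror concepts an analyst already reasons with, and that the model can be extended at runtime.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an ontology; not any real product's API.


@dataclass
class Obj:
    """An instance of some object type, e.g. one particular person or event."""
    type_name: str
    props: dict
    links: list = field(default_factory=list)  # (relation, Obj) pairs

    def link(self, relation, other):
        self.links.append((relation, other))

    def linked(self, relation):
        # Follow every link with the given relation name.
        return [other for rel, other in self.links if rel == relation]


@dataclass
class Ontology:
    """Maps each object type name to the set of property names it allows."""
    types: dict = field(default_factory=dict)

    def define_type(self, name, *properties):
        self.types[name] = set(properties)

    def add_property(self, type_name, prop):
        # The model is dynamic: a type can gain a property without a rebuild.
        self.types[type_name].add(prop)

    def create(self, type_name, **props):
        # Reject properties the type does not declare.
        unknown = set(props) - self.types[type_name]
        if unknown:
            raise ValueError(f"{type_name} has no properties {sorted(unknown)}")
        return Obj(type_name, props)


# Hypothetical intelligence-style types: people, places, and events.
onto = Ontology()
onto.define_type("Person", "name", "nationality")
onto.define_type("Place", "name", "country")
onto.define_type("Event", "kind", "date")

alice = onto.create("Person", name="Alice")
vienna = onto.create("Place", name="Vienna", country="Austria")
meeting = onto.create("Event", kind="meeting")
meeting.link("participant", alice)
meeting.link("location", vienna)

# An analyst's question, phrased in domain terms: who attended, and where?
print([p.props["name"] for p in meeting.linked("participant")])  # ['Alice']
print(meeting.linked("location")[0].props["name"])  # Vienna

# Extending the model at runtime: people can now carry an alias.
onto.add_property("Person", "alias")
bob = onto.create("Person", name="Bob", alias="B.")
```

The design choice worth noticing is that the schema lives in data (the `types` dictionary) rather than in hard-coded classes, which is one simple way to get a model that is both checked and changeable as the domain evolves.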
Data modeling, data summarization, and data visualization are the core disciplines for approaching large data sets. Human-driven queries, rapid iteration, and collaboration are multipliers, taking the power unlocked by the core disciplines to the next level. When these pieces are brought together in a coherent system, the result is an analysis platform that is both very generic and very powerful.
This is what we mean when we say that we’re changing the way people approach data. Welcome to the future of analysis.