When we first started developing our data fusion platforms, we wanted to create a way to integrate data for secure, collaborative analysis at scale. We knew that achieving this outcome would not be a matter of simply making more data available to more analysts. We wanted to give our users a way to learn from and build on not just the raw data in their enterprise, but the insights of their colleagues, too. The right technology had to integrate both data and analysis.
As we grew and deployed our technology to more (and more diverse) users, we saw first-hand how collaborative data analysis can save time, save money, and even save lives. We also saw a lot of unrealized potential. If the advantages were so apparent, we wondered, why didn’t we observe more collaborative data analysis between organizations where policy allowed it—allied nations, neighboring public institutions, and partner commercial enterprises?
It turns out that integrating data for secure, collaborative analysis isn’t as easy as throwing everything onto a share drive. Many of our customers wanted to collaborate more, but were prevented from doing so by several hard, multi-faceted technical problems:
- How do you resolve conflicts that might arise from an act of data sharing? If duplicate or conflicting versions of the data are created, there is a risk of obsolete or inaccurate data being spread throughout the enterprise. Alternatively, if data is overwritten, valuable knowledge risks being destroyed.
- How do you collaborate in situations where analysts are geographically dispersed, and communications between them are plagued by intermittent connectivity, high latency, or low bandwidth?
- How do institutions collaborate when they have different missions and different ways of modeling their world? How does a regulatory agency collaborate with a global financial institution in a way that allows each to retain its own data model?
- How do you implement the right privacy safeguards when sharing data both within and between different organizations? How do you protect data at a granular level, so it can only be accessed by those who are authorized to do so?
- How do you enable data sharing without compromising data security when different organizations, and different data sources within organizations, are subject to different data protection and retention policies, classification levels, or access control regimes?
Nexus Peering is Palantir’s solution to these problems. Nexus Peering enables information-sharing at the institutional level, allowing teams, agencies, and governments to exchange data and analysis in almost any direction or environment while maintaining consistency, integrity, and security.
Understanding Nexus Peering
So how does Nexus Peering work? A comprehensive explanation would require more than a simple blog post. But for introductory purposes, you can think of each installation or instance of our Palantir Gotham data fusion platform as maintaining its own “nexus” of data. Nexuses can incorporate changes made by users at different Palantir Gotham instances through acts of synchronization, or “peering.” The main goal of Nexus Peering is to ensure that data is always in a consistent state across instances. To this end, Nexus Peering must capture, circulate, and merge changes to shared data while recognizing and resolving data conflicts.
Each nexus maintains a record of each change to every piece of its data in a manner similar to how revision control software such as Git or Mercurial tracks changes to codebases during software development. Upon log-in, each user “checks out” a copy of the consensus view of the data (known as the Base Realm) and works with it on a private branch. When users wish to share the results of their analysis with the rest of the team, they publish their changes back to the Base Realm.
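To make the check-out and publish cycle concrete, here is a minimal sketch in Python. It is an illustration of the general idea only, not how Palantir Gotham is implemented: the class names `BaseRealm` and `Branch`, their methods, and the dictionary-based data are all hypothetical, and the sketch ignores conflicts between branches.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple

# One history entry: (user, key, old value, new value). Hypothetical shape.
Change = Tuple[str, str, Optional[Any], Any]


class BaseRealm:
    """Hypothetical consensus view of the data, with an append-only change record."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.history: List[Change] = []

    def check_out(self, user: str) -> "Branch":
        # Each user gets a private copy to work on.
        return Branch(self, user)


class Branch:
    """Hypothetical private branch: edits stay local until published."""

    def __init__(self, realm: BaseRealm, user: str) -> None:
        self.realm = realm
        self.user = user
        self.view = copy.deepcopy(realm.data)
        self.pending: List[Tuple[str, Optional[Any], Any]] = []

    def set(self, key: str, value: Any) -> None:
        # Record the old and new value so the change can be attributed later.
        self.pending.append((key, self.view.get(key), value))
        self.view[key] = value

    def publish(self) -> None:
        # Apply pending changes to the consensus view and log each one.
        for key, old, new in self.pending:
            self.realm.history.append((self.user, key, old, new))
            self.realm.data[key] = new
        self.pending.clear()
```

In this sketch, an edit made with `branch.set(...)` is invisible to other users until `branch.publish()` is called, at which point it lands in the shared view together with a history entry naming the user who made it.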
Nexus Peering keeps track of these changes across nexuses, even in cases where the peered instances are not in constant communication. To merge changes that were made to the data concurrently by users at different nexuses, Nexus Peering uses a technique called “version vectors” to track the changes and automatically apply them in the proper order when peering. When concurrent changes conflict, users are alerted and must decide for themselves which version of the data they want to work with.
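For readers unfamiliar with version vectors, the following Python sketch shows the general technique, not Nexus Peering's actual implementation. Each copy of a piece of data carries a counter per nexus; comparing two vectors tells you whether one version supersedes the other or whether they were changed concurrently. The function names and the nexus identifiers (`"paris"`, `"london"`) are assumptions made for illustration.

```python
from typing import Dict

# Maps a nexus identifier to the number of changes seen from that nexus.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Describe a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"      # b has seen everything a has, and more
    if b_le_a:
        return "after"       # a has seen everything b has, and more
    return "concurrent"      # neither supersedes the other: a potential conflict


def merge(a: VersionVector, b: VersionVector) -> VersionVector:
    """Element-wise maximum: the vector of a version that has seen both histories."""
    keys = set(a) | set(b)
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in keys}


# Two nexuses each changed the same data while disconnected.
paris = {"paris": 2, "london": 1}
london = {"paris": 1, "london": 2}
status = compare(paris, london)   # neither vector dominates the other
merged = merge(paris, london)
```

When `compare` reports "before" or "after", the newer version can be applied automatically. Only the "concurrent" case needs further attention, which matches the behavior described above: users are alerted when concurrent changes actually conflict.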
That’s a brief overview of how Nexus Peering works. For an in-depth explanation, check out the presentation from GovCon 6 below.
Nexus Peering in Palantir Gotham 3.8 And Beyond
Late in 2012, we upgraded the instances on our largest Nexus Peering mesh network (spanning 40 locations and four continents) to Palantir Gotham 3.8, which includes a host of new Nexus Peering features. Here are some highlights:
- User Attribution: Peered objects now include a rich attribution of each change made prior to peering, indicating which user on the originating instance made the change (and at what time).
- Incremental Peering at Scale: Transferring data “chunks” via Nexus Peering was previously an all-or-nothing process. In 3.8, we have made peering incremental, which is a vital improvement in bandwidth-constrained and massive data scale environments.
- Graph Peering: Users can now publish analysis in the form of finished graphs for peering to other systems.
- Cross-Domain Peering: Where appropriate, Nexus Peering can now be performed across different networks. Customers can move data from lower to higher classification levels through existing one-way data guards, even if it requires re-writing all relevant data in human-readable form. Where allowed, a Palantir customer with classified data can now peer all data at a lower classification on an ongoing basis, whether this data is their own or a partner agency’s.
- Cross-Classification Peering: Systems with different classification levels (for example, friendly agencies of different allied nations) can now peer with each other. Nexus Peering allows administrators to define classification translations that safely map one set of classifications onto another.
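As a rough illustration of what a classification translation might look like, the Python sketch below maps one set of markings onto another before data is shared. Everything in it is hypothetical: the marking names, the record layout, and the function `translate_for_peer` are invented for this example. The choice to withhold any record whose marking has no mapping is an assumption about what "safely" could mean, not a description of the product's behavior.

```python
from typing import Dict, List

# Hypothetical administrator-defined table: local marking -> partner marking.
TRANSLATIONS: Dict[str, str] = {
    "NATION_A_UNCLASSIFIED": "NATION_B_PUBLIC",
    "NATION_A_RESTRICTED": "NATION_B_LIMITED",
}


def translate_for_peer(
    records: List[Dict[str, str]], table: Dict[str, str]
) -> List[Dict[str, str]]:
    """Return copies of the records that can be shared, with translated markings.

    Records whose marking has no entry in the table are withheld
    (an assumed fail-closed policy).
    """
    shared = []
    for record in records:
        target = table.get(record["classification"])
        if target is None:
            continue  # no mapping defined: do not share
        shared.append({**record, "classification": target})
    return shared


records = [
    {"id": "r1", "classification": "NATION_A_RESTRICTED"},
    {"id": "r2", "classification": "NATION_A_SECRET"},  # no mapping defined
]
shared = translate_for_peer(records, TRANSLATIONS)
```

Here only `r1` is shared, relabeled with the partner's equivalent marking. `r2` stays behind because no translation was defined for its classification.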
Today, Nexus Peering works across data models, security models, time zones, and borders. It does not require all data to be shared and it does not require an active connection between the two nexuses being peered. Nexus Peering captures and circulates changes between nexuses, while recognizing and resolving conflicting changes, all without destroying or duplicating any data. With Nexus Peering, institutions can create a unified, shared, enterprise-wide information picture that is updated at whatever frequency is possible. Sometimes that’s as real-time as a high-speed network will allow; sometimes it’s as fast as the physical drive containing the latest updates can be hand-delivered.
This is what a complex Nexus Peering network might look like. Each box represents a Palantir Gotham instance, or “nexus.” (Place names chosen at random from a Diplomacy board.)
Based on the technical improvements introduced in the latest version, we envision even more compelling use cases for Nexus Peering. Cooperation between different organizations, including international partners, is now significantly easier, enabling unprecedented secure collaboration on common threats such as cyber attacks. The same security frameworks that allow governments to share data can now also allow secure sharing between the private sector and government, while protecting individual privacy and organizational sensitivities. New use cases could include banks sharing cyber threat data with law enforcement, pharmaceutical companies sharing product safety data with regulatory agencies, peering between agencies involved in disease control, and peering between our philanthropic partners. We’re looking forward to seeing what the future of Nexus Peering holds.