Blogs

Inside Horizon: interactive analysis at cloud scale

Late last year, we were honored to be invited to speak at Reflections|Projections, ACM@UIUC’s annual student-run computing conference. We gave a talk on Horizon, our system for aggregate analysis and filtering across very large amounts of data. The video of the talk was posted a few weeks ago on the conference website. [...]


Palantir: search with a twist (part two: realtime indexing and security)

[Several weeks ago, we published a post on the search technology used by Palantir. That post covered improving the memory efficiency of a couple of operations. This is part two of that series.] The most familiar use of search engines is to index documents made available on the Internet via the hypertext transfer [...]


Palantir: search with a twist (part one: memory efficiency)

A Palantir cluster seamlessly integrates many pieces of proven technology. One of them is our customized version of the venerable Java search engine, Lucene. Search engine technology tends to be optimized for the common use case of indexing web documents (or similar information architectures) where you have a few search terms in each query and [...]


Bandwidth isn’t cheap. Disk isn’t cheap. CPU isn’t cheap.

At Palantir, we work in Silicon Valley, read High Scalability, and think of web companies like Facebook and Google as our peers. Most of the time, this is exactly the right recipe for bringing disruptive innovation into the intelligence community. Sometimes, though, it’s misleading – when discussing a design decision, it’s received knowledge that “Disk [...]


Oracle’s JDBC driver + prefetch == garbage [collection]

The Problem

Recently, we were experiencing major performance problems with loading documents from the database. Profiling did not isolate a single cause; everything (including unrelated, background operations) seemed slow. So, we started logging garbage collection, and found that we were collecting garbage at a rate of 20 GB/min! Profiling revealed that the worst offender, by far, [...]
