Analyzing the Efficiency of Code


The ability to analyze and improve the efficiency of code so that it runs with optimal complexity, creating systems that store and transmit data efficiently.

Why is it important at Palantir?

Efficiency and performance are critical for our software. Our back-end systems need to organize data efficiently and enable efficient queries. Different types of services access our back-end systems by running algorithms that extract insights from large amounts of data. Our front-end systems provide different types of visualizations in real time. It's imperative that both code and data are as efficient as they can be to create a responsive experience for users.

Palantir engineers need to determine which components in a large system are critical for efficiency, and where can we make trade-offs. When should we replicate data in multiple storage formats that each optimize for a different type of query versus optimizing for space? How can we take advantage of servers with very high memory by keeping a part of the data in memory for fast access: what data do we keep, how will it be synchronized, and how can it be accessed most efficiently? How can we detect and optimize bottlenecks: when is processor time optimization critical, and when is performance I/O bound?

How to prepare

Master the fundamentals, but remember real-world constraints. Some advice:

  • Get your O notation in order. Know how to properly calculate the efficiency of algorithms and understand how performance differs between different values for the O function. Practice it with existing algorithms and learning materials.
  • While time complexity is usually most important, never forget about space complexity and how one can affect the other. Try to understand when introducing space complexity can be useful in optimizing time, and when it's just an unnecessary overhead.
  • Keep an eye on the real-world applications of algorithms. What is the practical impact of an optimization that doesn't change the order of complexity? What are the actual performance trade-offs in a system with multiple components? What are the realistic best, worst, and average scenarios, and how do they compare to the theoretical ones?