Site Reliability Engineer

Palantir Site Reliability Engineers monitor and maintain our systems to pre-empt problems before they threaten our customers’ workflows.


A World-Changing Company

At Palantir, we develop the world's leading products for data analysis and we deploy them against problems that truly matter—uncovering human trafficking rings, containing the spread of infectious diseases, combating fraud, stopping cyber attacks, protecting privacy and civil liberties, prosecuting complex financial crimes, providing relief to victims of natural disasters, and more.

The role

Palantir software is deployed at the world’s most critical institutions to help them solve their greatest challenges. Users at customer sites from Washington, DC to Tokyo rely on Palantir’s high availability to pursue their missions. Site Reliability Engineers (SREs) make sure our expanding number of customer deployments run smoothly 24 hours a day.

SREs combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges. Our team strives to automate processes whenever possible, using whatever tools are best for the job. Our responsibilities range from administering colocated servers (including hardware troubleshooting) to maintaining database platforms.

We work with a variety of teams to understand threats to our software and improve our products over time. We work side by side with Palantir’s implementation teams and customer IT departments to understand our customers’ unique problems and develop innovative solutions. We document our successes and communicate them back to Palantir’s product teams so that our hardware, software, and network solutions are deployed in ways that minimize failure rates and increase overall system reliability.

REQUIREMENTS

  • 5+ years of experience with Linux system administration (RHEL or CentOS preferred)
  • Experience monitoring systems with tools like Nagios and writing health checks
  • Moderate experience with TCP/IP networking
  • Interest in learning and managing newer technologies like Spark, Hadoop, Elasticsearch, Node.js, and RabbitMQ
  • Ability to work independently with minimal supervision
  • Ability to participate in a 24/7 on-call rotation

PREFERRED

  • BS/MS in Computer Science
  • Experience with virtualization using AWS, VMware ESX, KVM, Xen, or Docker
  • Experience with system management tools like Puppet or Chef
  • Ability to travel to customer sites up to 25% of the time

Resources

Engineering

We are builders, innovators, and problem solvers. We live at the intersection of efficiency and ingenuity and create software that is industry-defining and sometimes even life-saving.

Engineering Culture

Engineers build things that solve problems, but at Palantir you don't have to be a computer scientist to be an engineer. You do have to speak up when things aren't right and build things that fix what's broken.

Life at Palantir

Perks, benefits, social activities, and learning opportunities: people are our most important asset, so we invest in our people every day.

Getting Hired

If you want to stare into the face of important problems and have the freedom to solve them, we want to work with you. We have some resources to help you navigate the hiring process.

Site Reliability Engineer Openings

No matter which office you are based in, you will be part of a group of people working together to build solutions to mission-critical problems and a company that values the very best ideas.