How to Ace a Product Reliability Engineer Diagnosis Interview

What is a Diagnosis interview?

Diagnosis is decomposition with a Product Reliability spin.

We want to see if you can take a vaguely defined problem, ask questions to build an understanding of what is and isn't known, form hypotheses about the root cause of the issue, and design actionable experiments to confirm or rule out each hypothesis.

Why? Our Diagnosis interviews are designed to challenge you in the same way Product Reliability Engineers are challenged every day at Palantir. We've tweaked real product issues we've solved over the years so that we can see how you would tackle the same problem if you were on our team. The core of our work on Product Reliability is getting people unblocked when they hit product-related issues. Forward Deployed Software Engineers and Palantir customers come to us with nebulous, complex problems, and it's on us not to let them down. Sometimes we have a few weeks to fix them, but other times it's an emergency that must be fixed in a matter of hours.

As a Product Reliability Engineer you have absolute ownership of the tickets assigned to you and the freedom to resolve each issue using your skills and the resources at hand. Every Product Reliability Engineer has their own unique style, but what's common to all of us is creativity, product expertise, collaboration, and grit. Because of this freedom, we need people we can trust to do the right thing without a lot of supervision. That means people who know which questions to ask, where to look for clues, and how to synthesize information to narrow down the problem space. We're looking for people who can make progress on their own, but who also know when to loop in others for expertise, advice, or an assist on a critical issue. Since we often don't have direct access to the systems we're troubleshooting, we depend on close collaboration with other teams, which makes strong communication and listening skills critical.

Interviews

Our interviews are modeled as role plays: the interviewer takes the role of a Forward Deployed Software Engineer or customer with a problem, coming to you as a last resort for help and guidance on how to get unblocked. The interview starts with a description of a system and a problem we're having. For the next 45 minutes, you'll play the role of a Product Reliability Engineer, digest whatever problem we throw your way, and work with us to identify the root cause and propose a solution.

Expect your interviewer to challenge your assumptions, correct you, or offer suggestions. We're trying to create an environment similar to what we actually experience, where we work collaboratively with Forward Deployed Software Engineers and customers, especially when they have access to information that we don't. We don't expect you to have specific knowledge about the problems we'll throw your way; all the information you need will be provided by your interviewer. It's up to you to use what you learn to come up with the ideas, questions, and tests you'll need to make progress.

Some tips:

  • Start by creating a hypothesis, then test it.

  • Talk us through each step in your thinking and be sure to highlight what you know and, crucially, what you don't know. We aren't necessarily interested in whether or not it's the right idea; rather, we're looking to see how you're thinking about solving the problem.

  • It's a role play, so anything goes. If we throw something new at you, roll with it. If you get stuck, consider what you'd do if you were facing a similar problem at home, or with a friend, colleague, or classmate. Communication, collaboration, and creativity are crucial to good diagnosis.

Try this at home

Diagnosis is troubleshooting, and troubleshooting is something you can easily expose yourself to in preparation for an interview. Some things to try:

  • Help a friend or family member fix something while talking them through your thought process. Fixing tech (a laptop, network, software, or hardware) will be most beneficial, but fixing mechanical devices can also be good practice. Try to explain your thinking to whoever you're helping, and engage them to help you; two brains are more powerful than one. Better still, guide them to solve the problem on their own.

  • Help others with tech-related problems online.

  • Play 20 questions with friends or family.

  • Code. When you code, you'll likely spend plenty of time troubleshooting your own work. This is great practice for working on Product Reliability at Palantir.

Some final pointers

  • Don't give up! Sometimes the answer isn't obvious, but that doesn't change the fact that others are relying on us to find it. We are interested to see how you respond when the challenge gets really hard, so show us your grit!

  • Be efficient. You're time-constrained, so be methodical and intentional in your work. It's important that we respect our customers' time, and asking questions without a sound hypothesis behind them (i.e., guessing) delivers a poor experience for them. In the interview, we'll be looking at how you work under pressure and with incomplete information.

  • Be aware of your assumptions. You will be presented with a problem that a Forward Deployed Software Engineer or customer is having, but the root cause of that problem could be anything. Avoid unchecked assumptions about the root cause so that you don't lose time focusing on the wrong things.

  • Be precise. Can you trust an answer to a question with multiple interpretations? To have confidence in the significance of the answers you're getting, ask specific questions. As a Product Reliability Engineer you'll be working with customers who are in high-pressure situations and might not give you all the information you need. Being specific when you ask questions or request actions is one way to avoid ambiguities and incorrect assumptions.