“You’re asking us to test our platform’s programming language? How am I supposed to do that?”
My head itches from trying to recall the bits and pieces of what I learned in high school about programming, specifically the semantics of a programming language. Sure, I did a bit of programming for homework assignments in college, but I was no CS major. This was a much different challenge for a QA engineer to test. Compared to an application, a programming language is completely open ended; there are no specifications to test, guidelines to follow, or limits to break.
The Hedgehog language had the basic set of tools laid out for me already: I could declare variables, create data structures, and use loops for iteration. As I was trying out individual usage examples, such as how to structure
if statements or how to cast an object to a different type, I realized that this was no way to test something as powerful and flexible as an entire language. It would be like a doctor who claims that since each individual organ works fine, there are no problems with the entire system. This is insufficient: one needs to look at the system as a whole, including examining the interactions between each component. I decided I needed to create much larger and elaborate code samples in order to test the Hedgehog language in a larger scope.
Using the Hedgehog language, I had programmed several algorithms, solving puzzles that would output a number. This was getting bit boring, since once the output value was matched the expected number, there was nothing more to be done. I wanted to create something more dynamic, a toy I could play around and experiment with. And opportunity presented itself in the form of one of our newest tools: the spreadsheet application. With the capability of setting the value of each cell programmatically and then coloring them depending on their value… hmm what could I do with this?
Hedgehog is a powerful tool in coding functions and workflows that directly interact with our applications. Most of the time, the language is used to write expressions for an input value, create custom metrics that return values after a set of calculations, or even to set inputs, calculate, and save documents. Given the language’s ability to integrate with Spreadsheet, the capabilities of the Hedgehog language can literally be visually shown to the user, resulting in some stunning displays.
Below are three examples I’ve coded in Palantir Finance’s own language: calculating and drawing the Mandelbrot fractal, simulating Conway’s Game of Life, and solving a Nonograms puzzle.
The Mandelbrot set is one of the most recognized images among fractals. Fractals are unique images due to their self-similarity, such that they have infinite detail: as you zoom in on a fractal border, detailed structures will continue to appear indefinitely.
Since the Mandelbrot set is an example of an escape time fractal, we can write a function that counts the number of iterations it takes for a given point to escape a threshold value. Applying this function to each point within a grid of specified range and resolution, we can generate a field of values to be set in a spreadsheet. After resizing the spreadsheet cells into small squares and applying conditional formatting, specifically a heat map that colors the cell based on its value, the below images can be generated:
Fig M1. Mandelbrot Fractal drawn from (-1.5, -1) to (0.5, 1), with the top left quadrant visible
Fig M2. Mandelbrot Fractal drawn from (0.27205, 0.00451) to (0.27505, 0.00751)
Fig M3. Mandelbrot Fractal with manually created rainbow heat map
John Conway’s Game of Life is a simple cellular automaton that shows the evolution of an initial state of cells governed by a set of rules. In the classic scenario, cells can be alive or dead, indicated by a filled or empty cell respectively. When evaluating the next generation of cells, each cell checks the number of live neighbors among its adjacent 8 cells. If the cell is alive with 2 or 3 live neighbors, it remains alive. If a dead cell has 3 live neighbors, it becomes alive. All other scenarios, the cell evolves into a dead cell.
In this example, we rely heavily on spreadsheet’s cell dependency tree. Cells are able to reference the value of other cells within their expression, creating a directed acyclic graph. If the value of a cell changes, all cells that point towards the changed cell need to recalculate their value. In this specific case, the user can change the value of which generation to view, which causes the recalculation of the grid of cells representing the game.
In the spreadsheet below, the initial state, or generation 0, of “Noah’s Ark” is shown. The initial data for Noah’s Ark is derived from an external spreadsheet document containing a grid of 0’s and 1’s.
Initial generation for Noah’s Ark
An animated image of the evolution of Noah’s Ark from generation 0 to generation 10
One problem encountered is that it takes increasingly longer to calculate a larger generation number. Because the only generation that is saved is the initial generation, calculations must always start from generation 0 when the user changes the generation number. This triggers a lot of redundant calculations: if I am at generation 10 and want to see generation 11, the calculation starts from generation 0, repeats all the same calculations up to generation 10, and finally calculates one more iteration to reach generation 11. To prevent unnecessary calculations, the results of previous calculations can be written to a cache file: a separate spreadsheet document containing the generation data that was calculated previously. When caching is enabled, previous generation calculations can be retrieved so the same calculation never happens twice.
Below is an animated image of a “Gospel Glider Gun” from generation 0 to generation 40, shown in increments of 2 generations. The glider gun will continue to oscillate over time while producing a 5 cell glider object that floats away from the gun diagonally down to the right. Without the caching functionality, the 40th generation would take approximately 4 times as long as the 10th generation. By enabling caching, each animation frame takes approximately the same time to calculate since only 2 generations are computed per frame.
Nonograms, sometimes better known as Picross, are a type of logic puzzle where the user is given a blank grid with a set of numbers for each row and column. The numbers represent the number of consecutively filled blocks in the final solution within that row or column. Using these numbers as clues, it can be deduced whether or not a space is filled or not, until the entire grid simultaneously satisfies all given conditions. Solving puzzles is a NP-complete problem: it is very easy to verify a solution by checking if each row and column satisfies its set of numbers, but becomes increasingly difficult to solve larger puzzles. Time it takes to check a puzzle increases linearly with puzzle size, but the time to solve increases much faster due to the increase in number of iterations over each row or column crossed with an increased number of permutation possibilities for each set of numbers per iteration. Solving a 25x25 nonogram puzzle by hand usually takes about 30 minutes for an experienced solver.
Given the initial set of values for each row and column, the algorithm iterates through each row or column individually. The simplest method to solve is to find all possible permutations of the given numbers over the length of the row or column and check if there are any spaces that are consistently filled or unfilled across all possibilities. Let’s consider a simple example: In a row of 5 spaces, the set of values given is [2,1]. Thus the possible permutations are shown on the right, where 1 is a filled space, -1 is an unfilled space, and 0 is undetermined.
Additional optimizations, such as queuing the order of rows and columns to solve, trimming the solved ends of a range of spaces, and avoiding unnecessary calculations, were made to improve the performance of the algorithm such that the average time to solve a 25x25 puzzle is 5 minutes. This is fairly impressive given that the Hedgehog language is by no means optimized to handle such calculations and array manipulations. Below are some examples of solved puzzles:
Fig N1. 10x10 nonogram solution, solved in less than a second
Fig 2. 20x20 nonograms solution, solved in 20 seconds
Fig N3. 25x25 nonograms solution, solved in 3 minutes
The initial spark that ignited this series of code examples started from just a simple curiosity of, “I wonder if it would be possible to write this in Hedgehog?” In a way, it was also an exercise for me to learn more about the Hedgehog language, extending the use of its capabilities and libraries of metrics, in order to inject more complicated expressions while testing other applications on our platform. Throughout the process of developing the finished product, I was able to expose issues with the language that may have gone unnoticed with smaller, simpler test cases.
But of course, the driving motivation behind all this was to have fun, which is a big part of what it means to be working at Palantir.