Advent of Functional PHP: Day 9

Day 9 of this year's Advent of Code revolves around grid interpretation. Specifically, we are given a grid of numbers and want to find the low points, that is, the numbers that are smaller than any of their orthogonal neighbors. (We're told to ignore diagonals in part 1.)

After finding the low points, we need to do a bit of math on each one, and add them up. As usual, this last step is mostly just to produce a single verification number at the end. That part is easy as usual, but how do we find the low points?

Risky business

In this case, brute force really is the fastest option. For each cell, find its neighbors and then see if it's smaller than they are. We'll start by reading in out input data to a 2D grid of numbers.

pipe($inputFile,
    lines(...),
    amap(compose(str_split(...), amap(intval(...)))),

That's our usual line-reading routine, but then we split each line string into an array of characters, and convert all of those characters into integers. Again, we're using compose() inside another operation to produce a function that will apply to each line, then map it across all the lines. Simple enough.

We need to retain the grid positions of each cell long enough to figure out what its neighbors are... but we don't need to care after that. So what we're looking for is a list of values and the list of their neighbors. That suggests a function that takes in the grid and produces a list of value/neighbor pairs. They're not really tuples, though, so let's wrap it up in an object for readability.

class Position
{
    public function __construct(
        public readonly int $value,
        public readonly array $neighborValues,
    ) {}
}

So now we need a function that goes from a grid of numbers to a list of Positions. There is likely some slightly more functional way to do this, but... I found it easier to fall back to foreach() loops here. There are ways we could have rebuilt it into a reduction, but since PHP doesn't support tuples as array keys (and alternatives like SplObjectStorage only work with objects as keys, not arrays), that would have been an awful lot of unnecessary work. This is PHP, so sometimes the Pure Twue Functional(tm) approach is just not worth it at the micro-level (even though at the macro level this program is still entirely pure).

function buildNeighborMap(array $heights): iterable
{
    foreach ($heights as $r => $row) {
        foreach ($row as $c => $val) {
            $neighbors = array_filter([
                $heights[$r-1][$c] ?? null,
                $heights[$r][$c-1] ?? null,
                $heights[$r+1][$c] ?? null,
                $heights[$r][$c+1] ?? null,
            ], static fn ($v): bool => !is_null($v));
            yield new Position($val, $neighbors);
        }
    }
}

We double-iterate over the grid, and for each cell we yield a Position with the neighbor information. Building the neighbor information is interesting. Because there are edges to deal with, some cells won't have all edges. The typical approach would be to wrap an isset() check around each one, but I find defaulting them to null and filtering to be visually cleaner. (Note that since 0 is a valid value, we have to do a strict is_null() check rather than relying on the default truthiness check.) Regardless of your preference, the result is a lazy stream of Position objects, which contains all the information we need. The rest of part 1 is fairly straightforward.

We only care about those Positions that are a low point, that is, their value is smaller than all their neighbors. (The puzzle helpfully avoids neighbors ever being the same value.) That sounds like a filter operation to me, with a simple enough filter function:

function isLowPoint(Position $p): bool
{
    return $p->value < min($p->neighborValues);
}

In other words, a value is a low point if it is smaller than the smallest of its neighbors. So now we have a list of just those Positions that are low points. The remaining task is to compute the "risk factor" of each one, which means adding 1 to it, and then add up all the risk factors. That's simple enough. Map over them with this function:

function computeRisk(Position $p): int
{
    return $p->value + 1;
}

And then array_sum() the result. Our final pipeline looks like this:

$riskSum = pipe($inputFile,
    lines(...),
    amap(compose(str_split(...), amap(intval(...)))),
    buildNeighborMap(...),
    afilter(isLowPoint(...)),
    amap(computeRisk(...)),
    array_sum(...),
);

And that's part 1 done.

What are you basin that on?

The next part is trickier. Now that we know how to find the low points, we need to compute the "basins" of the map. That is, from each low point we need to walk out to the high points, which are always 9. That "valley" is what we're interested in, and specifically how big each one is (that is, how many cells are in it). And then we want to multiply together the size of the 3 largest basins (just to prove we got it right). Oy.

This one takes a bit more work. OK, a lot more work. Just from the problem, it definitely seems like a recursive algorithm will be needed: growing the size of a basin recursively until we get to the high-point edge conditions. I went through a few iterations myself on this problem (no pun intended) before someone suggested I look up how the creator of Elixir, José Valim, did it. He's been posting his solutions online with videos, which I found very helpful to wrap my brain around a specific approach. I had to modify it a bit to fit into PHP, but the end result is very similar to his.

The limits of PHP

Ironically, much of the machinery we built for part 1 is either not going to apply or will be different. The algorithm I ended up with didn't use a Position object, and despite my attempts to squeeze in a more robust data modeling I ended up with the easiest solution being... named tuples, aka associative arrays. I feel dirty even saying that, but that's what I was able to get to work. (It could also have been done with unnamed tuples, aka sequential arrays, but I wasn't willing to go that undocumented.)

The main challenge is that we need to do an in_array() check, which doesn't work on objects, currently, because objects have no meaningful == comparison. (There is an RFC to fix that in the future that I very much support, for cases exactly like this one.) In the end, falling back to associative arrays was less work than trying to contort objects into an in_array() type comparison. That could no doubt have been done in some fashion or another, but it would have been more work than I cared to do and potentially slower as it's all in user-space.

Starting points

We're going to start off by collapsing our "find low points" algorithm into a single function step, to avoid unnecessary intermediaries.

function findLowPoints(array $heights): iterable
{
    foreach ($heights as $r => $row) {
        foreach ($row as $c => $val) {
            $neighbors = array_filter([
                $heights[$r-1][$c] ?? null,
                $heights[$r][$c-1] ?? null,
                $heights[$r+1][$c] ?? null,
                $heights[$r][$c+1] ?? null,
            ], static fn ($v): bool => !is_null($v));
            if ($val < min($neighbors)) {
                yield ['x' => $r, 'y' => $c];
            }
        }
    }
}

That's the same logic as we had before, just collapsed into a single function. The Position object is unnecessary for the second part, and in this case we do need the coordinates rather than just the neighbor list. It's just different enough to make the previous parts not directly reusable. Note again, though, that we're yielding each result for simplicity.

Growing the answer

Next, we need to map each low point coordinate into its "basin." A basin is a list of points, inclusive of the low point, that form a valley in our data. As noted above, we're going to find those recursively, and this is the algorithm I adapted from José, the Elixir guy. It's hard to explain without seeing it, so let's start by just showing it.

function basin(array $point, array $grid): array
{
    return computeBasin([], $point, $grid);
}

function computeBasin(array $basin, array $point, array $grid): array
{
    if (in_array($grid[$point['x']][$point['y']] ?? null, [9, null], true)) {
        return $basin;
    }
    if (in_array($point, $basin, true)) {
        return $basin;
    }
    return pipe($basin,
        append($point),
        fn(array $b): array => computeBasin($b, ['x' => $point['x'] - 1, 'y' => $point['y']], $grid),
        fn(array $b): array => computeBasin($b, ['x' => $point['x'] + 1, 'y' => $point['y']], $grid),
        fn(array $b): array => computeBasin($b, ['x' => $point['x'], 'y' => $point['y'] - 1], $grid),
        fn(array $b): array => computeBasin($b, ['x' => $point['x'], 'y' => $point['y'] + 1], $grid),
    );
}

Like many recursive algorithms, it's actually really compact and simple code once you understand what it's doing. Also like many recursive algorithms, it comes in two parts: A recursive bit and a "starter" function to provide initial defaults. We need access to the grid, so that gets passed through. The starter part, basin(), accepts a point (an x/y pair in an associative array), and kicks off the recursive part with an empty basin list.

The meat of the algorithm is in computeBasin(), which builds up the list of points in a basin. It's given a list of points already determined to be in a basin, and a point that may or may not be, plus the grid for context. We want it to stop if the point is not in the basin, or add it to the list and then check its neighbors if a point is in the basin.

The first thing it checks is if the grid point being checked actually exists. If it doesn't exist, or if it is 9 (meaning a high point), then it just returns the current list of basin points unchanged.

Second, it checks to see if the point being checked is already in our basin list. If so, just return the basin unchanged. (If we didn't have that check, we'd just circle around re-adding the same points to the basin forever.)

Third, we now know that the point being checked:

Actually exists.
Is not a high point, so the basin isn't over.
Isn't already accounted for.
Is a neighbor of a value in a basin, or else it wouldn't have been called to begin with.

Point 4 is somewhat implicit, but that sort of implicit knowledge is common in recursive algorithms. Given that we know that the point must be in our set. Therefore, we can add it to our set, and then check all of the orthogonally neighboring cells the same way, recursively. Then we want to return the modified basin with all of the added points.

We can do all of that very compactly by passing the $basin list to computeBasin() with each of the neighboring points. Those computeBasin() calls will return the modified basin, which we can then pass to the next point, and so on. All of that compacts down very nicely to the pipe() call above. The first step is to append() the current point to the basin. append() is a simple utility that makes adding to an array pipe-able. Crell/fp's version looks like this:

function append(mixed $value, mixed $key = null): callable
{
    return static function (array $it) use ($value, $key): array {
        if ($key) {
            $it[$key] = $value;
        } else {
            $it[] = $value;
        }
        return $it;
    };
}

Since we're not passing a key, it goes to the else branch and returns the array with the one element added. That result then gets passed to computeBasin() for the cell to the left, whose result is passed to computeBasin() for the cell to the right, and so on.

Aside: A cute trick

As an aside, technically, each conditional branch in the function is now a single expression. That means, if we really wanted to, it could be refactored to one single match() statement.

function computeBasin(array $basin, array $point, array $grid): array
{
    return match(true) {
        in_array($grid[$point['x']][$point['y']] ?? null, [9, null], true) => $basin,
        in_array($point, $basin, true) => $basin,
        default => pipe($basin,
            append($point),
            fn (array $b): array => computeBasin($b, ['x' => $point['x'] - 1, 'y' => $point['y']], $grid),
            fn (array $b): array => computeBasin($b, ['x' => $point['x'] + 1, 'y' => $point['y']], $grid),
            fn (array $b): array => computeBasin($b, ['x' => $point['x'], 'y' => $point['y'] - 1], $grid),
            fn (array $b): array => computeBasin($b, ['x' => $point['x'], 'y' => $point['y'] + 1], $grid),
        ),
    };
}

In this case, I would not recommend doing so. It's unnecessary, and I don't think it actually makes the code any easier to read. However, recursive algorithms very often (though absolutely not always) end up with this kind of pattern, where there's one or two "bail out" single-line cases and a single-line recursive call. If you find yourself with that kind of pattern, it's a good sign. Even if you don't feel like collapsing it into a single expression like this, the fact that you could is a "good code smell" when doing functional programming and recursion specifically.

It's the size the matters

That's the tricky bits. The rest are fairly mechanical. We're instructed to find the three largest basins, get their sizes, and multiply those together. Finding the 3 largest means we have to sort the list, so let's start with that function:

function sortBySize(array $basins): array
{
    usort($basins, static fn ($a, $b): int => -1 * (count($a) <=> count($b)));
    return $basins;
}

Once again, PHP's lack of a pure sorting utility means we have to wrap our own around it. Blech. And because we want the biggest first, we multiply the spaceship operator's result by -1.

Next, we want just the top 3. An array_slice() call would do that, but that's a bit harder to pipe and doesn't handle iterables. Fortunately, there is a common functional tool for that called take, that "takes" (returns) the first X values from a list, somewhat like an SQL LIMIT clause. PHP doesn't have it natively, so of course we'll make our own.

function atake(int $count): callable
{
    return static function (iterable $a) use ($count): array {
        if (is_array($a)) {
            return array_slice($a, 0, $count);
        }
        $ret = [];
        foreach ($a as $k => $v) {
            if (--$count < 0) {
                break;
            }
            $ret[$k] = $v;
        }
        return $ret;
    };
}

As with all of our other utilities, we return a new function that takes a single iterable as an argument so that it's pipe()-friendly. In the base case of an array, we can fall back to array_slice() for performance. If not, we need to iterate the iterator a limited number of times and then return the result.

(Fun challenge: What if we wanted an ittake() method that returns values lazily, say because we want a lot of them? What would that look like? Did you know it can be implemented in a single statement? See if you can figure out how before checking Crell/fp for the solution.)

That gives us the 3 largest basins, which are themselves arrays of points. Next we want the size of each, so that means piping through count(), and then we want to multiply all of those together. PHP gives us an array_sum() function, but not one for multiplication. Fortunately that's simple enough to write:

function array_multiply(array $values): int|float
{
    $ret = 1;
    foreach ($values as $v) {
        $ret *= $v;
    }
    return $ret;
}

Basins, assemble!

And finally, we can pipe everything together.

$grid = pipe($inputFile,
    lines(...),
    amap(compose(
        str_split(...),
        amap(intval(...)),
    ))
);

$basinProduct = pipe($grid,
    findLowPoints(...),
    amap(fn(array $point) => basin($point, $grid)),
    sortBySize(...),
    atake(3),
    amap(count(...)),
    array_multiply(...),
);

The $grid needs to get used twice; once to feed into findLowPoints(), once to close over in the call to basin(). That's why it is its own pipeline. Otherwise, everything else should be fairly self-evident by now.

As always, the final code is available on GitHub.

Want to know more about functional programming and PHP? Read the whole book on the topic: Thinking Functionally in PHP.