Advent of Functional PHP: Day 3

in PHP2 years ago (edited)

The third challenge in this year's Advent of Code is all about bit manipulation. We're asked to read in a series of binary numbers and interpret them in various entirely illogical ways as a form of diagnostics. (Incidentally, if you ever write a system that requires this kind of logic to debug its output, you're fired.)

In any case, we're given a file with a list of 12 digit binary numbers and asked to compute various values. In the first part, we are asked to find the most common bit (0 or 1) in each position, and the result is known as "gamma." Then we have to find the least common bit in each position, and the result is known as "epsilon." (I don't know why you would want to do this; it's all Greek to me.)

In practice, the functional part of the solution isn't all that challenging. It was the bit-wise manipulation that was tricky, as I rarely do bitwise operations. But we do get to show off a few new functional tracks, so let's dig in.

Tracking data

To start off, we need to read in our data. It's simple lines, so we'll use the same lines() function as yesterday. (Hooray for code reuse!). That gets us a list of strings. However, we need the data as binary numbers. Fortunately, PHP has a built-in bindec() function that takes a string and returns the corresponding integer. So that's easy enough.

Next, what do we want to do with the data? Well, we want to iterate over it and compute some totals. That... sounds like a reduce operation to me. The totals here are more complex than a single value, so let's go ahead and define a BitCount class to serve as our running counter. We'll make it all immutable and spawn a new count value after each line; it could be done with a single immutable value, but we're trying to be functional!

class BitCount
{
    public function __construct(
        public readonly int $entryCount = 0,
        public readonly array $counts = [],
    ) {}
}

$entryCount tracks the number of total entries in the file. It gets incremented each time. $counts is a 12-element array counting the number of 1s in the corresponding position. Since there's only 1s or 0s to consider, counting the 1s is sufficient for both computations.

Now we need our reducing operation, which will take a BitCount instance and an integer, then return a new BitCount instance with the appropriate counts increased. That means doing bit-math.

The mask of zero

There are a number of different possible ways to check if the value of a given bit is 1 or 0. The way I chose to do it here is to bitwise-& against a mask. We need 12 masks, one for each bit position. Each mask is a power of 2, so that should be easy enough to generate. However! The instructions say to count positions from the left, not from the right. That means to check the left-most bit, we need to & the number against the mask 0b10000000000. That leads to this exciting little mess of a function:

function masks(int $size): array
{
     return pipe(range(0, $size - 1),
        amap(static fn (int $i) => 2 ** $i),
        array_reverse(...),
    );
}

We start by getting a list of powers to raise to, from 0 (which always returns 1) to one less than the size we want, to account for the 0-base. For each of those numbers, we raise 2 to that power. That gets us a list 1, 2, 4, 8, 16, etc. Finally, flip it around so that the largest number (left most bit) comes first.

We're going to call this function a few times, so for performance we should either call it once and just reuse the value, or memoize it. "Memoize" is just the fancy computer-science-y name for "cache the result." Normally it's better to memoize a function externally, but for expedience and so that we can still call it globally, we'll internalize the cache. Because the body is a single expression, though, that makes the cache super duper easy to implement:

function masks(int $size): array
{
    static $cache;
    return $cache[$size] ??= pipe(range(0, $size - 1),
        amap(static fn (int $i) => 2 ** $i),
        array_reverse(...),
    );
}

The ??= means "use the variable on the left if it's defined and not-null, otherwise evaluate the right side, save the value to the variable on the left, and return it as well." That's exactly what a cache does, so it works beautifully here.

Counting bits

Now that we have our masks, the reducing function is straightforward:

function countBits(BitCount $bitCount, int $next): BitCount
{
    $c = static fn (int $count, int $mask) => $next & $mask ? $count + 1 : $count;

    $state = array_map($c, $bitCount->counts, masks(12));
    return new BitCount($bitCount->entryCount + 1, $state);
}

This function leverages a quirk of PHP's built in array_map(), which (unlike a proper map) can accept any number of arrays and will pass pairs of them to the callback. (In most other languages we would need to "zip" the arrays together and pass the pairs to the callback, which would unpack them again.) In this case, we're mapping over the existing $counts list in the tracking object and the mask list, both of which are 12 items long. That pair is passed to the anonymous callback, $c. $c bit-wise ANDs the mask with the number; if it comes back as 0, there was no 1 in that position so it returns the count unchanged. If it comes back as a non-zero number, PHP evaluates that to true and returns the count plus 1. That produces a new array of updated counts, which we wrap into a new tracking object along with the incremented total count and return it.

Wrap it up

Now that we have all of our pieces, stitching them together is just what we've done before:

$counts = pipe($inputFile,
    lines(...),
    itmap(bindec(...)),
    reduce(new BitCount(counts: array_fill(0, 12, 0)), countBits(...)),
);

We read the data into lines, map it over to digits, then reduce that list using the counter we just wrote. Easy peasy.

Greek life

Now that we have the sums, we need to interpret them. For gamma, we want a 1 if there were more 1s than 0s in a column, and a 0 if there were more 0s than 1s in that column. We know how many total numbers there are, and we know how many 1s are in each position, so we can compute if each number is at least half the size, like so: $count * 2 >= $bitCount->entryCount.

Now here's a fun trick. That expression returns a boolean, which PHP will coerce to 1 or 0 if used in a numeric context. If we multiply that value by the mask for that position (which, again, is a 1 in a given position and 0 everywhere else), we either get back 0 or the mask itself. If we then add up all of the masks that were returned (as well as all the no-op 0s), we get the number we want. That gives us a computeGama() function that looks as simple as this:

function computeGamma(BitCount $bitCount): int
{
    $c = static fn (int $count, int $mask) => ($count * 2 >= $bitCount->entryCount) * $mask;

    return array_sum(array_map($c, $bitCount->counts, masks(12)));
}

That does exactly what we just described. We again pair-map over the counts and mask list (which is cached so we don't have to recompute it), and produce an array consisting of either a mask value or 0. array_sum() adds those all up to get our answer.

Computing epsilon is almost the same. Here, however, we want a 1 in each position if fewer than half of the inputs had 1 in that position. It's only very slightly different:

function computeEpsilon(BitCount $bitCount): int
{
    $c = static fn (int $count, int $mask) => ($count * 2 < $bitCount->entryCount) * $mask;

    return array_sum(array_map($c, $bitCount->counts, masks(12)));
}

Feel free to factor the comparison function out to a parameter and use a single compute method if you prefer. Dealer's choice on that one.

The dumbest diagnostics ever

Part 2 asks us to find two additional values, the O2 rating and CO2 rating. To do that, we need to only consider numbers that were in the majority/minority when considering the previous position. At first, this sounds like a filter operation. However, we need to reduce the list as we go and stop, potentially early, when there's only a single number left in the list, even if we haven't processed all 12 positions. That "majority number" is the result we want. Like I said, anyone who designs such a system in real life is fired. It's possible to do this with a modified type of reduction, but instead, let's talk about recursion.

Recursion is a common functional technique based on the idea of breaking a problem down into smaller pieces, and expressing each piece as "solving this part, then solving the rest." It's less used in imperative languages (such as PHP) because, as an implementation detail, it can result in overwhelming the stack and causing memory issues. In this case, though, we know there will be at most 12 recursion layers, so that's acceptable.

Preparing the code

The first thing to note is that we're going to need to run our bit counting process several times, once for each column, because the counts will be different after numbers have been eliminated. So let's refactor our code a little bit:


$diags = pipe($inputFile,
    lines(...),
    amap(bindec(...)),
);

function bitCounter(array $diags, $masks): BitCount
{
    $bitCounter = static fn(BitCount $counter, int $next): BitCount => countBits($counter, $next, $masks);
    $initalCounts = array_fill(0, count($masks), 0);

    return pipe($diags,
        reduce(new BitCount(counts: $initalCounts), $bitCounter),
    );
}

First, we just have an input-processing pipe, to produce $daigs, for the list of diagnostic lines. Then we've created a utility function that wraps the counting process so we can reuse it. It's the same thing we saw before.

All for ratings

Next, we're told there's rating mechanisms, one for O2 and one for CO2. That is a hint that the process will be the same... except for one piece we swap out. Let's go ahead and make those pieces first.

For O2, we want a 1 if a given position has mostly 1s, and a 0 if it's mostly 0s. This is very similar to the gamma and epsilon computations. For CO2, it's the other way around. These should look very familiar:

function oxCriteria(BitCount $bitCount, int $position): int
{
    $oneIsMoreCommon = $bitCount->counts[$position] * 2 >= $bitCount->entryCount;
    return $oneIsMoreCommon ? 1 : 0;
}

function coCriteria(BitCount $bitCount, int $position): int
{
    $oneIsLessCommon = $bitCount->counts[$position] * 2 < $bitCount->entryCount;
    return $oneIsLessCommon ? 1 : 0;
}

(They could also be simple one liners, but I left in the variable name for documentation.)

Find your bits

Next, we need a way to find the bit at a given position in a number. Rather than use a mask for this bit, we'll take advantage of another fact of bitwise operations (just for variety's sake). If you left-shift a number by $position and then AND it with 1, the resulting number will be either 1 or 0, depending on if the original number had a 1 or 0 in that position. Remember, however, that our positions are backwards (left to right rather than right to left), so we have to invert the position number. To make it easier, make that a function:

function atPosition($number, $position): int
{
    return $number >> (11 - $position) & 1;
}

And do it again

We now have all our pieces, so we can assemble them. This is the recursive part.

What we want to do is compute the counts for the left most position, then filter out numbers that don't match the criteria, then do the same for the second column, etc. until we have only one number left. That one remaining number is the result we want.

Any recursive function needs to have a stop condition, that is, it knows when to stop calling itself. In our case, that's when the number of remaining values is 1, we just return that one value. That's easy enough:

    if (count($diags) === 1) {
        return current($diags);
    }

If that's not the case, we need to filter and recurse. It's easier to show first then explain (and it took me a while to wrap my head around as well):

function findGas(array $masks, callable $criteria, array $diags, int $position = 0): int
{
    if (count($diags) === 1) {
        return current($diags);
    }

    $bitCount = bitCounter($diags, $masks);

    $check = $criteria($bitCount, $position);

    $filter = static fn ($number) => $check === atPosition($number, $position);

    $filtered = afilter($filter)($diags);
    return findGas($masks, $criteria, $filtered, $position + 1);
}

The first block is our end-check. Then, we compute the counts for the current dataset. (Technically this will also compute the counts for the future columns that are not relevant, which is wasteful. Solving that is left as an exercise to the reader.) Then, we pass the counts and the position we want to check to the $critera, which is one of the oxCriteria() or coCriteria() functions we wrote earlier. That gets us a 1 or 0.

Next, we want to filter the list based to only those numbers that have a 1 or 0 (as returned by the $criteria) at the current position. Thanks to our atPosition() function, that's a simple comparison that we can toss into a filter operation.

Finally, we call findGas() again, with the filtered list of values and moving to the next position. If we're down to a single value, it will bottom out, return that value, and the stack will unwind all the way back to our call.

Our calls, by the way, look like this:

$o2level = findGas($masks, oxCriteria(...), $diags);
$co2level = findGas($masks, coCriteria(...), $diags);

Conclusion

That was kind of gross, but mostly because of bits, not functions. All of the actual functions we've produced are, in fact, conceptually super simple. The only complication is that we're computing extra counts in the second part, which could be solved by breaking up the bit counter even further so we can count a single column. Although bit manipulation is not often thought of as a case for functional code, it's entirely possible, and we've seen here, the resulting code is overall quite simple (aside from needing to grok bitwise math, which is its own topic entirely).

The full code is, of course, available on GitHub.

And seriously, never design a system like this. Ever.


Want to know more about functional programming and PHP? Read the whole book on the topic: Thinking Functionally in PHP.


Thinking Functionally in PHP