Advent of Functional PHP: Day 8

in PHP11 months ago (edited)

Advent of Code Day 8 was, to put it mildly, a pain in the ass. There's a couple of reasons for that. It's a naturally tricky problem, it's hard to genericize, and it's explained fairly badly. It took a while but with some help from others I was finally able to figure out (and refactor to) a good, functional solution to it. So let's dive in.

The problem boils down to one of encryption. Our input is several lines that all look like this:

acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab | cdfeb fcadb cdfeb cdbaf

Where each letter corresponds to one segment in an LED display for a number. Each number appears once on the left side, which is enough for you to figure out what letter corresponds to what segment. Then we need to use that knowledge to decode the numbers on the right and figure out what the number is.

The hint

Part 1 is kind of a practice round. It points out that some digits have a unique number of segments. For instance, the digit 7 is the only number that uses 3 lit segments. So if we see "dab" on the left side, we know that must be number 7, and thus d, a, and b correspond to those three segments (but we don't know which is which). The digits 1, 4, 7, and 8 are the ones with a unique number of segments, and thus easy to figure out. Part 1 just wants a count of the number of easy digits.

Parse it

Each line in the input is its own independent puzzle. The intent is to solve the puzzle over a zillion possible input sets. As usual, we'll start with parsing the input into a structure we can use. Since each line is its own independent puzzle, it makes sense to have a Line structure.

class Line
{
    public function __construct(
        public readonly array $in,
        public readonly array $out,
    ) {}

    public static function fromInput(array $args): static
    {
        return new static($args[0], $args[1]);
    }
}

Where $in is an array of the chunks on the left, and $out is an array of chunks on the right. The static method is simply an alternate constructor so that we can more easily pipe() data into it.

The parsing then looks like this:

function parseLine(string $line): Line
{
    return pipe($line,
        explode(' | '),
        amap(explode(' ')),
        amap(afilter()),
        Line::fromInput(...),
    );
}

It looks a bit involved, but it's really not. First, we take the line and split it on the | character to get a 2 element array, the $in side and the $out side. Then we explode both the left and right sides again on spaces. (The amap() call applies the explode(' ') call to every element, which turns each string into an array, giving us an array of two arrays.) Then we run a filter operation on both sides to remove any stray whitespace, and pass the result (a pair of arrays) into fromInput() to give us a Line object.

Find the easy bits

For part 1, we only care about the easy stuff on the right side. That means we can discard all the left side bits. (Don't worry, this parsing will still be helpful in part 2.) That means we can extract the $out parameter from each line, which is a simple map operation:

pipe($inputFile,
    lines(...),
    amap(parseLine(...)),
    amap(static fn (Line $l): array => $l->out),
    );

That gives us an array of arrays, where each subarray is the 4 outputs (right side bits) from that line. Part 1 specifically asks for how many easy numbers (1, 4, 7, 8) there are across the whole input. That means we don't care that they came from separate lines! The easiest way to throw out that information we don't care about is to flatten() our 2D array into a 1D array. Toss that onto the end of the pipe.

Next, we don't care about the actual letter values. We just care how many characters they are, aka how many LED segments are lit. So we can map over the whole list of values with strlen() to get a long list of numbers, each one being the number of segments lit for that character string. Another step in the pipe.

Next, we need to know which segment count numbers correspond to what actual digits in the display. That sounds like a lookup table we're going to need later, so let's go ahead and make it a function.

function countPossibilities(int $in): array
{
    // 0 and 1 are not possible, so ignore those.
    return match ($in) {
        2 => [1],
        3 => [7],
        4 => [4],
        5 => [2, 3, 5],
        6 => [0, 6, 9],
        7 => [8],
    };
}

That is, if there are 3 segments lit, it can only mean a digit 7. If there are 6 segments lit, that could be any of 0, 6, or 9. And so on.

We can now map over our list of character counts and turn those counts into possible digits. amap(countPossibilities(...)). What we have now is an array of digits that each item could be.

For now, we care only about the easy ones that have only one possibility. That means we can filter that list to show only those items that have one possible digit. And then we want to know how many of those there are. Put all together, our final pipeline looks like this:

$number = pipe($inputFile,
    lines(...),
    amap(parseLine(...)),
    amap(static fn (Line $l): array => $l->out),
    flatten(...),
    amap(strlen(...)),
    amap(countPossibilities(...)),
    afilter(static fn (array $possibilities): bool => count($possibilities) === 1),
    count(...),
);

Or in English:

  • Take the input file name.
  • Read it into a list of string lines.
  • parse each one into a Line object.
  • Take just the $out part of the line.
  • Flatten that into a single list of patterns.
  • Get the length of each one.
  • Get a list of what digit each pattern could be.
  • Filter down to just those that have only one option.
  • See how many of those we have.

Part 1 wasn't so bad. It's part 2 that is really annoying.

Decode all the things

Part 2, predictably, asks us to now build the algorithm to use the input information to figure out the decoding mechanism, and then decode the values on the right. It's tricky, in part because the problem was not well explained and in part because it's actually a very problem-specific algorithm.

I initially tried to come up with a generic approach that would work for arbitrary similar problems, based on the idea of determining which letter corresponded to which segment. I tried several different approaches for that, including things with complex enums, none of which worked, until someone else pointed out to me that you don't actually need to know what letter corresponds to which segment. It's possible to solve without that. The key insight is that the input order doesn't matter. That is, beca is exactly the same asabce is exactly the same as ebac. While keeping it as letters is helpful for debugging, the inputs are really isomorphic (that is, logically equivalent to) a bitmask. It's just a collection of the 7 possible inputs, and which ones are included or not.

That means if we convert the values to bitmasks, we gain access to various bitwise techniques that are easier to use and fast, if potentially harder to debug by print statement. We'll need a couple of utilities for that, so let's go and build those.

First, we need to map each letter to a bit.

function letterToMask(string $l): int
{
    return match($l) {
        'a' => 2**0,
        'b' => 2**1,
        'c' => 2**2,
        'd' => 2**3,
        'e' => 2**4,
        'f' => 2**5,
        'g' => 2**6,
    };
}

The ** is "to the power of" in PHP. Next, we need to convert a string of letters into a bitmask of all of those letters, regardless of order.

function strToBitmap(string $s): int
{
    return pipe($s,
        str_split(...),
        reduce(0, static fn(int $mask, string $letter): int => $mask | letterToMask($letter) ),
    );
}

First, we split the string into an array of characters. Then we reduce over the list of letters, and for each one we get its bit version and boolean OR them together. That is, ac and ca both turn into 0b101. (Which is technically the integer 5 in base 10, but that is completely irrelevant to us at the moment.)

Next, since we know that's how we're going to want the data, let's convert the incoming data from strings to bitmasks in the Line object. We can do that in the constructor, with a little utility.

class Line
{
    use Evolvable;

    public readonly array $mapping;

    public readonly array $inputMasks;

    public function __construct(
        public readonly array $in,
        public readonly array $test,
    ) {
        $this->mapping = [];
        $range = range(2, 7);
        $this->inputMasks = array_combine(
            $range,
            array_map($this->makeMapList(...), $range)
        );
    }

    protected function makeMapList(int $segmentCount): array
    {
        return pipe($this->in,
            afilter(fn(string $s): bool => strlen($s) === $segmentCount),
            amap(strToBitmap(...)),
            fn(array $map): array => array_combine($map, $map),
        );
    }

    public static function fromInput(array $args): static
    {
        return new static($args[0], $args[1]);
    }
}

We'll explain that inside out. makeMapList() takes one argument, the segment count, and finds all of the left-side inputs that are that long. It then converts all of those from strings to bitmaps, using the tools we just built. Finally, it uses array_combine() to make it a map from the bitmask to itself. (That will be useful later.)

Then in the constructor, we map over the possible segment counts (2 to 7), and for each one we make a lookup table from the segment count to the map list.

All of that is a bit screwy to think about, but the net result is that we take the input list of strings and turn it into a lookup table similar to the one from countPossibilities(), but with the bitmask version of each string. That data structure will be very helpful shortly when we go to derive what mask is what digit. (The algorithm came first, then I built up the map to support it. It's just a bit easier to explain the other way around.)

There's one more tool we're going to need, and that's a function to count the number of 1 bits in a bitmask. (We'll see where that comes in momentarily.) The purely functional way to do that would involve recursion, which is usually less efficient in PHP, so we'll just do it the procedural way (which is easily found by Googling):

function countBits(int $n): int
{
    $count = 0;
    while ($n)
    {
        $count += $n & 1;
        $n >>= 1;
    }
    return $count;
}

Technically that involves mutable variables because it changes both $n and $count, but since integers pass by value in PHP none of that mutation leaks out of the function, so... meh.

Now we can finally start solving our puzzle.

Know the problem space

Given a Line, we want to derive the mapping table from a bitmask to the number it means. We're storing the mapping on the Line object, so it's a function from Line to Line.

We'll start with the easy bits (pun intended). We know whichever input mask has only 2 bits set is the number 1. That means from the $inputMasks table we built in the Line constructor, we can very easily find the mask that corresponds to 1. The same is true of 4, 7, and 8.

function deriveMapping(Line $line): Line
{
    // These all have only one possible translation, so do the easy bits.
    $mappings[1] = key($line->inputMasks[2]);
    $mappings[7] = key($line->inputMasks[3]);
    $mappings[4] = key($line->inputMasks[4]);
    $mappings[8] = key($line->inputMasks[7]);

    return $line->with(mapping: array_flip($mappings));
}

$line->inputMasks[2] is a single item array from the bitmask of the 2-element input to itself. That's where the array_combine() from earlier comes in, as key() will now return the key of the first (only) element of the array, which is that value. So that single fast lookup now gives us the map from 1 to the bitmap it corresponds to. We can do the same for 4, 7, and 8.

Finally, at the end we have a lookup map from a digit to a bitmap, but what we actually want is the other way around. That's easy enough: array_flip() switches the keys and arrays, and we then return the Line now populated with that lookup table.

The hard bits

Wonderful! That gives us four values, but there's six more to figure out. Here's where deep knowledge of the specific problem space comes in, as well as some non-obvious insights. (Non-obvious in that someone else pointed them out to me, so if you don't think you'd figure this out on your own, don't worry; neither did I.)

Specifically, if you look at the digits for 1 and 6, they have some segments in common, some that are in only one or the other, and some that are in neither. If we use a bitwise XOR operation (^), we can get back just the bits that are in one or the other digit but not both. And then we can use countBits() to see how many segments were in one or the other but not both. That ends up being 6 segments.

That... seems like a not very useful thing to do. It's certainly non-obvious. However, compare that to applying the same operation to the mask for the digits 9 and 0. Those have a different number of segments that are XOR with the digit 1. Only 6 has 6 XOR segments. Thus, if we XOR the 1 bitmask (which we know) with the masks for 0, 6, and 9 (which all have the same number of segments), then we know whichever one has 6 bits in its result must be the digit 6.

Go ahead and read those last three paragraphs again. It's complicated and took me a while to wrap my brain around, too. It is also largely irrelevant to functional vs. procedural code, which is what we're mainly interested in for this series.

Translating that logic to code, we get this:

function deriveMapping(Line $line): Line
{
    // These all have only one possible translation, so do the easy bits.
    $mappings[1] = key($line->inputMasks[2]);
    $mappings[4] = key($line->inputMasks[4]);
    $mappings[7] = key($line->inputMasks[3]);
    $mappings[8] = key($line->inputMasks[7]);

    $mappings[6] = pipe($line->inputMasks[6],
        amap(compose(
            fn(int $mask): int => $mask ^ $mappings[1],
            countBits(...),
        )),
        afilter(fn(int $count): bool => $count === 6),
        key(...),
    );

    return $line->with(mapping: array_flip($mappings));
}

Let's break that down. Starting with the input values that are 6 segments (an array of the masks for 0, 6, and 9), we take each one and XOR it with the mask for 1, then count the remaining bits. That gives us an associative array of the original bitmasks to their remainder. We then filter that list to include just the one with a remainder of 6. key() then gets us the key of that one value, which is the mask that logically must be the digit 6.

Of note here, we could also have written two separate amap() calls. When dealing with pure functions, composing functions together and then mapping over the resulting function has the same effect as mapping over each function separately. Essentially, it is distributive. Most of the time, composing first will be more performant as it only needs to iterate once instead of several times, but the cost of the operations applied to each item are the same. How much of a difference it makes is going to vary case by case.

Oof. We're going to need to use the same logic and analysis to solve the other digits, so let's factor that routine out to another function before we go further.

function findByMask(array $masks, int $xorWith, int $remainder): int
{
   return pipe($masks,
       amap(compose(
           fn(int $mask): int => $mask ^ $xorWith,
           countBits(...),
       )),
       afilter(fn(int $count): bool => $count === $remainder),
       key(...),
   );
}

That's the same code as above, just with the digits replaced by variables. Given an array of masks we want to filter, XOR-and-count them with the provided known number's mask, then check for a known remainder.

How do we know what number to XOR against and what the remainder should be? That requires staring at the LED layout for a while with a pencil and paper and counting it up by hand. I will spare you the details of that, and offer just the final version of the function:

function deriveMapping(Line $line): Line
{
   // These all have only one possible translation, so do the easy bits.
   $mappings[1] = key($line->inputMasks[2]);
   $mappings[4] = key($line->inputMasks[4]);
   $mappings[7] = key($line->inputMasks[3]);
   $mappings[8] = key($line->inputMasks[7]);

   $mappings[6] = findByMask($line->inputMasks[6], $mappings[1], 6);
   $mappings[2] = findByMask($line->inputMasks[5], $mappings[4], 5);
   $mappings[3] = findByMask($line->inputMasks[5], $mappings[1], 3);
   $mappings[5] = findByMask($line->inputMasks[5], $mappings[6], 1);
   $mappings[9] = findByMask($line->inputMasks[6], $mappings[3], 1);

   $mappings[0] = key(array_diff($line->inputMasks[6], $mappings));

   return $line->with(mapping: array_flip($mappings));
}

0 ends up being a bit more complicated to find a good comparison for, but thankfully that's unnecessary. Once we know what 6 and 9 are, then by process of elimination whatever the other item in the 6-bit list is must be 0. (That's what the array_diff() line is all about.)

We now finally have a function that takes a Line and returns a Line with the corresponding lookup table to decode any digit. Huzzah!

The home stretch

We're nearly done. Now that we have the lookup table, we need to use it to figure out what the right-side numbers are, then interpret that as a 4-digit number. Then we're asked to add up all of those 4 digit numbers to get the final result. That's three tasks, so three functions.

First, we need to decode a Line into a collection of digits.

function decode(Line $line): array
{
   return pipe($line->test,
       amap(strToBitmap(...)),
       amap(fn(int $test): int => $line->mapping[$test]),
   );
}

That takes the right-side values (an array), converts each one to a bitmap, then looks up that bitmap in the mapping table. The result is an array of numbers. Nice and simple.

Second, we need to convert that list to a single 4-digit integer. That's almost easy, using the same power-trick we saw before, but note that the digits are listed left to right. That means we need to reverse them first before we raise them to a power so that we can get the right power.

function digitsToNumber(array $digits): int
{
   return pipe($digits,
       array_reverse(...),
       amapWithKeys(static fn (int $v, int $k): int => $v * 10**$k),
       array_sum(...),
   );
}

array_reverse() turns [0 => 3, 1 => 9, 2 => 5, 3 => 7] (left to right) into [0 => 7, 1 => 5, 2 => 9, 3 => 3] (right to left). Then the array keys are the power of ten to which the digit corresponds. So we multiply each digit to its key's power of 10, and add them up. Or in other words, we get 7 + 50 + 900 + 3000 = 3957. And that's the readout for that line.

And finally, we just want to add all of those final line results up. Here's the full pipeline:

$results = pipe($inputFile,
   lines(...),
   amap(compose(
       parseLine(...),
       deriveMapping(...),
       decode(...),
       digitsToNumber(...),
   )),
   array_sum(...),
);

Note here again that we're composing the steps to apply to each line first, and then mapping that resulting function over the whole incoming dataset. Finally, just array_sum() everything together.

Whew! That was quite a trek. The functional bits weren't all that complicated. The problem itself, though, was just annoying. As usual, the final code is available on GitHub.

I'm not sure how many more of these I will make it through, but I have another two puzzles already solved so I will at least get through those. We'll see how long I last. If you've been getting something out of this series, please let me know! The more value people are getting out of it, the more worth it for me it is to keep doing it.


Want to know more about functional programming and PHP? Read the whole book on the topic: Thinking Functionally in PHP.


Thinking Functionally in PHP

Sort:  

Your content has been voted as a part of Encouragement program. Keep up the good work!

Use Ecency daily to boost your growth on platform!

Support Ecency
Vote for new Proposal
Delegate HP and earn more