Advent of Functional PHP: Day 10

in #php2 years ago

For the 10th Day of Advent of Code, we're asked to solve a matching braces problem. This is a common parser exercise, but it's made a bit more complex in this case by using multiple types of braces. Specifically, we're handling a series of lines that contain ( and ), but also < and >, [ and ], and { and }.

The story jazzes it up as being the code of our submarine's navigational computer, which consists entirely of braces in a sort of eldritch horror version of brainfuck, but that's mostly just a distraction.

We're told up front that most of the data lines are bad. None, in fact, are properly balanced. Some are corrupted by having an invalid character (say, a ] closing when a } closing was expected), while others are just incomplete. (Seriously, who wrote this submarine's computer system?)

The instructions say to ignore incomplete lines for part one and consider only corrupted lines. It's a safe bet that part two is going to have us consider the incomplete lines (spoiler alert: it does), so let's save time and consider both possibilities. The problem, then, is to find lines that are corrupted and find their first illegal character.

Then we want to turn the character into a point count and add those up, again mostly just to prove that we did it with a single final answer. Let's have a go at it.

It's recursion all the way down

A braces matching problem is almost always solved with a stack. Stacks are, usually, a mutable data structure, which doesn't fit with functional code. I tried a few times to use the function call stack itself to store values that we've already seen (so we can match against them), but that turned out to not work. (If someone did manage to get that approach to work, I'd be curious to see how.) Instead, I again took a cue from Elixir guy José Valim and recursed with a stack instead. Because of PHP's copy on write semantics, while this does use more memory than a single mutable array stack it doesn't use quite as much as you'd think.

José's Elixir solution really shows off Elixir's flexibility thanks to pattern matching and decomposition. I recommend watching the video for it so you can marvel at how elegant pattern matching functions can be. PHP, unfortunately, lacks such functionality, so instead we need to use a match() statement and a few extra conditionals. It's not quite as elegant, but it's close.

Enumerated possibilities

There are three possible states that a line could be in: OK, Corrupted, and Incomplete. That naturally suggests to me that we should toss an Enumeration in there. Enums are new in PHP 8.1, and let you define a new type that has a finite list of possible named values. It's basic form is like so:

enum Result
{
    case OK;
    case Corrupted;
    case Incomplete;
}

What PHP still lacks, unfortunately, is Algebraic Data Types, which in practice end up meaning "Enums with attached data". You can sort of emulate that with subclasses, but not perfectly as you can still have a nominally infinite number of subclasses. That's not a big problem for part 1, but it will be a bit problematic in part 2, as we'll see.

It's really hard to explain the logic of the parser without an example, so let's jump straight to the code and then go through it.

function parse(string $line, $pos = 0, array $stack = []): Result|string
{
    $next = $line[$pos] ?? null;
    $head = $stack[0] ?? null;

    return match ($next) {
        // Opening brace, push an "expected" onto the stack.
        '{' => parse($line, $pos + 1, ['}', ...$stack]),
        '<' => parse($line, $pos + 1, ['>', ...$stack]),
        '(' => parse($line, $pos + 1, [')', ...$stack]),
        '[' => parse($line, $pos + 1, [']', ...$stack]),
        '}', '>', ')', ']' => $next === $head ? parse($line, $pos + 1, array_slice($stack, 1)) : $next,
        null => count($stack) ? Result::Incomplete : Result::OK,
    };
}

parse() is a recursive function. It takes a line being processed and a position. (We could also just slice values off of the string and pass a modified string along recursively, but that would use substantially more memory.) It also takes a stack, which is an array. That stack tracks not the characters that we have already seen (as the call stack would have), but the characters we expect to see. That is, if the last character we saw was a {, then we would expect to see a } next, or more opening braces.

The first two lines are just simplifications to make the rest of the function easier. $next is the next character in the input, or null if we ran off the end of the string. $head is just a convenient alias for the top of the stack, that is, the closing character we would expect to see next, if it's a closing character. Everything else is part of the match() statement, which is as close as PHP 8.1 gets to pattern matching. (Which is not that close, to be honest.)

If the next character is an open brace, we simply call parse() again recursively, with the corresponding closing brace pushed onto the stack.

If the next character is null (the default we set in the first line), that means we ran out of input. If the stack is empty, that means every brace matched and all is OK. If not, it means the string is incomplete. Both of those have enumerated possibilities that we can return.

If the next character is a closing brace, we check to see if it matches the closing brace we expect (aka, $head). If so, array_slice() pops that value off the stack and we advance to the next character, as al is well. If not, then whatever the next character is we know is invalid.

This is where ADTs would come in handy. Ideally, we could return Result::Corrupted($next). That would be a parameterized return, such that the return value is always an instance of Result with potentially context added. PHP doesn't support that yet, sadly, so instead we use a union type return. This is a trick for a sort of "junior result type". In a sense, it's a more advanced version of a nullable return. We could return a string or null to indicate an unexpected situation, but null is a very non-descriptive sentinel value. By returning a common type or an Enum, we can use the values of the enum as a custom sentinel value that we know has meaning only in our domain model. It's a sort of personalized, custom null that conveys slightly more information than null does.

As a result (no pun intended), there are three possible outcomes of parsing a string:

  1. The recursion stack builds up across the whole list of characters in the string, until the last character processed returns Result::OK. Then the whole call stack returns in one long collapsing line like dominoes. The closing-brace $stack builds up and gets torn down throughout that process, and in the last call is empty.
  2. The recursion stack builds up across the whole list of characters in the string, until the last character processed has a still-filled stack, so it returns Result::Incomplete. Then the whole call stack returns in one long collapsing line like dominoes.
  3. The recursion stack builds up across the whole list of characters in the string, until a character doesn't match the expected value from the stack and it returns that mismatched character. Then the whole call stack returns in one long collapsing line like dominoes. The closing-brace $stack builds up and gets torn down throughout that process, and in the last call is non-empty but still gets garbage collected as the call stack unrolls.

Once that's done, the rest of the problem is mundane. We need to filter to just the corrupted lines, which are the ones that returned a string. Then we have a list of corrupted characters, which we convert to "score" values according to a hard coded table, then add them all up to get our final result.

The score translation looks like this:

function score(string $char): int
{
    return match ($char) {
        ')' => 3,
        ']' => 57,
        '}' => 1197,
        '>' => 25137,
    };
}

(Why those numbers? Because the problem said so. I don't understand them either.) And then our whole pipeline looks like this:

$score = pipe($inputFile,
    lines(...),
    amap(parse(...)),
    afilter(is_string(...)),
    amap(score(...)),
    array_sum(...),
);

Baddaboom, baddabing, and we have our answer.

Part 2 completes me

One advantage of implementing the solution this way is that it gives us a list of the characters that we expect to find to finish off the string at any given point in time. It's the $stack. That's fortuitous, because part 2 asks us to find the characters that will successfully complete an incomplete string.

Hey, we already have that! We just need to return it. That means we need to make only a very slight change to parse():

function parse(string $line, $pos = 0, array $stack = []): Result|string|array
{
    $next = $line[$pos] ?? null;
    $head = $stack[0] ?? null;

    return match ($next) {
        '{' => parse($line, $pos + 1, ['}', ...$stack]),
        '<' => parse($line, $pos + 1, ['>', ...$stack]),
        '(' => parse($line, $pos + 1, [')', ...$stack]),
        '[' => parse($line, $pos + 1, [']', ...$stack]),
        '}', '>', ')', ']' => $next === $head ? parse($line, $pos + 1, array_slice($stack, 1)) : $next,
        null => count($stack) ? $stack : Result::OK,
    };
}

First, we changed the return type to be Result|string|array. Second, instead of returning Result::Incomplete, we return the stack. Done. Press the "that was easy" button.

Is that your final score?

The final result calculation is a somewhat different and utterly bonkers computation. Since our output for an incomplete line is the completing characters, the score for a line is based on the accumulated score of those characters. Specifically, the score for a given character is 5 times the previous character's score, plus an additional amount depending on the closing character. (Now they're just making busy work.) Fortunately that is a very simple reduce() operation.

function scoreRemainingCharacter(int $score, string $char): int
{
    return $score * 5 + match ($char) {
        ')' => 1,
        ']' => 2,
        '}' => 3,
        '>' => 4,
    };
}

function scoreRemaining(array $stack): int
{
    return reduce(0, scoreRemainingCharacter(...))($stack);
}

Starting from 0, we reduce across the list of remaining characters. For each character, the new score is 5 times the previous score (which is 0 in the first case) plus whatever int value the match() finds.

Finally... we need the median score of the lines, but we're promised that there will always be an odd number of lines to make it slightly easier. That means we need to sort the values (once again, wrapping the sort in a function so that it becomes pipe-able), and get the middle value. Our end result then looks like this:

function imSort(array $a): array
{
    sort($a);
    return $a;
}

$score = pipe($inputFile,
    lines(...),
    itmap(parse(...)),
    itfilter(is_array(...)),
    itmap(scoreRemaining(...)),
    collect(),
    imSort(...),
    fn(array $a) => $a[floor(count($a)/2)],
);

Of note, I switched the pipeline to be all lazy iterators this time, mainly to reduce memory usage. (All the a*() functions are now it*(), which is how I differentiate them in Crell/fp). However, imSort() requires an actual array, so a simple utility named collect() wraps iterator_to_array() to handle both array and iterator cases:

function collect(): callable
{
    return static fn(iterable $a): array
        => is_array($a) ? $a : iterator_to_array($a);
}

(The name is borrowed from Rust, which has a similar method to turn an iteration into a list.) The median-finding is a simple algorithm we know is safe thanks to the guaranteed odd number. (I think the Christmas Elves writing these problems finally took pity on us.)

Further analysis

Before we close, there's a few other things I want to discuss.

Tagged unions

First is the enum. In practice, it didn't offer that much for us in this problem. Other, similar problems might have, but the lack of associated data does limit their usefulness. I will talk more about their use as custom sentinel values another time.

If we wanted to emulate that with classes, we could do so like this:

interface Result {}

class OK implements Result {}

class Corrupted implements Result
{
    public function __construct(public readonly string $char) {}
}

class Incomplete implements Result
{
    public function __construct(public readonly array $missing) {}
}

And then modify parse() like so:

function parse(string $line, $pos = 0, array $stack = []): Result
{
    $next = $line[$pos] ?? null;
    $head = $stack[0] ?? null;

    return match ($next) {
        '{' => parse($line, $pos + 1, ['}', ...$stack]),
        '<' => parse($line, $pos + 1, ['>', ...$stack]),
        '(' => parse($line, $pos + 1, [')', ...$stack]),
        '[' => parse($line, $pos + 1, [']', ...$stack]),
        '}', '>', ')', ']' => $next === $head ? parse($line, $pos + 1, array_slice($stack, 1)) : new Corrupted($next),
        null => count($stack) ? new Incomplete($stack) : new OK(),
    };
}

That would provide more type robustness, at the cost of a bit more setup. The limitation is that we cannot prevent other classes from also implementing Result, so we can't be certain that those are the only three possibilities. The syntax for an enum-based ADT is also a bit nicer as it avoids all the new keywords. That would also then change how we filter the list down to just the cases we care about (it would be an instanceof check instead of is_string()/is_array()), and we would then need to "unwrap" the values before continuing. Whether that extra step is worth it depends on your specific situation and the tools you have on hand (or have written, which is exactly what I have been doing for Crell/fp).

Avoiding recursion

Recursion is a logically very powerful concept. Once you understand the problem space, a recursive algorithm, as we've seen in the last two puzzles, can end up being very short and powerful, and even fairly easy to understand. The problem is the call stack. PHP has a limited call stack. I believe it defaults to 256 in current versions, although it has varied over the years and is configurable. A recursive algorithm, by design, uses up a lot of call stack slots. If we were dealing with lines that were 1000 characters long, the code would break when it hit the max stack size.

This problem is by no means unique to PHP. It's common to any call-stack-based language, which includes most languages that are not designed for recursion and functional programming explicitly. However, this particular recursive function has a special property: It is "tail recursive." That is, the recursive call is always the absolute last operation performed, if it recurses at all. (The recursion is at the "tail" of the function.) Tail recursion is useful because there are standard, known ways to optimize tail recursion. In fact, there's several.

Some languages can recognize tail recursion and remove layers from the stack after the next call begins. That way, the algorithm is still recursive but there is no risk of overflowing the call stack. For an absolutely fabulous explanation of that approach in JavaScript, I highly recommend Tail Call Optimization: The Musical. Yes, really. It's 12 minutes of your life very well spent.

Another fun fact is that any tail-recursive algorithm is isomorphic to (that is, "logically equivalent to and can be refactored into") a reduction. Yes, really. (I've seen people claim that all recursion can be converted to a reduction; I am not sure if that's true, but it's certainly more difficult to do generically than with just tail recursion.) That means if you can write a tail recursive algorithm, then guaranteed you can, with a bit of work, refactor it to a reduce-based algorithm, which never has a concern about overflowing the stack.

I have seen formal algorithms that claim to re-structure an arbitrary tail recursive algorithm to a reduction, but they are highly involved. Instead, I want to offer a reasonable heuristic for doing so manually that I have found "good enough" for cases where it's needed. (That is, when I'm doing it myself rather than trying to have a program do it for me.)

  1. Whatever aspect of your problem space you are recursing on, convert that to an iterable (array).
  2. Whatever other arguments you're passing recursively, convert to an accumulator. Depending on the use case that could be a primitive, an array, or an object.
  3. If your recursion can end early rather than running to the end of the input, wrap that base case check into a separate function and use reduceUntil() instead of plain reduce().
  4. Convert your recursive function signature to a reducing function signature.
  5. Replace recursive calls with return statements.
  6. Fiddle with what's left to make it line up properly.

Step 6 is a bit squishy, which is what makes it a heuristic rather than an algorithm. :-) I was able to convert the part 2 version of parse() into a reduce-based version in about 10-15 minutes, including fiddling time. Here's the final result:

class Parse
{
    use Evolvable;

    public readonly string $badchar;
    public readonly array $stack;
    public readonly bool $done;

    public function __construct()
    {
        $this->stack = [];
        $this->done = false;
    }

    public function status(): Result
    {
        return match (true) {
            isset($this->badchar) => Result::Corrupted,
            empty($this->stack) => Result::OK,
            default => Result::Incomplete,
        };
    }
}

function parse(Parse $parse, string $next): Parse
{
    $head = $parse->stack[0] ?? null;

    return match ($next) {
        '{' => $parse->with(stack: ['}', ...$parse->stack]),
        '<' => $parse->with(stack: ['>', ...$parse->stack]),
        '(' => $parse->with(stack: [')', ...$parse->stack]),
        '[' => $parse->with(stack: [']', ...$parse->stack]),
        '}', '>', ')', ']' => $next === $head
            ? $parse->with(stack: array_slice($parse->stack, 1))
            : $parse->with(done: true, badchar: $next),
    };
}

$score = pipe($inputFile,
    lines(...),
    itmap(str_split(...)),
    itmap(reduceUntil(new Parse(), parse(...), fn (Parse $p): bool => $p->done)),
    itfilter(fn (Parse $p): bool => $p->status() === Result::Incomplete),
    itmap(fn (Parse $p): array => $p->stack),
    itmap(scoreRemaining(...)),
    collect(),
    imSort(...),
    fn(array $a) => $a[floor(count($a)/2)],
);

First, instead of reading characters on the line string we str_split() the line into an array that we can reduce over. Then we convert the stack to an object of type Parse. In some cases just passing the stack array through would work, but in this case we need to be able to set a flag we can check in reduceUntil() to stop on invalid characters, and we need to record the invalid character itself. The status() method leverages the enum we built before, which is always nice to see.

The parse() function has changed a bit, but not drastically. It's still the same algorithm. $next is now passed to us on each step of the reduction, so we don't need to get it ourselves. Then instead of calling the function recursively every time we hit an opening brace, we return a new Parse object with the new closing brace pushed onto its stack. Same idea, different wrappers.

It gets a bit more interesting for the closing braces. If the brace matches, we return a Parse the popped stack. If not, we explicitly set a $done flag and record the bad character. (We could also technically just check for the existence of $badchar, but I decided to be explicit. Dealer's choice.) If we run out the end of the string, we need to then check whether it was OK or Incomplete after the fact. That's what the status() method does for us.

The pipe gets a little bit longer, too, as we need the str_split() check before the reduce and we need to unpack the stack afterward. We could potentially clean it up a bit by taking the adjacent itmap() calls and composing those functions together first before mapping; as we discussed previously, that is usually more performant and gives the exact same result, but this post is long enough as is so I will leave that as an exercise for the reader.

We could also go a step further and move some functionality into methods of Parse, such as adding pure push() and pop() methods to make the code a bit tidier to read, pushing most of the with() calls inside the class. That is also left as an exercise for the reader.

The code for parts 1 and 2, including the alternate universe reduce-based version, are all available on GitHub.

Conclusion

This is my last entry in this year's Advent of Code. The later puzzles seem like they're more algorithmically hard than logistically hard, so they would take a long time to do and are unlikely to show off any more functional bits than we've already seen. I will have one more post in this series next week, looking over what we've learned and what we lessons we can take to the development of PHP itself.

The main take-away, though, is that functional programming in PHP is not only possible, it's increasingly plausible and ergonomic. It's also quite approachable. All we did was restrict ourselves to readonly properties on objects, pure functions, and introduce function concatenation. Everything else was just a natural outgrowth of those, often born from just taking what we would do in a procedural mindset (some combination of foreach() and if, mostly) and abstract those common patterns out to utility functions in a way that composes nicely.

And, and this is very important: There's not a monad in sight. :-) You can do a great deal in functional programming without invoking the Scary Word(tm) (it's not really scary; just read my book and you'll see how not-scary it is). It's fun, it produces easily debuggable, easily testable, easily reusable code. And it can still dovetail with many of the OOP techniques you are likely used to in PHP-land. Grab Crell/fp as a starting point, full of useful functional, pipe-friendly operations, including all of those discussed here.

I challenge the PHP developers in the room to try out functional-style PHP for themselves in the new year. See where taking functional approaches can make your code simpler, cleaner, more composable, and more reusable. Try piping functions or methods together rather than calling between them directly.

Yes, it will feel a bit weird at first. But once you get comfortable with pipes, filters, maps, and reduction, you'll wonder how you ever did anything else.

Merry Christmas, and a Functional New Year!


Want to know more about functional programming and PHP? Read the whole book on the topic: Thinking Functionally in PHP.


Thinking Functionally in PHP

Sort:  

Congratulations @crell! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s):

You received more than 6000 upvotes.
Your next target is to reach 7000 upvotes.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Check out the last post from @hivebuzz:

Merry Christmas - Challenge Feedback - Win a 1000 HP delegation
Christmas Challenge - Offer a gift to to your friends
Support the HiveBuzz project. Vote for our proposal!

Congratulations @crell! You received a personal badge!

Happy Hive Birthday! You are on the Hive blockchain for 4 years!

You can view your badges on your board and compare yourself to others in the Ranking

Check out the last post from @hivebuzz:

Christmas Challenge - 1000 Hive Power Delegation Winner
Merry Christmas - Challenge Feedback - Win a 1000 HP delegation
Support the HiveBuzz project. Vote for our proposal!