Advent of Functional PHP: Day 2

in PHPlast year (edited)

In today's challenge, we're asked to interpret a series of basic command lines from a file and update the position of our submarine accordingly. It's basically a graph walk of sorts, with instructions of up, down, and forward. (Apparently you cannot go backward, as in life.)

As with yesterday's challenge, we could do it imperatively with a foreach() loop and a couple of state variables floating around, but that conflates a whole bunch of different behaviors into one blob of code. We don't want to do that, so let's step back and consider the problem more clearly.

The first thing to realize is that we're tracking some value over time, our position, which is getting modified by each instruction line in the file. Said the other way around, we're iterating over a list of commands and each one is updating the position, using the result of the previous instruction as a starting point. "Walking a list and updating a value each time" should trigger our functional brains to say "ah ha, a reduce operation!"

As noted yesterday (I didn't know what today's challenge would be, I swear), a reduce operation walks over a list and performs the same operation (function) on each item, using the output of the previous iteration as an input. It also goes by the name "fold" or "foldl" (fold left) in a lot of more academic posts; same thing. But it fits our use case exactly: Given a start position and a list of "change position" commands, apply each command to the position in turn, returning a new position each time.

Reducing the problem

PHP has a built-in array_reduce() function that works well enough, but it doesn't work on iterables. To handle that, and to show what array_reduce() does under the hood, let's write our own. This is the version out of Crell/fp:

function reduce(mixed $init, callable $c): callable
{
    return static function (iterable $it) use ($init, $c): mixed {
        if (is_array($it)) {
            return array_reduce($it, $c, $init);
        }
        foreach ($it as $v) {
            $init = $c($init, $v);
        }
        return $init;
    };
}

As with the filter operation we saw yesterday, it returns a new function that can be composed with other functions in a pipe. If the value being reduced is an array, we can fall back on the built-in array_reduce(), which should be a bit faster than doing it in user space. If not, we do the same logic ourselves in user space: Call the reducing function $c with the initial value, then call it again with the result, then call it again with that result, and so on. When done, return the result.

If that foreach() loop looks suspiciously familiar, it's because very nearly the same loop as we saw yesterday in pipe(). Fun fact: composing functions together can be implemented as a reduction across the list of functions! More specifically, it's reducing a list of functions where the reducing function (the thing you call with the result of each item in the list) just calls each function in the list turn with the result. That version of compose() would look like this:

function compose(callable ...$fns): callable
{
    return function ($x) use ($fns) {
        return reduce($fns, function($collect, $fn) {
            return $fn($collect);
        }, $x);
    };
}

Or if you wanted to use PHP's short-arrow syntax, it becomes a one-liner:

function compose(callable ...$fns): callable
{
    return fn($x) => reduce($fns, fn($collect, $fn) => $fn($collect), $x);
}

And then pipe() just calls compose() to get the combined function, then invokes it with the initial value. Pretty, elegant... and slower than just sticking the foreach() loop in pipe() directly. That's fine, and what I recommend doing; but it's still cool to realize that we are already doing reductions. (In fact, nearly all functional primitives can be implemented using reduction and recursion. It's fun to do as an exercise, but usually not as performant as giving them all their own foreach() loop.)

In any case, we now have a working reduction operation. What do we do with it?

Getting the data

We need to read the lines from the input file and interpret them as a commands of some kind. That word "and" implies we're dealing with, at least, two different functions. Let's consider them one at a time.

"Read lies of input from the file" sounds simple enough. We could do this in a couple of ways, including the file_get_contents() approach we used yesterday. That's fine, but just for fun let's do it lazily instead, using a generator. Given a file name, we want to open the file, produce lines one at a time via yield, and then close the file. Let's write a function that does exactly that:

function lines(string $file): iterable
{
    $fp = fopen($file, 'rb');

    while ($line = fgets($fp)) {
        yield trim($line);
    }

    fclose($fp);
}

This function gives us an output very similar to file_get_contents(), but lazily as an iterator rather than all at once as an array. There's a small performance cost for that, in return for a memory savings. Whether memory or CPU time is more imporant depends on your use case, but in this case it doesn't really matter, so let's go with this. As a fun bonus, we now have a reusable operation we can reuse anywhere we want a lazy-file_get_contents(). Score!

Note that we're trim()ing each line as we go. We could have made that a separate step in a composed pipe, but it doesn't hurt anything to do here and it's all still functional, and very slightly faster, so sure, let's do it here.

The second part is interpreting each line as a command step. The word "each" there is our next clue. We have a list of lines, and "each" line we want to interpret as a command step. That means we again have two pieces: Interpret one line as one command step, and then apply that operation to all lines. Imperatively, that's another foreach() loop, but that still conflates the iteration with the operation. We can do better.

Let's start with the "apply operation to every line" part. In functional-speak, that's called a map() operation. PHP's array_map() function serves that purpose, but it once again only works on arrays, while we have iterables. So let's write our own, or rather have a look at what Crell/fp gives us:

function amap(callable $c): callable
{
    return static function (iterable $it) use ($c): array {
        if (is_array($it)) {
            return array_map($c, $it);
        }
        $result = [];
        foreach ($it as $k => $v) {
            $result[$k] = $c($v, $k);
        }
        return $result;
    };
}

function itmap(callable $c): callable
{
    return static function (iterable $it) use ($c): iterable {
        foreach ($it as $k => $v) {
            yield $k =>$c($v);
        }
    };
}

As with filter, we've written two versions, one for returning an array and one for returning a generator. They both do the same thing: They return a new function ready for composing that iterates over a list and applies some operation to each element of the list, and then returns or yields the result. It also ensures the keys of the list stay the same, in case it's not a list array (0-based sequential keys; what is a real array in every other language). In the special array-to-array case, it falls back on the built-in array_map() for performance.

Fun aside: Remember we said most operations could be implemented in terms of reduce()? That applies to map(), too.

function map_r(iterable $it, callable $fn): iterable
{
    return array_reduce(
        $it,
        fn($result, $val, $key) => $result + [$key => $fn($val)],
        []);
}

In this case, each step returns a new array that is the same as the old one but with one more element, after applying the operation to that element. Cool! Now don't use it, because the foreach() version is faster.

Great. So now we have a way to produce a list of command line steps, and a way to apply some operation to each step. Now... what is that operation?

Data modeling

Here's where the language you're working in begins to matter a ton. We want to turn a line of text into some meaningful representation. There are dozens of entirely viable versions of "meaningful representation," some of which are only available in certain languages and others are available but just clunky. The traditionally PHP-ish way to do so, a holdover from the PHP 4 era, is to turn each line into an associative array and just-kinda-know what that array structure means. We did that yesterday, with the "current and previous" pairs, essentially.

For anything more interesting than absolutely trivial cases, though, I would argue that is a trap. Using associative arrays (what is often called a dictionary or map in other languages) as though it were a formal data structure gets confusing and unwieldy extremely fast. Especially with the ergonomic improvements in PHP 8.0 and 8.1, a more formal data structure is simple, easy, and highly beneficial.

I would argue the ideal data structure for each step is a tagged union, or Algebraic Data Type. That is, essentially, an enum (new in PHP 8.1!) where instead of each case being a singleton, it's a readonly object that can carry data. PHP doesn't have those yet, unfortunately, although Ilija Tovilo and I (the authors of the Enum RFC for PHP 8.1) have it on our long-term roadmap, hopefully. (No promises.) One could model something very similar with a series of small classes with a common interface, but that's more work than I feel like putting in.

Instead, we'll do a manually-tagged union, essentially. First, we'll create an enumeration that lists the legal values for the command:

enum Command: string
{
    case Forward = 'forward';
    case Up = 'up';
    case Down = 'down';
}

We make it string-backed to make it easier to upcast from the string input. Then, we'll make a class that represents a given "step" in the file:

class Step
{
    public function __construct(
        public readonly Command $cmd,
        public readonly int $size,
    ) {}
}

This class leverages PHP 8.0's constructor property promotion and PHP 8.1's readonly properties, giving us something very close to an immutable struct as seen in languages like Rust or Go. In this case all of the command variants have the same attached data (a single int), so it's nice and simple. And with named arguments, it can even be populated in a self-documenting way.

(Side note: The next time you start using an associative array as a one-off throw-away data structure, remember this class. It took me about 14 seconds to write, but uses half as much memory as the equivalent array, is more self-documenting, and provides its own built-in error handling via the type system. Make a class. Future-you will thank you.)

Parsing

We now have our input data, and our data model, and a way to shuffle data towards our data model... next we need is the actual conversion to parse a command step string into a Step object.

Parsing is a highly complex topic; in fact, turning an arbitrary string into a meaningful data structure is one of the hardest problems in computer science. Fortunately, we don't have an arbitrary string. We have a super-trivial one line string with two elements and a space between them. That makes the parsing process nearly trivial, now that we've narrowed it down to its essence. In fact, it's only two lines:

function parse(string $line): Step
{
    [$cmd, $size] = \explode(' ', $line);
    return new Step(cmd: Command::from($cmd), size: (int)$size);
}

First, we explode the line on the space between the command and the size. Then we use those values for the constructor for Step. The command part we first upcast to an enum using the from() method. (That's why we made it a backed enum.) The size we coerce to an integer, because we have strict_types enabled (as you always should) so the constructor won't take a number-ish string. Of note, we're using the built-in \explode() function here rather than our pipe-able one from yesterday, since we don't need to compose it here. Technically we could break this up even further into two steps and compose them, but doing so wouldn't offer any value so let's not and say we didn't.

Moving forward

We're nearly done! All that's left is the reduction operation we mentioned way up top at the start of the article. Given a position and a step, we want to return the new position. That means we first need a way to define the position.

Nominally, the position is two values: The horizontal distance we've moved (x position) and the depth (y position, where down is a higher number). As noted yesterday, normally a reducing operation can only handle a single running value, and we have two. However, those two values can be easily composed into a data structure of their own!

class Position
{
    public function __construct(
        public readonly int $distance,
        public readonly int $depth,
    ) {}
}

Formally, any class like this can be considered a "product type", as it is the product of combining a distance integer and a depth integer. Informally, we're just sticking two values into a container so we can pass them both around. Semi-formally, we're defining a Position type that contains two elements. Take your pick as to which approach makes more sense to you. They're all correct.

Now we can write our move() function:

function move(Position $p, Step $next): Position
{
    return match ($next->cmd) {
        Command::Down => new Position($p->distance, $p->depth + $next->size),
        Command::Up => new Position($p->distance, $p->depth - $next->size),
        Command::Forward => new Position($p->distance + $next->size, $p->depth),
    };
}

We've modeled everything so well that moving is trivial! We match() on the command, which is guaranteed to only be one of those three values by the enum definition. Each case creates a new Position object (because position is immutable), with the depth or distance modified accordingly. Note that by naming the properties $distance and $depth instead of $x and $y we avoid any confusion about "wait, why does down make the y coordinate go up?", as the domain terms used make it clear that down means increasing the depth.

Note that there are a lot of variations on this approach that are also possible. For instance, the new Position() calls could have been turned into methods on the Position object that returned new objects themselves if desired. That would be more OOP-y, and there's absolutely nothing wrong with that. In this case that's a matter of personal preference, and that wouldn't be un-functional. It would also be acceptable to make the parse() function a static method on Step that returns a step() instance. That would still work with what we do with it later.

What would be un-functional would be to have a single mutable Position object that gets modified by the move() function and then returned. In this case, that would still work; we wouldn't have any bugs here, and it would be very slightly faster. However, that would render move() unusable in cases where the position passed to it was also meaningful for anything else, as it would then be modifying its input, too. I'd strongly recommend not doing that. Spawning new objects like this is actually really fast in PHP, so unless your profiling says that it's causing you an issue, favor "cannot break" data models (immutable) over premature optimization by default.

Bringing it all together

With that, we now have all the necessary moving parts we need.

  • We can turn a file into a sequence of lines.
  • We can turn one line into a Step.
  • We can apply an operation to every item in a sequence.
  • We can move from one position to another.
  • We can repeatedly apply an operation to an input.

Half of those operations are universal operations we can reuse anywhere (which is why they live in the Crell/fp library). We can now assemble them together with another pipe:

/** @var Position $end */
$end = pipe($inputFile,
    lines(...),
    itmap(parse(...)),
    reduce(new Position(0, 0), move(...)),
);

Read in English, that reads "Take the name of the input file, read out its lines, then map (apply) the parse() function to each line, then move() on each Step through the whole list." Which... is exactly what we're doing.

Because we're using generators, each line will be read, then parsed to an object, then applied to the position, and then the next line will be read, etc. If we wanted to be greedy instead of lazy, we could swap lines() for file_get_contents() and change itmap() to amap(). Boom, done. Either way, we end up with the final position of our submarine after reading the entire instruction file.

Changing the model

Oh no! In part 2 of the challenge, we find out that we misunderstood what the commands in the file mean. (That's why you should always read the documentation.) up and down really mean to change the "aim" of the sub up or down, and forward will then move forward in both the X and Y (distance and depth) directions. That does make a bit more sense, but it means our data model is wrong. Now what?

Fortunately, the work we put into separating out the different bits pays off here. Most of the previous code doesn't change. In fact, only two things change: the Position class and the move() function.

Position now needs to track a third integer, aim, where a higher value means pointing further down. Simple enough:

class Position
{
    public function __construct(
        public readonly int $distance,
        public readonly int $depth,
        public readonly int $aim,
    ) {}
}

Then move() changes to interpret each Step differently, but still do the same basic thing. We could leave it as is and just add another constructor argument to each arm of the match(), but that gets a bit verbose and hard to read quickly so I'm going to show off another technique, highlighted by Brent from Spatie. We said before that we could put methods on the Position class to return new versions of it with updated values. The generic name for that pattern is "with-er" methods, as the generic method name is withX() (similar to getX() or setX()). PHP 8.0 allows us to have a named-argument-only generic version of that pattern:

class Position
{
    public function __construct(
        public readonly int $distance,
        public readonly int $depth,
        public readonly int $aim,
    ) {}

    public function with(...$values): static
    {
        $clone = (new \ReflectionClass(static::class))->newInstanceWithoutConstructor();

        foreach ($this as $field => $value) {
            $value = array_key_exists($field, $values) ? $values[$field] : $value;
            $clone->$field = $value;
        }

        return $clone;
    }
}

What the with() method does is create a new object of the same class, bypassing the constructor, and then populate it by merging whatever arguments were passed to with() with the values in the original object. It is not general solution, as it ignores visibility modifiers, but it does provide a nice side-step of readonly properties when cloning. A built-in visibility-aware version would be a very welcome addition to PHP in future versions, but in our case we don't care about visibility so it works.

We can now modify move() like so:

function move(Position $p, Step $next): Position
{
    return match ($next->cmd) {
        Command::Down => $p->with(aim: $p->aim + $next->size),
        Command::Up => $p->with(aim: $p->aim - $next->size),
        Command::Forward => $p->with(
            distance: $p->distance + $next->size,
            depth: $p->depth + $p->aim * $next->size
        ),
    };
}

Each command now calls with(), with the properties to update and their new value. Up and Down are single-change, while Forward changes two values (moving forward and up/down depending on the angle of our aim). For readability I split the last one to multiple lines, but that's just for aesthetics.

And that's it! We were able to completely change the meaning of the data model with one additional property, one updated function, and one optional helper for conciseness. The only change to anything else is that creating the initial position needs to specify a starting aim of 0, but that's also easily solved by giving the constructor arguments default values. "Six of one, half a dozen of the other" as my father would say, so take your pick.

As before, the full code is available on GitHub. See you tomorrow.


Want to know more about functional programming and PHP? Read the whole book on the topic: Thinking Functionally in PHP.


Thinking Functionally in PHP

Sort:  

Your content has been voted as a part of Encouragement program. Keep up the good work!

Use Ecency daily to boost your growth on platform!

Support Ecency
Vote for new Proposal
Delegate HP and earn more

Congratulations @crell! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s):

You received more than 4500 upvotes.
Your next target is to reach 4750 upvotes.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Check out the last post from @hivebuzz:

Feedback from the December 1st Hive Power Up Day
Hive Power Up Month Challenge - Winners List