Advent of Functional PHP: Review

Over the last few weeks, I've been following Advent of Code 2021, using Functional PHP as an approach. It's been a fun and educational process, at least for me and, judging by how popular the articles have been, apparently for a few other people as well.

For reference, the full list of articles in this series is here:

  • Day 1: Function composition, pipes, and partial application.
  • Day 2: Map, reduce, generators, immutable objects, and with-er methods.
  • Day 3: Recursion, memoization, and bits.
  • Day 4: First, head, array flattening, and handling state.
  • Day 5: Zip and nested pipes.
  • Day 6: Efficiency and dealing with infinite streams.
  • Day 7: Functional means and medians.
  • Day 8: Encoding, decoding, and the value of first-class function thinking.
  • Day 9: More fun with recursion.
  • Day 10: Reduction and recursion, and how to swap between them.

After letting those sit for a few days, I want to use those examples as a way to evaluate how well PHP handles functional-style programming and what improvements we could make to the language to make it even better.

Will it blend?

To answer the first question: Can you do functional-style coding in PHP? Yes, unquestionably. Is it easy and ergonomic? Kind of. There are some parts that are surprisingly easy now in PHP 8.1, and others that are still a bit clunky.

The good bits

Probably the two biggest improvements in functional-style PHP in recent years have been the introduction of arrow functions / short lambdas in PHP 7.4, and now first-class callables in PHP 8.1. Frankly, I would really have hated writing this code without those two language features.
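
As a small illustration of those two features (the variable names are made up for this example, not taken from the series), a short lambda captures surrounding variables by value automatically, and the first-class callable syntax turns an existing function into a closure:

```php
<?php
declare(strict_types=1);

// Arrow function (PHP 7.4+): $factor is captured by value automatically.
$factor = 3;
$triple = fn(int $x): int => $x * $factor;

// First-class callable syntax (PHP 8.1+): makes a Closure from a named function.
$length = strlen(...);

$tripled = array_map($triple, [1, 2, 3]);
$lengths = array_map($length, ['a', 'bcd']);
```

Both results are plain arrays, so either closure can be handed to any function that expects a callable.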

Both of those we can thank the ever-popular Nikita Popov for, although some credit also goes to Joe Watkins, with whom I collaborated on an attempt at full partial function application support in PHP. That didn't pass, but it did inspire Nikita to write the first-class-callables RFC so it was a... partial win. (Pun very much intended.)

Another useful feature was match(), which was introduced in PHP 8.0 courtesy of Ilija Tovilo. I've found a lot of conditionals can be recast as a match() and doing so results in cleaner code, in no small part because it forces complex branches to be factored out to their own functions. That's an affordance that coaxes me toward better-factored code, which is a good language feature to have.
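
To make that concrete, here is a minimal sketch of a conditional recast as a match(). The command names and helper functions are hypothetical, chosen only to show how each branch ends up in its own small function:

```php
<?php
declare(strict_types=1);

// Hypothetical movement helpers; each branch of the match gets its own function.
function forward(array $pos, int $n): array
{
    return ['x' => $pos['x'] + $n, 'depth' => $pos['depth']];
}

function down(array $pos, int $n): array
{
    return ['x' => $pos['x'], 'depth' => $pos['depth'] + $n];
}

function up(array $pos, int $n): array
{
    return down($pos, -$n);
}

function move(array $pos, string $command, int $n): array
{
    // match is an expression, compares strictly, and throws
    // \UnhandledMatchError if no arm matches.
    return match ($command) {
        'forward' => forward($pos, $n),
        'down' => down($pos, $n),
        'up' => up($pos, $n),
    };
}

$pos = move(['x' => 0, 'depth' => 0], 'down', 5);
```

Because each arm has to be a single expression, anything complicated naturally moves out into a named function.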

The new readonly properties in PHP 8.1, also from Nikita, make functional-style product types a lot nicer to work with. There's still a lot of syntactic gaps, though (noted below).
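
For readers who have not used them yet, a readonly product type might look like the following sketch. The Point class and its withX() method are illustrative assumptions, not code from the series:

```php
<?php
declare(strict_types=1);

// A hypothetical value object; the class and its with-er method are illustrative.
final class Point
{
    public function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}

    // Hand-written with-er: one method per property is the kind of
    // boilerplate a native syntax could remove.
    public function withX(int $x): static
    {
        return new static($x, $this->y);
    }
}

$a = new Point(1, 2);
$b = $a->withX(5);

try {
    $a->x = 10; // Writing to a readonly property from outside throws an Error.
} catch (\Error $e) {
    $message = $e->getMessage();
}
```

The original object is untouched; the only way to get a "changed" Point is to build a new one.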

And of course generators, which were introduced to PHP 5.5 by a much younger Nikita Popov, cropped up in many problems. Even when you don't need them for memory management (to deal with massively huge lists), it's still impressive how much they can simplify certain algorithms.
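
As a quick reminder of why they help, here is a minimal sketch of an infinite sequence consumed lazily. Both function names are assumptions made for this example:

```php
<?php
declare(strict_types=1);

// An infinite sequence: values are produced only as they are requested.
function naturals(): \Generator
{
    $i = 1;
    while (true) {
        yield $i++;
    }
}

// Lazily keep the first $n values of any iterable (hypothetical helper).
function take(iterable $items, int $n): \Generator
{
    if ($n <= 0) {
        return;
    }
    foreach ($items as $item) {
        yield $item;
        if (--$n <= 0) {
            return;
        }
    }
}

$firstThree = iterator_to_array(take(naturals(), 3), false);
```

Nothing beyond the third value is ever computed, which is what makes an unbounded source safe to work with.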

The not so nice bits

There are still places where PHP falls short of where it could be, and frankly should be, from a functional point of view.

Around half of the short-lambdas we wrote for these challenges (excluding the ones inside our utility functions) were simple wrappers around other functions in order to reduce them to single-argument functions. While wrapping it in a short lambda manually certainly works, it's more work to write and more work to read than is ideal. That would have been resolved had the Partial Function Application RFC passed, instead of being narrowly declined. It is effectively syntactic sugar, but it's sugar that matters.
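
A typical wrapper of that kind looks something like this sketch (the data and variable names are invented for illustration):

```php
<?php
declare(strict_types=1);

$lines = ['1,2', '3,4'];

// explode() needs two arguments, but array_map() supplies only one,
// so a short lambda pins the separator and leaves the string open.
$splitOnComma = fn(string $line): array => explode(',', $line);

$pairs = array_map($splitOnComma, $lines);
```

The lambda adds nothing except fixing one argument in place, which is exactly the job partial application would do with less ceremony.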

Similarly, while we can concatenate functions together via other higher-order functions, such as our pipe() utility, it's clumsier than if we had a native syntax for it. The same is true for pure composition, if to a lesser extent.
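
For reference, a user-space pipe can be sketched in a few lines. This is a minimal version under an assumed signature (a starting value followed by single-argument callables), not necessarily identical to the utility used in the series:

```php
<?php
declare(strict_types=1);

// Assumed signature: a starting value followed by single-argument callables.
function pipe(mixed $arg, callable ...$fns): mixed
{
    foreach ($fns as $fn) {
        $arg = $fn($arg);
    }
    return $arg;
}

$total = pipe(
    " 1, 2, 3 ",
    trim(...),
    fn(string $s): array => explode(',', $s),
    fn(array $parts): array => array_map(intval(...), $parts),
    array_sum(...),
);
```

It works, but every step costs a comma, and anything that is not already a single-argument function needs its own wrapper lambda.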

Producing functions that are meant to be piped is also a bit of a hassle. While many of those use cases would be eliminated by PFA, not all of them would. Nearly all of our utility functions had the same basic syntax pattern:

function something($arg1, $arg2): callable
{
    return function($one_arg) use($arg1, $arg2): someType {
        // Something here.
    };
}

That's a lot of boilerplate for something that function composition means we'd be doing a lot of. It would be really nice if we could reduce that boilerplate, and maybe even get some small performance boost out of it.
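
A concrete instance of that pattern is sketched below. The amap() name matches the utility mentioned in this series, but the body here is an assumption for illustration, and afilter() is a hypothetical sibling:

```php
<?php
declare(strict_types=1);

// Hypothetical pipe-friendly wrapper around array_map(): configure it first,
// get back a function that takes only the array.
function amap(callable $c): callable
{
    return function (array $items) use ($c): array {
        return array_map($c, $items);
    };
}

// Same pattern, with the short-lambda form doing the capture for us.
function afilter(callable $c): callable
{
    return fn(array $items): array => array_values(array_filter($items, $c));
}

$double = amap(fn(int $x): int => $x * 2);
$evens = afilter(fn(int $x): bool => $x % 2 === 0);

$result = $double($evens([1, 2, 3, 4]));
```

The short-lambda version is tidier, but it only works while the body stays a single expression.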

We also ran into a tricky problem in Day 9's puzzle dealing with arrays. For all that PHP developers seem to love their pseudo-arrays, they're actually remarkably badly designed. They can have string or integer keys, but numeric strings and floats will get silently converted into integers. They cannot use any other value as a key.
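
The key coercion is easy to demonstrate. This sketch uses a whole-number float, since PHP 8.1 emits a deprecation notice when a float key with a fractional part is truncated:

```php
<?php
declare(strict_types=1);

$a = [];
$a['1'] = 'numeric string'; // stored under int 1
$a[2.0] = 'float';          // stored under int 2
$a['01'] = 'not canonical'; // stays the string '01'

$keys = array_keys($a);

// Any other type of key is rejected outright.
$key = [0, 1];
try {
    $a[$key] = 'array as key';
    $rejected = false;
} catch (\TypeError $e) {
    $rejected = true;
}
```

Note that the string '01' keeps its type while '1' does not, so two keys that look alike can behave differently.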

If you have an object, you can use SplObjectStorage, which lets you use an object as a key. However, you can only use an object as a key, and multiple instances of equivalent objects are still separate key entries. There is no way to use another array as a key; that may seem like an odd thing to do, but it would actually have made Day 9's code quite a bit cleaner. And since SplObjectStorage isn't actually an array, it (like all Iterator/ArrayAccess objects) cannot be used with any of the native array utilities PHP provides. It would be compatible with some of the custom ones we built, but it's sad that those have to be in user space.
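
The identity-based behavior can be seen in a short sketch. The Point class here is a hypothetical value object, not the data structure used in Day 9:

```php
<?php
declare(strict_types=1);

// Hypothetical value object used only for this illustration.
final class Point
{
    public function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}
}

$seen = new \SplObjectStorage();

$a = new Point(1, 2);
$b = new Point(1, 2); // Same values as $a, but a different instance.

$seen[$a] = 'first';
$seen[$b] = 'second';

$entries = count($seen);                // Two entries, not one.
$found = isset($seen[new Point(1, 2)]); // A fresh equal-valued Point is not found.
```

For value objects that is usually the opposite of what you want: the lookup only succeeds if you still hold the exact instance that was stored.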

Finally, the new keyword is not pipe-able. That can be worked around with static methods, but it's annoying. The lack of a native "with-er" style operation is also a problem. While it can be emulated with a method in a trait, as we did here, that also provides a way to bypass visibility rules. That can never be a full solution.

Improvements to add

Based on that experience, then, the following are the language improvements that I believe would offer the biggest benefit for PHP as a functional-style language. I have them listed in approximately the "most bang for the buck" order. I am happy to collaborate with anyone on them, but my core-editing ability is still barely even paltry at this point.

A |> operator

This should come as no surprise, but I believe far and away the most important addition to PHP would be a native pipe operator. I proposed an RFC for that in 8.1, but it did not pass, in part due to some details not having been worked out (mainly around references, PHP's black sheep). I am hopeful that if someone with more core knowledge than I have were to write it properly, it would have a better chance in a second attempt. I am also hopeful that those voting on RFCs have at least glanced at this series and seen just how powerful function composition is.

As for why this cannot be done just as well in user space, there are two reasons: One, it is syntactically much clunkier. It requires a lot more parentheses, and parenthesis matching, and remembering your commas, and other syntactic flotsam to make work in user space. Two, performance. A user space pipe implementation, in my benchmarks, is oddly a bit more expensive than I'd expect. Most likely that's because of the extra overhead of more function calls to make it work, which in PHP are costlier than in most compiled languages. A native operator would undoubtedly have more opportunities for performance optimizations to make it faster than a user space version could ever be.

A composition operator

When I first attempted the pipe RFC, I stated that I didn't think a composition operator (which would produce a new function but not actually call it yet) was necessary, and manually wrapping it in a short lambda would be sufficient. Having gone through this exercise, I now believe I was incorrect and a dedicated function composition operator would be useful.

Why? Because several times, as we saw, it was better and more performant to concatenate two or more functions and then map over them than to map over each one individually. But it would be clunkier (as we saw) to introduce a short lambda with all of its extra syntax into a pipe chain just for that. I am flexible on what the symbol is; different existing languages use different options. If we went with the F# syntax of >>, then the final code for day 8 would look something like this:

$results = $inputFile
   |> lines(...)
   |> amap(parseLine(...) >> deriveMapping(...) >> decode(...) >> digitsToNumber(...))
   |> array_sum(...);

Which I think is pretty nice.

A composition operator would be more of a nice-to-have compared to the pipe operator, but once pipe is included the extra cost of including a composition operator should be pretty small, giving it a good ROI.
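
Until such an operator exists, the same idea can be approximated in user space. The compose() function below is a minimal sketch with an assumed name and signature; it shows why mapping once over a composed function beats mapping twice:

```php
<?php
declare(strict_types=1);

// Left-to-right composition: returns a new function, calls nothing yet.
function compose(callable ...$fns): callable
{
    return function (mixed $arg) use ($fns): mixed {
        foreach ($fns as $fn) {
            $arg = $fn($arg);
        }
        return $arg;
    };
}

$increment = fn(int $x): int => $x + 1;
$double = fn(int $x): int => $x * 2;

// Two passes over the list, with an intermediate array in between.
$twoPasses = array_map($double, array_map($increment, [1, 2, 3]));

// One pass over the list with the composed function.
$onePass = array_map(compose($increment, $double), [1, 2, 3]);
```

Both produce the same result, but the composed version walks the list once and never builds the intermediate array.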

Easier long-lambdas

One way of reducing the boilerplate for making pipe-able functions is to reduce the boilerplate of multi-line lambdas. An easy way to do that is to enable auto-capture for them, rather than requiring the developer to re-enumerate each variable individually.

There is, in fact, an RFC to do just that, which was written for 8.1 but was pulled from a vote pending more verification that it didn't introduce unexpected memory leak issues. That means this RFC is already written, and just needs someone to finish off verifying that it doesn't introduce any memory leaks.

There was some debate on whether it would lead to "sloppy code," but like most such arguments I think that one is overblown. When writing functional-style code, writing functions that return functions happens several times a day. Re-enumerating variables to capture manually is completely unnecessary pain and hassle that can be easily removed. Most languages with closures support auto-capture, and the only one that has an issue with it is JavaScript (due largely to other issues with the language, given that no other language seems to struggle with it leading to "bad code" as far as I am aware).

This is probably the lowest hanging fruit on this list, so if someone wants to pick this up and run with it, check the issue for what verification is still left and let's get it to a vote.

It wouldn't completely eliminate the boilerplate of higher order functions, but it would reduce it some. Its twin RFC, short named functions, would reduce it even more but it seems the Internals folks are not ready for that yet, based on the vote.

Had both RFCs passed (or were they to pass in the future), the boilerplate for a higher order function would be reduced to:

function something($arg1, $arg2) => fn ($one_arg): someType {
    // Something here.
};

Not perfect, but definitely nicer. I'm totally open to other approaches here, too.

clone-with

Essentially, we need the Evolvable trait that we've been using in this series baked into the syntax. Doing it in user space is naturally error prone; I actually had to rewrite it several times over the course of this series to handle more edge cases: null values, unset values, uninitialized values, etc. As noted, it bypasses all visibility checks, which is not good. And doing it in user space is definitely more costly than doing it in a single C call, as in its current form it requires reflection and iteration on every single call.
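
To show where both the cost and the visibility problem come from, here is a deliberately simplified sketch of a reflection-based with-er. It is not the trait from the series: the name EvolvableSketch, the Account class, and the handling of edge cases are all assumptions, and it ignores inherited private properties entirely.

```php
<?php
declare(strict_types=1);

// NOT the trait from the series: a simplified sketch of the same idea.
trait EvolvableSketch
{
    public function with(mixed ...$values): static
    {
        $class = new \ReflectionClass(static::class);
        $clone = $class->newInstanceWithoutConstructor();

        // Reflection plus a loop on every call: this is the runtime cost.
        foreach ($class->getProperties() as $property) {
            if ($property->isStatic()) {
                continue;
            }
            $name = $property->getName();
            if (array_key_exists($name, $values)) {
                $clone->$name = $values[$name];
            } elseif ($property->isInitialized($this)) {
                $clone->$name = $this->$name;
            }
        }
        return $clone;
    }
}

// Hypothetical class used only to exercise the sketch.
final class Account
{
    use EvolvableSketch;

    public function __construct(
        public readonly string $owner,
        private readonly int $balance = 0,
    ) {}

    public function balance(): int
    {
        return $this->balance;
    }
}

$account = new Account('Ada');

// Named arguments are collected into $values, keyed by property name.
$renamed = $account->with(owner: 'Grace');

// The visibility problem: outside code just overwrote a private property.
$tampered = $account->with(balance: 1_000_000);
```

Because the trait method runs inside the class's own scope, it can write to anything, and there is no way for it to know what the caller should have been allowed to touch.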

One syntax that has been discussed in the past for this functionality is to expand the clone() operation. The proposal would look something like this:

$foo2 = clone($foo) with {a: $newA, b: $newB};

It's very similar to the method call we used in this series; however, as a language construct it would be able to enforce visibility rules, so calling code could be prevented from changing a value that is not visible in its scope.

On the downside, as it's not a function, it would not play nicely with pipes. It would always need to be wrapped into another closure. That may be acceptable, but I am open to other syntax proposals that would let us skip that step.

A clone with or equivalent to support evolving readonly objects is something that was discussed when readonly was introduced, and there's general recognition that it is needed. It was (correctly, IMO) left out of the readonly RFC as being a separate topic, but it would still be a valuable addition.

Partial Function Application

This is the RFC that was just barely declined in 8.1, and we got its junior version, first-class-callables, instead. Joe Watkins did amazing work to get this RFC to a working state, and I really liked where the syntax ended up. It would have simplified a lot of code in this series, although less than pipes would have. However, because it messes with the way function calls happen it's a very complex patch, and in the end enough voters felt the ROI of that complexity wasn't worth it that it didn't pass.

Naturally, I disagree. I can't be sure this series has demonstrated where PFA would still be beneficial, but I think it has. I don't think Joe is interested in taking another swing at the PFA RFC at this point, but if someone else feels we could make a better argument for 8.2, I'm game to try.

Lens functions

These didn't come up in the exercises explicitly, but there were a couple of places where we had a closure that existed for the sole purpose of reading a property off of an object, or invoking a method on an object. That's because objects... really don't pipe nicely. Or rather, data on them doesn't pipe, because syntactically they just don't "fit" with the way pipes work. You need a $this on which to call ->method() or ->property, and pipes, by design, hide the variable being piped.

Making wrapper functions that make them pipeable is of course trivial, using the clunky higher-order-function syntax we saw before:

function prop(string $prop): callable
{
    return function(object $o) use($prop): mixed {
        return $o->$prop;
    };
}

function method(string $method, ...$args): callable
{
    return function(object $o) use($method, $args): mixed {
        return $o->$method(...$args);
    };
}

$vals = $foo |> bar(...) |> amap(method('beep', 'more', 'args')) |> amap(prop('baz'));

(Seriously, look at them; short-functions and autocapture would make those so much cleaner.)

However, that runs into the perennial problem that the contents of a string cannot be checked by the parser or by static analysis, so if you have a typo in your property name or method name, it won't get caught and you'll end up with mysterious errors later on. Just moving the functions into C wouldn't actually help in this case.

What we'd need instead are "lens functions," but implemented as language constructs. Lenses are a feature of some stronger functional languages, and their category theory basis is a bit complicated, but in a practical sense they are... basically the prop() function shown here, but also combined with Evolvable.

I am not quite sure what the syntax for that could or should be to allow better error handling than a string, which is why this is fairly low on the list. If someone has a good suggestion for how to make that nicer, I would like to hear it.

Object comparison

Object equality comparison has been discussed on and off for years. The latest version is part of the operator overloading RFC, currently in development. I support the RFC generally, but the <=> part of it in particular would have come in quite handy in a few places, which I had not expected.

The main place it would have been helpful is in Day 9. Specifically, it would have enabled us to use an array of Point objects rather than an array of associative arrays with x and y keys. The main issue was that we needed to be able to do an in_array() check, which would only have worked if we checked for object identity. If instead we could implement object equality by comparing the properties of two objects, we could have used in_array() essentially as-is. Ideally we could expand that behavior to cover other array operations like array_intersect(), array_unique(), array_diff(), and their various variants.

I don't know if SplObjectStorage could be beefed up to use that instead. That could require a different approach; I'm not sure. But if it allowed the use of value objects in SplObjectStorage more readily (by allowing equivalent objects to be defined to be equal), SplObjectStorage would become a much more viable tool for mapping objects to things.

function-friendly new syntax

Finally, and this is definitely in the nice-to-have category, constructors in PHP are syntactic oddballs in a number of ways. As noted, they cannot be piped, because they're not syntactically a function call (even though they really are). They also are not compatible with first-class callables, and cannot be chained like any other method without being wrapped in an extra set of parentheses.

That can be worked around with either a one-off closure, or with a static method that wraps the constructor. The latter can also be wrapped up into a trait, like so:

trait Newable
{
    public static function new(...$args): static
    {
        return new static(...$args);
    }
}

(I'll be adding such a trait to Crell/fp for easier reuse.) I don't know that a built-in alternative would be any better than that, frankly. Mainly it would be a benefit for consistency to have some native way to call a constructor like it was any other method. There's probably a larger discussion here, but it would make some things a bit nicer in the micro.
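
As a usage sketch, the trait is repeated below so the example runs on its own, paired with a hypothetical Point class:

```php
<?php
declare(strict_types=1);

trait Newable
{
    public static function new(...$args): static
    {
        return new static(...$args);
    }
}

// Hypothetical class used only to show the trait in action.
final class Point
{
    use Newable;

    public function __construct(
        public readonly int $x,
        public readonly int $y,
    ) {}
}

// Unlike `new Point`, the static method works with first-class callable syntax.
$makePoint = Point::new(...);

// array_map() with two arrays calls $makePoint(1, 2) and then $makePoint(3, 4).
$points = array_map($makePoint, [1, 3], [2, 4]);

// It also chains without an extra set of parentheses.
$x = Point::new(5, 6)->x;
```

That gets constructors into the same shape as every other callable, at the price of adding the trait to each class that wants it.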

Let's do this

This is not a short list, of course. Nor is it a comprehensive list of every feature that would benefit PHP. However, I do think it is a reasonable "punch list" of features that would have a high ROI for PHP in terms of improving its functional-style programming story, and are all within the realm of feasibility with reasonably straightforward implementations. (Generics would be great, too, but that's a whole other ball of wax.) Given that PHP is primarily used for web requests, and web requests are intrinsically a functional problem space (they're a function from a request to a response; that's literally what any web request is), it's a natural pairing. While PHP developers may not be used to functional patterns, they are not any more intrinsically weird than OOP patterns. In fact, most of them are far less verbose and easier to mix-and-match than OOP patterns. We should be embracing them, at the language level.

I can only do so much on my own, though, especially as my Internals/C skills are still lacking. If you are an Internals developer and any of these excite you, please reach out! I am happy to collaborate with (almost) anyone on any of these. (People open to spending the time mentoring are especially welcome.)

If you're just an Internals voter, but not interested in helping to implement any of these, that's OK, too. I would encourage you to consider these changes if they come up for a vote, however. They are all useful changes that do add value to the language, perhaps not along obvious lines but along lines that are becoming more obvious over time.

PHP has been making slow baby steps toward more functional-style code for a long time. It's time for the PHP community to fully embrace that capability and open up the gates for functional programming in our favorite multi-paradigm language.