Cutting through the static

in PHP, 5 months ago (edited)

Static methods and properties have a storied and controversial history in PHP. Some love them, some hate them, some love having something to fight about (naturally).

In practice, I find them useful in very narrow situations. They're not common, but they do exist. Today, I want to go over some guidelines on when PHP developers should, and shouldn't, use statics.

In full transparency, I will say that the views expressed here are not universal within the PHP community. They do, however, represent what I believe to be the substantial majority opinion, especially among those who are well-versed in automated testing.

What's wrong with statics?

First off, what is the problem with statics? They're a language feature, aren't they? They are, but that doesn't necessarily make them good. (Anyone else remember register_globals?) Their main downside is that they inhibit testing, because they inhibit mocking.

One of the key principles of good automated testing is to isolate small parts of the system and test them independently. That requires having key points where you can "cut" one piece of code off from another (using a mock, fake, stub, or whatever) and test one of them, without worrying about the other. The main "cut" point that PHP offers is object instances, especially object instances passed in through the constructor (aka, Dependency Injection; yes, that's all DI means: passing stuff in via the constructor). If you're working with code that does that reliably, it is generally pretty easy to test. If it does anything else, it is generally pretty hard to test.

Consider this example:

// BAD EXAMPLE
class Product
{
    public static function findOneById(int $id): self
    {
        // Some DB logic.
        $record = Database::select(/* ... */);
        
        if (!$record) {
            throw ProductNotFound::forId($id);
        }
        // More robust code than this, please.
        return new self(...$record);
    }
}

This is a simple database lookup. It's easy to call from literally anywhere in the code base using Product::findOneById(5). But... that's the problem. Actually, there are several problems.

  1. If some other code calls Product::findOneById(5), that code cannot be separated from this method. There is no way to test it without also testing Product. Product cannot be mocked/faked/stubbed. Your other code will ever and always need Product.
  2. It's not just a testing issue; if you ever want to use a different version of Product::findOneById() — say because requirements have changed, you need multi-tenancy now, or whatever — you're stuck.
  3. findOneById() needs a database connection to work. But since you cannot inject values into a static method (there's no constructor), how can you get a connection to it? All you've got is static calls. So that requires some code like Database::select('Some SQL here'). That, in turn, hard-couples Product to the Database class.
  4. That, in turn, means whatever calls Product is also hard-coupled to Database, and presumably therefore to an actual database connection. You now cannot test one piece of code far-removed from the database without a for-reals database running. That's... not good.

Compare with the constructor-injected version:

readonly class ProductRepository
{
    // This is injectable, and thus trivially testable.
    public function __construct(
        private Connection $conn,
    ) {}
    
    public function findOneById(int $id): Product
    {
        $record = $this->conn->select(/* ... */);
        
        if (!$record) {
            throw ProductNotFound::forId($id);
        }
        // More robust code than this, please.
        return new Product(...$record);
    }
}

Now, we've split Product into a data object (Product) and a mapper/loader (ProductRepository). The repository is a normal service object. It requires ("depends on") a Connection object, which is passed to it. Because that's an object, not a class, we can pass anything we want to it as long as it conforms to the class type: A real database connection, a fake one, a MySQL one, a Postgres one (within reason), etc. Most notably, we can pass a mock and test ProductRepository without having to even install a database.

And that same benefit extends to the code that uses the repository: It will accept a ProductRepository constructor argument, which can similarly be the real repository or a mock. We can now test that client code without needing a real repository instance.
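To make that concrete, here is a minimal sketch of exercising the repository against a hand-written fake. The Connection interface, its select() signature, the Product fields, and FakeConnection are all assumptions made up for illustration; the only thing taken from above is the shape of ProductRepository itself.

```php
<?php
// Hypothetical interface: the examples above only name a Connection type.
interface Connection
{
    public function select(string $sql, array $params = []): ?array;
}

// Hypothetical data object with two assumed fields.
final class Product
{
    public function __construct(
        public readonly int $id,
        public readonly string $name,
    ) {}
}

final class ProductNotFound extends \InvalidArgumentException
{
    public static function forId(int $id): self
    {
        return new self(sprintf('Product %d not found.', $id));
    }
}

final class ProductRepository
{
    public function __construct(private readonly Connection $conn) {}

    public function findOneById(int $id): Product
    {
        $record = $this->conn->select('SELECT id, name FROM products WHERE id = ?', [$id]);

        if (!$record) {
            throw ProductNotFound::forId($id);
        }
        // String keys are spread as named arguments (PHP 8.1+).
        return new Product(...$record);
    }
}

// A fake that answers from an in-memory array keyed by ID. No database needed.
final class FakeConnection implements Connection
{
    public function __construct(private array $rows) {}

    public function select(string $sql, array $params = []): ?array
    {
        return $this->rows[$params[0]] ?? null;
    }
}

$repo = new ProductRepository(new FakeConnection([
    5 => ['id' => 5, 'name' => 'Widget'],
]));

$product = $repo->findOneById(5);
echo $product->name, PHP_EOL; // Widget
```

Code that depends on ProductRepository can be tested the same way, one level up: hand it a repository built on a fake, or a mock of the repository itself.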

But isn't passing those constructor arguments around manually a lot of work? Yes, it is. Which is why no one does that anymore! Nearly all modern Dependency Injection containers support auto-wiring, whereby most (80%+) services can be auto-detected and auto-configured, so that the right constructor arguments are passed in. With Constructor Property Promotion available in all supported PHP versions, accepting a dependency via the constructor is trivial. (Prior to PHP 8.0 it was a lot more annoyingly verbose; that problem no longer exists.) The combination of auto-wiring containers and Constructor Promotion has virtually eliminated all previously-legitimate arguments against using DI. It's usually even easier than trying to set up an alternative.

But doesn't that mean it's harder to instantiate an object one-off with services? Yes. And that's good! You should rarely be doing that; making that clunky encourages you to refactor your code to not need it. If you really do need dynamic creation of a service, that's what the Factory Pattern is for. (In short, you call an object that does it for you, and it can do the wiring in a common location and do nothing else.)
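As a rough sketch of what such a factory can look like, assume a repository that needs both an injected Connection and a tenant ID that is only known at runtime. Every name here (TenantProductRepository, TenantProductRepositoryFactory, InMemoryConnection, forTenant()) is hypothetical and exists only for this illustration.

```php
<?php
// Hypothetical types for illustration only.
interface Connection {}

final class InMemoryConnection implements Connection {}

final class TenantProductRepository
{
    public function __construct(
        public readonly Connection $conn,
        public readonly string $tenantId,
    ) {}
}

// The factory is itself an ordinary injectable service: the container hands it
// the Connection once, and callers supply only the runtime value.
final class TenantProductRepositoryFactory
{
    public function __construct(private readonly Connection $conn) {}

    public function forTenant(string $tenantId): TenantProductRepository
    {
        return new TenantProductRepository($this->conn, $tenantId);
    }
}

$factory = new TenantProductRepositoryFactory(new InMemoryConnection());
$acme = $factory->forTenant('acme');
$globex = $factory->forTenant('globex');
```

Because the factory is an object, code that needs it accepts it through the constructor like any other dependency, and a test can pass in a factory built on a fake connection.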

Static types

So if statics suck for testing, are they ever valid to use? Yes! Statics are valid when the context they are operating within is a type, not an object instance. PHP has no meaningful way to swap out an entire type (there are some hacky ways that kind of work, which we'll ignore for now), so not being able to mock the type doesn't hurt anything.

In practice, the only case where the type is a relevant context is object creation. There are probably others, but this is the only one I ever really see. In the previous example, we had this line:

throw ProductNotFound::forId($id);

That is using the "named constructor" technique, using a static method. I use this approach a lot on exceptions, in fact, as it can be more self-documenting, and allows things like the error message to be incorporated into the class definition itself.

class ProductNotFound extends \InvalidArgumentException
{
    public readonly int $productId;
    public readonly array $query;
    
    public static function forId(int $id): self
    {
        $new = new self();
        $new->productId = $id;
        
        $message = 'Product %d not found.';
        $new->message = sprintf($message, $id);
        
        return $new;
    }
    
    public static function forQuery(array $query): self
    {
        $new = new self();
        $new->query = $query;
        
        $message = 'No product found for query: %s';
        $new->message = sprintf($message, implode(',', $query));
        
        return $new;
    }
}

Note here that we're offering two different named constructors; that's perfectly fine. In this case, the alternative is an inline new call in ProductRepository, which is no more or less mockable. So a static method here is fine. However, note that the static methods are both pure: They store no state (the properties are saved on the object, not the class), and do no IO.
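Here is a short sketch of how calling code might consume that exception. The class is copied from above; the catch-side code is an assumption about typical usage. One detail worth knowing: each named constructor initializes only one of the two readonly properties, so reading the other one throws an Error.

```php
<?php
class ProductNotFound extends \InvalidArgumentException
{
    public readonly int $productId;
    public readonly array $query;

    public static function forId(int $id): self
    {
        $new = new self();
        $new->productId = $id;
        $new->message = sprintf('Product %d not found.', $id);
        return $new;
    }

    public static function forQuery(array $query): self
    {
        $new = new self();
        $new->query = $query;
        $new->message = sprintf('No product found for query: %s', implode(',', $query));
        return $new;
    }
}

$byId = ProductNotFound::forId(5);
echo $byId->getMessage(), PHP_EOL; // Product 5 not found.

$byQuery = ProductNotFound::forQuery(['red', 'large']);
echo $byQuery->getMessage(), PHP_EOL; // No product found for query: red,large

// $query was never set by forId(), so it is still uninitialized.
try {
    $byId->query;
    $uninitializedAccessFailed = false;
} catch (\Error $e) {
    $uninitializedAccessFailed = true;
}
```

If callers need to inspect both properties safely, making them nullable with a null default is one option, at the cost of giving up readonly (readonly properties cannot have default values).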

This does mean a hard-coupling of findOneById() to ProductNotFound, but... that's OK. ProductNotFound is an exception, and therefore a value object. Value objects rarely if ever need to be mocked in the first place, as they can be trivially faked. Consider:

class Color
{
    private string $color;
    
    private function __construct() {}
    
    public function isPale(): bool
    {
        // ...
    }

    public static function fromRGB(int $red, int $green, int $blue): self
    {
        $new = new self();
        // Zero-pad each channel to two hex digits; bare dechex(5) would yield "5", not "05".
        $new->color = sprintf('#%02x%02x%02x', $red, $green, $blue);
        return $new;
    }
    
    public static function fromHex(string $color): self
    {
        $new = new self();
        $new->color = '#' . $color;
        return $new;
    }
    
    public static function fromHSV(int $hue, int $sat, int $value): self
    {
        [$r, $g, $b] = self::hsv2rgb($hue, $sat, $value);
        return self::fromRGB($r, $g, $b);
    }

    private static function hsv2rgb(int $hue, int $sat, int $val): array
    {
        // ...
    }
}

This value object represents a color. It is a value object; it doesn't really make sense to mock, any more than mocking an integer would. Just... pass a different integer, or a different Color instance. Its constructor is private, so the only way to create it is through the static named constructors. fromHex() is the main one, and the simplest. fromRGB() is also pretty straightforward, and produces the same object by different means. These are all perfectly reasonable uses of static methods, because they relate to the type Color rather than to any particular data. And note again, they're all pure functions.

fromHSV() does a little bit more, in that it has a utility method to convert HSV colors to RGB colors. (The algorithm to do so is fairly standard and easy to Duck Duck Go, hence omitted for now.) Because fromHSV() is static, the utility must be static as well, since there's no instance context to work from. This is also an acceptable use of statics; note, however, that hsv2rgb() is private. It's an internal implementation detail.
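To illustrate the "just pass a different Color instance" point, here is a trimmed-down sketch. The hex() accessor, the strtolower() normalization, and the cssRule() function are assumptions added for the example; they are not part of the class above. This version also pads each RGB channel to two hex digits.

```php
<?php
final class Color
{
    private string $color;

    private function __construct() {}

    public static function fromRGB(int $red, int $green, int $blue): self
    {
        $new = new self();
        $new->color = sprintf('#%02x%02x%02x', $red, $green, $blue);
        return $new;
    }

    public static function fromHex(string $color): self
    {
        $new = new self();
        $new->color = '#' . strtolower($color);
        return $new;
    }

    // Hypothetical accessor, added so calling code has something to read.
    public function hex(): string
    {
        return $this->color;
    }
}

// Hypothetical code under test: it takes a real Color, no mock required.
function cssRule(string $selector, Color $color): string
{
    return sprintf('%s { color: %s; }', $selector, $color->hex());
}

$rule = cssRule('h1', Color::fromRGB(255, 0, 10));
echo $rule, PHP_EOL; // h1 { color: #ff000a; }

// Two named constructors, same value: == compares property values.
$same = Color::fromRGB(255, 0, 10) == Color::fromHex('FF000A');
```

A test for cssRule() simply builds whatever Color it needs through a named constructor, exactly as production code would.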

Static registration

Historically, another common use of static methods has been registration, that is, when there is some kind of plugin system in a framework and you need a runtime way to "register" some extension/instance/object/hook/whatever with the framework. In general, there are four ways that can be done.

  1. Externally from the class being registered, which we don't care about for now.
  2. A static method
  3. A static property
  4. Attributes

For example:

// Best option in modern code
#[Command(name: 'product:create')]
class CreateProductCommand implements Command
{
    // Externally mutable, which is not good.
    public static string $name = 'product:create';
    
    // More verbose, but more flexible if some logic is needed.
    public static function name(): string
    {
        return "product:create";
    }
    
    public function run(): void
    {
        // ...
    }
}

The static name() method is presumably part of the Command interface. In this case, again, the context being named isn't an object instance; it's the class/type, and thus a static method is reasonable. A method in this case is arguably overkill, as it's just returning a static value. That's why some frameworks instead have a magically named static property, like $name above. That's simpler, but comes at the cost of not being part of the interface (though hopefully that will change) and being public and mutable. Remember the rule for statics we said above: Pure and stateless! A mutable value is neither pure nor stateless. It works, but I wouldn't recommend it.

As of PHP 8, though, I'd argue attributes provide a better alternative. They can only capture fixed-at-compile-time information, but that's also true for the static method. They're a native language feature purpose-built for this kind of work, which means all the tooling knows how to parse it. It is more compact compared to multiple static metadata methods. And it logically fits the use case: Registration like this is "metadata" about the type/class. Attributes are metadata, and clearly so from the syntax as they're distinct from the runtime code. They can also be used on both classes and methods, depending on what needs to be registered, with the exact same syntax.
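For a sense of what the framework side of attribute-based registration might look like, here is a minimal sketch using the Reflection API. The AsCommand attribute and the buildCommandMap() function are hypothetical names for this example; a real framework will have its own attribute classes and discovery mechanism.

```php
<?php
// Hypothetical attribute class; it only carries the command name.
#[Attribute(Attribute::TARGET_CLASS)]
final class AsCommand
{
    public function __construct(public readonly string $name) {}
}

#[AsCommand(name: 'product:create')]
final class CreateProductCommand
{
    public function run(): void
    {
        // ...
    }
}

/**
 * Builds a command-name => class-name map from a list of class names.
 * Classes without the attribute are skipped.
 */
function buildCommandMap(array $classes): array
{
    $map = [];
    foreach ($classes as $class) {
        $attributes = (new ReflectionClass($class))->getAttributes(AsCommand::class);
        foreach ($attributes as $attribute) {
            // newInstance() instantiates AsCommand with the declared arguments.
            $map[$attribute->newInstance()->name] = $class;
        }
    }
    return $map;
}

$map = buildCommandMap([CreateProductCommand::class, stdClass::class]);
```

Note that the command class itself contains no registration code at all; the metadata sits entirely in the attribute.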

In the end, I wouldn't say using a static method here is bad, per se. There are just better options for modern code to use.

Functional code

What if we want the hsv2rgb() routine above to be available to more than just the Color class? It's a reasonable utility method that may have more generic use. What shall we do with it then?

Make it a function.

That's it, just a normal, plain, boring function.

// color_util.php
namespace Crell\MyApp\Colors;

function hsv2rgb(int $hue, int $sat, int $val): array
{
    // ...
}

It should be namespaced, of course, but functions support namespaces. This way it can be used anywhere, and we can unit test the function individually without any other context. That's good! But there are rules here, too:

  1. The function must be pure.
  2. The function must not do any IO, even indirectly.
  3. The function must not call a service-locator.
  4. The function must not access globals.
  5. Did I mention the function must be pure?
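As an illustration of a function that meets those rules, here is one possible body for hsv2rgb(), using the common chroma-based conversion. The input ranges are an assumption (hue in degrees, saturation and value as 0-100 percentages), since nothing above specifies them, and the namespace declaration is left out to keep the snippet stand-alone.

```php
<?php
/**
 * Converts an HSV color to RGB.
 *
 * Assumed ranges: $hue in degrees (any integer, wrapped to 0-359),
 * $sat and $val as percentages from 0 to 100.
 *
 * @return int[] [red, green, blue], each 0-255.
 */
function hsv2rgb(int $hue, int $sat, int $val): array
{
    $s = $sat / 100;
    $v = $val / 100;

    // Normalize so negative hues wrap around instead of going out of range.
    $h = ((($hue % 360) + 360) % 360) / 60; // Position on the color wheel, 0 to <6.

    $chroma = $v * $s;
    $x = $chroma * (1 - abs(fmod($h, 2) - 1)); // Second-largest component.
    $m = $v - $chroma;                         // Amount added to every channel.

    [$r, $g, $b] = match ((int) floor($h)) {
        0 => [$chroma, $x, 0],
        1 => [$x, $chroma, 0],
        2 => [0, $chroma, $x],
        3 => [0, $x, $chroma],
        4 => [$x, 0, $chroma],
        default => [$chroma, 0, $x],
    };

    return array_map(
        fn ($channel) => (int) round(($channel + $m) * 255),
        [$r, $g, $b],
    );
}

[$red, $green, $blue] = hsv2rgb(60, 100, 100);
echo "$red,$green,$blue", PHP_EOL; // 255,255,0
```

It takes only its arguments, touches no globals, and does no IO, so a unit test is just a handful of input/output assertions.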

Why are we so strict on functional purity for functions? Because functions, like static methods, are not mockable. If your code calls a function, then all of its tests will also always call that function. There is no way around that. Your smallest "unit" of code includes the code you're testing and all functions and static methods it calls, recursively. So if your code calls a function, which calls a static method, which calls another static method, which runs an SQL query... guess what, your code now cannot be tested without a fully populated database. Welcome to testing hell.

However, if your code calls a pure function, which calls a pure function, which calls a pure function... Sure, you're running more code in your tests, but it's all still CPU cycles in the same process. There's no logical reason to mock them out; there may be a performance reason, but not a logical one. It doesn't really hurt the code's testability.

Does that mean you should make all of your code pure floating functions? No! You still want to mock things, and you still need to have some context and input somewhere in your application. (It's not a particularly useful application otherwise.) Most of your application should still live in well-designed objects. But when you have a stand-alone, pure, utility routine that doesn't fit anywhere else... A function is fine.

Autoloading

There's an old PHP habit from the PHP 5.2 and earlier days of using static methods instead of functions as a sort of cheap knock-off namespace. That somewhat made sense before PHP had namespaces, but we've had namespaces for 14 years now. We don't need cheap knock-off namespaces when we have real namespaces that work just fine for functions, thank you very much.

// BAD EXAMPLE
class ColorUtils
{
    public static function hsv2rgb(int $hue, int $sat, int $val): array
    {
        // ...
    }
}

The other purported advantage of static utility classes is autoloading. PHP doesn't yet support function autoloading, and while it's been discussed many times there are some technical implementation challenges that make it harder than it sounds.

But... does that matter?

If you're using Composer (and if you're not, why?), then you can use the files autoload directive in composer.json:

{
    "autoload": {
        "psr-4": {
          "App\\": "app/"
        },
        "files": [
          "app/utilities/color_util.php"
        ]
    }
}

Now, Composer will automatically require() color_util.php on every page load. Boom, the function is loaded and you can just use it.

But doesn't that use up a lot of resources to load all that code? Not really! It used to, back before PHP 5.5. But since PHP 5.5, PHP has bundled an opcache (which any production server should have enabled) that stores loaded code in shared memory and just relinks it each time the file is require()ed. Since we're dealing with functions, that relinking is basically zero cost. So while it will marginally increase the shared memory baseline, it has no meaningful effect on the per-process memory.

If you want to go even further, as of PHP 7.4 you can use preloading to pull the code into memory once at server-boot and never think about it again. That may or may not have a measurable performance impact, so do your own benchmarks.

So in practice, the lack of function autoloading support is... not a big issue. If you have enough functions that it becomes an issue, seriously consider if they shouldn't be methods closer to the code that actually uses them in the first place.

Edit: Nikita Popov wrote something along similar lines over a decade ago.

Flip a coin

There's one final situation to consider when discussing static methods. That's when you have a method that is itself pure, and doesn't need a $this reference, but gets used from object code. For example:

class ProductLookup
{
    public function __construct(private Connection $conn) {}

    public function findProduct(string $id): Product
    {
        [$deptId, $productId] = $this->splitId($id);
        
        $this->conn->query("SELECT * FROM products WHERE department=? AND pid=?", $deptId, $productId);
        // ...
    }
    
    private function splitId(string $id): array
    {
        return explode('-', $id);
    }
}

In this (trivial) example, splitId() is pure. It has no context, it has no $this, it has no dependencies. (These are all good things.) That means it would work effectively the same as a method, as a static method, or even as a function. You're not really going to want to mock it (nor would you be able to) in any case. So which should you use?

My argument is that you should default to an object method (as shown above), unless there's a compelling reason to do otherwise.

  1. Object methods can call static methods, but static methods cannot call object methods. (Static methods are "colored", much like async functions in JavaScript.) So using an object method gives you more flexibility as the code evolves.
  2. Since most of the time you want to be using object methods anyway, it's a good habit to get into just using object methods unless there's a very good reason to do otherwise. Keep that muscle memory going.
  3. The odds of it actually being useful elsewhere as a general utility and being large enough that it's worth factoring out to a common utility at all are low, and you don't know that initially. If you decide later that it makes more sense to split off to a stand-alone function, that's future-you's job.
  4. When dealing with static methods that call static methods mixed with inheritance, there's an extra layer of complexity about self vs static that you have to worry about. That confusion doesn't exist with object methods.
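The self vs static wrinkle from the last point is easiest to see in a small example. The Shape and Circle classes below are hypothetical, invented purely to show the difference.

```php
<?php
class Shape
{
    public static function label(): string
    {
        return 'shape';
    }

    // self:: always resolves to the class where this method is written.
    public static function describeViaSelf(): string
    {
        return 'a ' . self::label();
    }

    // static:: resolves to the class the call was made on (late static binding).
    public static function describeViaStatic(): string
    {
        return 'a ' . static::label();
    }

    protected function name(): string
    {
        return 'shape';
    }

    // With an object method there is only one choice: $this, which always
    // dispatches to the actual object's class.
    public function describe(): string
    {
        return 'a ' . $this->name();
    }
}

class Circle extends Shape
{
    public static function label(): string
    {
        return 'circle';
    }

    protected function name(): string
    {
        return 'circle';
    }
}

echo Circle::describeViaSelf(), PHP_EOL;   // a shape
echo Circle::describeViaStatic(), PHP_EOL; // a circle
echo (new Circle())->describe(), PHP_EOL;  // a circle
```

Picking self:: where static:: was intended silently ignores the child class's override, which is the kind of bug that is easy to write and tedious to find.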

I have seen people argue for static by default in these cases, on the grounds of "if you don't need a $this, make it static." That's a defensible position, but I disagree with it for the reasons above. I still firmly hold that you should avoid statics in most cases, which means if it's a toss up, stick with object methods.

Conclusion

In summary, when should you use static methods?

  1. If the relevant context is a type, not an instance, and the method is a pure function, use a static method. Named constructors are the most common instance of that.
  2. If it's a general-purpose utility, with no context beyond its arguments, large enough to be worth centralizing instead of repeating, and also a pure function, use a stand-alone function.
  3. Else, use an object method. (And make most of those pure functions, too.)

This will give you the most maintainable, most testable outcome possible. And that's what we're really after, isn't it?