On the use of enums

in PHP, 5 months ago

Enumerations are a feature of many programming languages. However, we should perhaps say "many languages have a feature called enumerations." What that means in practice has a lot of language-specific semantic nuance, and trying to use the feature of one language as though it were the feature of another language can lead to all kinds of broken code. (That's true of most language features.)

So how are enumerations best used in PHP, specifically? Let's look at some examples, and see where they should, and shouldn't, be used.

Enumerating enumerations

Broadly speaking, enumerations in different languages fall into three categories:

  1. Syntax sugar over constants
  2. A restricted-value type unto itself (usually built on top of objects)
  3. Parameterized algebraic data types

There is certainly variation and squishiness within those categories, and some languages oddly end up with more than one, but I find them to be reasonably descriptive categories.

The important distinction is that trying to use one category as though it were another is guaranteed to cause problems. For instance, in C, an enumeration is just a fancy kind of integer. It effectively creates global constants with integer values, and then you can use those where you'd expect an integer and vice versa. They compile away to integers before anything executes. If you try to rely on a C enum giving you a separate "space" of values (the way a string and an integer are separate "spaces"), you're very likely to write code that you expect to save you from errors but doesn't.

Conversely, if you try to use an enum in Rust as though it were just a string or integer with funny syntax, you will get nothing but compile errors.

PHP Enumerations were designed very specifically to be category 2, with the intent to expand them to category 3. (As of this writing they haven't yet, but hopefully will be in the not too distant future.) That was a conscious and deliberate decision, and impacts how enums in PHP should, and shouldn't, be used.

A Type of Enum

Although PHP has "backed enums," which have an associated string or integer primitive, they are not strings and are not integers. They are a logical type unto themselves. As such, their definition should be part of the domain model of the application. Their purpose is to define the problem space in such a way that "invalid states become unrepresentable."

For instance, the whole point of typing a parameter as an integer is to say "this is an integer. Passing a string here is logically nonsensical and the language itself will stop you from doing so. As is passing a Request object in here. That just makes no sense. Don't do that." The code in the function may then safely assume that the parameter is an integer, and waste no code on other possibilities. Similarly, other languages have the ability to type a variable or parameter as an "unsigned int" (meaning guaranteed 0 or positive). That means it is illogical and invalid to pass -5 to such a function. The design of the code itself makes it clear that is nonsensical, and so it becomes an error you cannot make.

Enumerations take that a step further by allowing you to define an entirely new type space of values. Just as a Request object is not an integer, neither is a Direction enumeration with values Up and Down an integer. If a function takes an integer as a parameter, and you try to pass it a Request, both the code and the developer are going to just look at you funny. The exact same concept applies if you try to pass a Direction enum to an integer parameter.

Enums in PHP are not fancy integers or fancy strings. They are their own type, just as much as a Request, Product, or Formatter are their own type.
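A minimal sketch of that point, assuming PHP 8.1+ (the function `move_by()` and the helper `try_pass_direction()` are hypothetical, for illustration only). An enum case is an object of its enum type, so PHP rejects it at an `int` parameter exactly as it would reject a Request object, with or without `strict_types`:

```php
<?php
enum Direction
{
    case Up;
    case Down;
}

// Hypothetical function that wants an integer, nothing else.
function move_by(int $steps): int
{
    return $steps;
}

// Objects (and enum cases are objects) are never coerced to int,
// so the call fails with a TypeError.
function try_pass_direction(): string
{
    try {
        move_by(Direction::Up);
        return 'accepted';
    } catch (TypeError $e) {
        return 'rejected';
    }
}

echo try_pass_direction(), "\n"; // rejected
```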

Back up types

So why then do BackedEnums exist? There's really only one reason: Serialization.

For scalar types, there's generally a natural way to represent a value outside of PHP. An integer becomes a sequence of digits. A string becomes a sequence of characters. Etc. For more robust types, that doesn't exist. Translating a Request or a Product into a format that can be stored in a database or shown in HTML or sent over the wire in a JSON or XML response is not a trivial action, and there's a bazillion ways of doing so. (That's why there are a bazillion libraries for doing those things.) That's because those are different types, a different universe, that doesn't have a natural translation into the universe of digits or characters.

Enums also have no natural translation into digits or characters, so they cannot be trivially translated outside of PHP's memory space any more than a Request or Product can. The typical way of handling that for an arbitrary type is either an external rendering mechanism (like a template engine or formatter) or some one-off method. There are pros and cons to both, which are out of scope for now, but for something as simple as an enum a one-off method is sufficient and less work.

That means a convention of a method named value() on enums (did I mention enums have methods, just like other custom types?) to represent its preferred serialized form would get the job done just fine. That would work. If BackedEnum as an interface just mandated the presence of a value() method and a from() static method, that would be equivalent.
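To make that concrete, here is a rough sketch of the hand-rolled convention on a *pure* enum (PHP 8.1+). The `value()` and `from()` methods here are ordinary user-defined methods following the hypothetical convention described above; PHP does not provide them for pure enums:

```php
<?php
enum Direction
{
    case Up;
    case Down;

    // Preferred serialized form of this case.
    public function value(): string
    {
        return match ($this) {
            self::Up => 'U',
            self::Down => 'D',
        };
    }

    // Rebuild a case from its serialized form, rejecting anything unknown.
    public static function from(string $value): self
    {
        foreach (self::cases() as $case) {
            if ($case->value() === $value) {
                return $case;
            }
        }
        throw new ValueError("\"$value\" is not a valid Direction");
    }
}

echo Direction::Up->value(), "\n";                  // U
var_dump(Direction::from('D') === Direction::Down); // bool(true)
```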

What backed enums actually do is simply that, but with a bit more automation to minimize typing.
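For comparison, a sketch of the same Direction as a backed enum (PHP 8.1+). Here PHP generates the read-only `->value` property and the static `from()` and `tryFrom()` methods itself; `from()` throws a `ValueError` on unknown input, while `tryFrom()` returns `null`:

```php
<?php
enum Direction: string
{
    case Up = 'U';
    case Down = 'D';
}

echo Direction::Up->value, "\n";                    // U
var_dump(Direction::from('D') === Direction::Down); // bool(true)
var_dump(Direction::tryFrom('X'));                  // NULL, since 'X' is not a backing value
var_dump(Direction::Up instanceof BackedEnum);      // bool(true)
```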

These things are not the same

The way to understand a backed enum is not as "this Direction::Up enum value is also a string" but as "this Direction::Up enum value should serialize to this more basic, less-descriptive value".

That's subtly different, but in a very important way. When translating from one type space to another, information may be lost. Consider:

enum Apple: string
{
   case Macintosh = 'M';
   case RedDelicious = 'RD';
   case Honeycrisp = 'H';
}

enum Orange: string
{
   case CaraCara = 'CC';
   case Navel = 'N';
   case Mandarin = 'M';
}

It should be self-evident that Apple::Macintosh !== Orange::Mandarin. That is quite literally comparing apples to oranges. However, both would serialize to "M". Trying to compare them as strings would erroneously indicate that they're the same, when they're simply not even part of the same universe.
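A quick way to see the information loss, using the two enums above (PHP 8.1+). As enum values the cases are distinct; only once they are reduced to their backing strings do they collide:

```php
<?php
enum Apple: string
{
    case Macintosh = 'M';
    case RedDelicious = 'RD';
    case Honeycrisp = 'H';
}

enum Orange: string
{
    case CaraCara = 'CC';
    case Navel = 'N';
    case Mandarin = 'M';
}

// As enum values: different types, never equal.
var_dump(Apple::Macintosh === Orange::Mandarin); // bool(false)

// As serialized strings: the type is gone, and they look identical.
var_dump(Apple::Macintosh->value === Orange::Mandarin->value); // bool(true)
```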

In a sense, the backing value is most akin to a Product type's ID property. It's an identifier, unique only within that type, that can be used to retrieve the value, whether from a database, a list of referenced objects, or wherever. But the Product with ID 5 and the User with ID 5 are not the same, nor would anyone mistake them for the same. And if you tried to pass the Product with ID 5 to a parameter that expected an integer, there is no logical reason why that should work. That can, should, and would generate an error.

Don't use enums

This understanding of Enums as their own "thing" rather than as an over-engineered named string or named integer is important, because that affects when you should, and when you shouldn't, use an enum.

An enum represents a specific, domain-sensitive, small finite set of values. (Technically an enum can have as many values as you feel like typing, but compared to the number of legal ints or strings in the language it's extremely small.) A string represents a series of characters that may or may not have semantic meaning. These are very different things.

There's nothing wrong with strings, of course. Many APIs use them as identifiers. That's fine, when the desire is for a potentially infinite number of identifiers, or an at-code-time-undefined set of identifiers. A common example here is user roles.

Many systems have user roles as a form of access control. A task requires a given role, a user has some set of roles, and a user can perform that task only if the required role is one that they have.

function check_role(string $neededRole, User $user): bool
{
   return in_array($neededRole, $user->roles, true);
}

Nothing exciting here. But crucially, this design allows the system to use any arbitrary string as a role name. Presumably it's a framework that lets each application define its own user roles, which is a perfectly good design. But it does mean the value-space of roles is infinite.

It also means, therefore, that passing an enum value to it is... wrong. Just fundamentally apples to oranges wrong. It makes no more logical sense than passing a Product object as the first argument. They're fundamentally different things.

"Ah!" some will cry, "but I want to have easily auto-completed special values that are less prone to typos than strings!" That's a good attitude to have! The more precision you can give things, the better. But that doesn't mean enums are always the right tool for precision.

In this case, there are three possible ways to improve the precision of check_role() and keep callers from passing in any of a literally infinite number of possible strings. Two require that you control check_role(), and one does not.

All-in on enums

If you control check_role() (that is, it's in your own code), then one option is to simply change its parameter from a string to an enum. Define an enum for the roles in your application:

enum Role
{
   case Admin;
   case Editor;
   case User;
   case AnonymousCoward;
}

And then update the function to use it:

function check_role(Role $neededRole, User $user): bool
{
   // ...
}

The logic inside check_role() depends on whether your system makes roles exclusive or not (viz, can a user have more than one). Either approach can be appropriate depending on the application. Now, passing any value that's not one of those four is guaranteed to error, and get called out in your IDE.
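As one possible body, here is a sketch for the non-exclusive case, assuming a hypothetical User class that holds an array of Role cases (PHP 8.1+). For exclusive roles the body would instead be a single `$user->role === $neededRole` comparison. Because each enum case is a singleton, strict `in_array()` compares by identity:

```php
<?php
enum Role
{
    case Admin;
    case Editor;
    case User;
    case AnonymousCoward;
}

// Hypothetical User shape: a user may hold several roles.
final class User
{
    /** @param Role[] $roles */
    public function __construct(public readonly array $roles) {}
}

function check_role(Role $neededRole, User $user): bool
{
    // Enum cases are singletons, so a strict search is an identity check.
    return in_array($neededRole, $user->roles, true);
}

$editor = new User([Role::User, Role::Editor]);
var_dump(check_role(Role::Editor, $editor)); // bool(true)
var_dump(check_role(Role::Admin, $editor));  // bool(false)
// check_role('admin', $editor) throws a TypeError.
```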

This is the correct approach when you want to define that, within the universe of your application, the universe of roles is limited to those four values. The problem domain itself only allows those four, and that's by design, and that's the behavior you want.

Might Role enum cases also want to have other methods? Maybe. It's your domain model, you tell me. But the option is there.

Open type

But what if you want roles to be code-defined, but not a fixed list in one library? You do want a user-extensible list of roles, where "user" means "developers using my library"? Well, then enums are not the tool you want. The tool you want, assuming you don't want to just use strings, is an interface or abstract class.

It's heresy in some circles, but marker interfaces and empty classes are not always a bad thing! The access control library provider can instead do this:

interface Role {}

function check_role(Role $neededRole, User $user): bool
{
   // ...
}

And a user of the library can explicitly define a new role like this:

class Admin implements Role {}
class Editor implements Role {}
class User implements Role {}
class AnonymousCoward implements Role {}

Now, you get code-time typechecked errors if you try to use an incorrect role. Unlike with enums, you can grow the list of allowed values at runtime. Your IDE can even get you a list of all possible roles by scanning for any class that implements the Role interface. This is a useful middle ground approach.
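One way the library's check_role() might look in this model, sketched under assumptions: a hypothetical User class holding an array of Role objects, and role objects compared by class (since, without extra work, two `new Editor()` objects are not `===`). Requires PHP 8.0+ for `$object::class`:

```php
<?php
interface Role {}

// Roles defined by the library's user, not by the library.
class Admin implements Role {}
class Editor implements Role {}

// Hypothetical User shape holding Role objects.
final class User
{
    /** @param Role[] $roles */
    public function __construct(public readonly array $roles) {}
}

function check_role(Role $neededRole, User $user): bool
{
    // Compare by class, because separate instances are never identical.
    foreach ($user->roles as $role) {
        if ($role::class === $neededRole::class) {
            return true;
        }
    }
    return false;
}

$user = new User([new Editor()]);
var_dump(check_role(new Editor(), $user)); // bool(true)
var_dump(check_role(new Admin(), $user));  // bool(false)
```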

If you want to ensure that there's only a single instance of each role object so that === comparisons work, you can toss a simple generic singleton implementation into each class, either as a trait (shown here) or by using an abstract class Role instead of an interface:

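trait Unique
{
   // Each class that uses this trait gets its own copy of this static property.
   // (Static properties cannot be readonly, hence the nullable default.)
   private static ?self $i = null;

   private function __construct() {}

   public static function i(): self
   {
       return self::$i ??= new self();
   }
}

class Admin implements Role { use Unique; }

$allowed = check_role(Admin::i(), $user);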

Might Role objects also want to have other methods? Maybe. It's your domain model, you tell me. But the option is there.

Named strings

The third option is the only option available in the case that you do not control the check_role() function. In this case, we are sticking to the universe of strings as our legal values. As far as check_role() is concerned, any string is potentially a legal value.

As far as our particular use of it is concerned, there are only four values that we want to be legal in our application. They're still four strings, but we want at least a little extra help from our IDE. That's fine, and a good thing. But what we want here isn't an enumeration of roles to pass to check_role(). check_role() doesn't want an enumeration; it wants a string. So what we want to give it is a string, one to which we've attached an external name in order to avoid "magic values."

That's what constants are for: Avoiding "magic values."

final class Role
{
   public const Admin = 'admin';
   public const Editor = 'editor';
   public const User = 'user';
   public const AnonymousCoward = 'anonymous';
}

$allowed = check_role(Role::Editor, $user);

(A Role interface would work equally well here, but let's have some variety.)

This approach still respects that check_role() wants a string. We have to give it what it wants. But our calls to it can now typo-check that we didn't go all web 2.0-y and type "Editr" by mistake. It does require some discipline to use the constants and not just type out 'editor' directly, but that's a consequence of the string-based design of check_role(). It's a deliberate design decision.
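A small sketch of what the constants do and don't buy you, using the Role class above. Each constant is a plain string, so it is exactly what a string API expects; a misspelled constant name is flagged by the IDE and, if it slips through, fails loudly at runtime with an `Error` instead of silently passing a bad string:

```php
<?php
final class Role
{
    public const Admin = 'admin';
    public const Editor = 'editor';
    public const User = 'user';
    public const AnonymousCoward = 'anonymous';
}

// Still just a string.
var_dump(Role::Editor); // string(6) "editor"

// A typo in the constant name is caught instead of becoming 'editr'.
try {
    echo Role::Editr;
} catch (Error $e) {
    echo get_class($e), "\n"; // Error
}
```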

Importantly, what's different here is that we are not defining a new type space the way an enum or class/interface would. We're sticking to the type space of strings, because that's what check_role() says to do. That means there is no translation layer where information can get lost.

I want to be clear here, as one of the co-authors of PHP's enum implementation: This is a perfectly good approach, and is not a wrong or bad way to do it. Enums are not always the answer. There are many cases where they are a good answer, and the right tool, but not always. In this case, a bunch o' constants is the superior solution, unless modifying check_role() itself is on the table (in which case, going all in on enums is probably better).

Note that, technically, it would also be possible to use a backed enum and access its serialized value directly:

enum Role: string
{
   case Admin = 'admin';
   case Editor = 'editor';
   case User = 'user';
   case AnonymousCoward = 'anonymous';
}

$allowed = check_role(Role::Editor->value, $user);

That does work. However, it offers no meaningful safety benefit over the class with constants. check_role() will still only accept a string, and accept any string, so it does no more to prevent you from writing check_role('space_alien', $user) than the constant version does. If anything, it gives you a false sense of security if someone assumes "oh, this is an enum, that means I know all possible values," when that's not actually the case. Both constants and enums here only give you a list of suggested values; check_role() controls what the possible values are, and it says any string.
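To illustrate, here is a sketch that reuses the string-based check_role() from earlier, with a hypothetical minimal User class holding an array of role strings. The enum's `->value` is accepted, but so is any other string; nothing about the enum narrows what check_role() will take:

```php
<?php
// Hypothetical minimal User, matching the earlier check_role().
final class User
{
    public function __construct(public array $roles = []) {}
}

function check_role(string $neededRole, User $user): bool
{
    return in_array($neededRole, $user->roles, true);
}

enum Role: string
{
    case Admin = 'admin';
    case Editor = 'editor';
    case User = 'user';
    case AnonymousCoward = 'anonymous';
}

$user = new User(['editor']);

var_dump(check_role(Role::Editor->value, $user)); // bool(true)
// Not in the enum, yet accepted without complaint.
var_dump(check_role('space_alien', $user));       // bool(false)
```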

Separate bounded contexts

That said, there are potentially cases where making an enum for one library makes sense, even if it translates to a string or integer in some other library. In this case, you may want both the forced finiteness of enums and other metadata, via methods. For instance, in your particular application you may know there are only four roles, so your User objects have a role property of type Role.

enum Role: string
{
  case Admin = 'admin';
  case Editor = 'editor';
  case User = 'user';
  case AnonymousCoward = 'anonymous';
 
  public function description(): string
  {
      return match($this) {
          self::Admin => 'The owners of the site',
          self::Editor => 'They can edit content, but not people',
          self::User => 'The common peasants',
          default => 'Wait, who are you again?',
      };
  }
}

class User
{
   private Role $role = Role::AnonymousCoward;
  
   public function setRole(Role $new): void
   {
       $this->role = $new;
   }
}

However, that's true only within your own problem space, within your own domain. A different library has its own bounded context with its own data model, and thus you need to translate from one bounded context to the next.

And that's OK. Translation at a bounded context boundary is normal and expected. We do it all the time. The fact we're using enums here doesn't change that at all.
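A rough sketch of such a translation layer. Everything here except the Role enum is hypothetical: `library_check_role()` stands in for some other library's string-based API, and `app_can()` and `role_from_library()` are illustrative adapter functions. Enum to string on the way out via `->value`, string to enum on the way in via `tryFrom()`, which returns `null` for strings the application does not recognize:

```php
<?php
// Hypothetical API from another bounded context: it speaks strings.
function library_check_role(string $neededRole, array $grantedRoles): bool
{
    return in_array($neededRole, $grantedRoles, true);
}

// The application's own model: a finite, backed Role enum.
enum Role: string
{
    case Admin = 'admin';
    case Editor = 'editor';
    case User = 'user';
    case AnonymousCoward = 'anonymous';
}

// Outbound translation: enum -> string at the boundary.
function app_can(Role $needed, Role $current): bool
{
    return library_check_role($needed->value, [$current->value]);
}

// Inbound translation: string -> enum, or null if it isn't one of ours.
function role_from_library(string $raw): ?Role
{
    return Role::tryFrom($raw);
}

var_dump(app_can(Role::Editor, Role::Editor)); // bool(true)
var_dump(role_from_library('space_alien'));    // NULL
```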

Migrating to enums

What if modifying check_role() is on the table, though? Let's move on from access control to another use case: sorting. It's super-common for query builders and similar tools to have a parameter to indicate that values should be sorted ascending or descending. Typically, that's done through a parameter that takes either the strings "Asc" or "Desc", as that's the terminology used by SQL's ORDER BY clause. But a function parameter of string doesn't lock down the inputs to just those two values, and you therefore have to either manually check that a passed in value is one of those two (or various capitalization variants) or punt the problem to SQL and let it syntax-error there. This is a bad approach.

Since the problem space has precisely two legal values, period, ever and always amen, this is a great use case for a serializable enum:

// These serialized value equivalents
// can be safely concatenated into an SQL string.
enum Order: string
{
   case Asc = 'ASC';
   case Desc = 'DESC';
}

class Query
{
   public function orderBy(string $field, Order $order = Order::Asc) {}
}

However, there's an awful lot of code out there already that has string $order = 'Asc' for this case. If it wants to convert that to using enums, that's an API change.

Which... is correct. A function changing its type expectation from the universe of strings to the universe of Order is indeed a type change, just as much as changing from the universe of integers to the universe of Product is a type change. But as of PHP 8.0, there's a straightforward way to handle that type change: Union types.

class Query
{
   public function orderBy(string $field, Order|string $order = Order::Asc)
   {
       if (is_string($order)) {
           $order = Order::from(strtoupper($order));
       }
   }
}

This method now takes either a string or an Order, and if it's a string it up-casts it to an Order. If the string is not one of the allowed values, Order::from() will throw a ValueError. The rest of the method can now continue as normal, safe in the knowledge that $order is in the Order type space, which has only two possible values ever, and thus it need not care about any other values, because there are no other possible values. However, existing code that calls orderBy() with the string "asc" will still continue to work fine.
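A runnable sketch of the transitional method, assuming PHP 8.1+. The `$orderings` property and `orderByClause()` method are hypothetical additions just to make the effect visible, and `$field` is assumed to be validated elsewhere, since only the direction is safe to concatenate here:

```php
<?php
enum Order: string
{
    case Asc = 'ASC';
    case Desc = 'DESC';
}

class Query
{
    // Hypothetical: collected ORDER BY fragments.
    private array $orderings = [];

    public function orderBy(string $field, Order|string $order = Order::Asc): static
    {
        if (is_string($order)) {
            // Legacy callers: 'asc', 'Desc', etc. are up-cast here;
            // anything else makes Order::from() throw a ValueError.
            $order = Order::from(strtoupper($order));
        }
        // From here on, $order is guaranteed to be an Order case.
        // ($field is assumed to be a trusted, validated column name.)
        $this->orderings[] = $field . ' ' . $order->value;
        return $this;
    }

    public function orderByClause(): string
    {
        return 'ORDER BY ' . implode(', ', $this->orderings);
    }
}

$q = (new Query())->orderBy('title', 'asc')->orderBy('created', Order::Desc);
echo $q->orderByClause(), "\n"; // ORDER BY title ASC, created DESC
```

Old string-based calls and new enum-based calls coexist, and an invalid string like 'sideways' fails at the method boundary rather than in SQL.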

At some point in the future, the string option can be removed as a breaking change to complete the migration. How far in the future depends on a variety of logistical factors, and could range from days to years depending on circumstances that have nothing to do with code and more to do with support contracts.

Note that the union type would also be a breaking change for any classes that extend Query. But that's again no different than changing from an integer to Product, or any other type change. These migration strategies are well known, as are their pitfalls. There's nothing enum-specific about them.

Conclusion

Enums are a great addition to PHP, and have a huge potential to help improve codebases by defining custom, finite spaces for new domain-specific values. I cannot stress enough how powerful that is. But I also cannot stress enough how important it is to not overuse the shiny new hammer. If the type space of a given problem is not finite, then enums are not the right tool and using them will only make your code worse, not better.

There are times when a type space seems finite to the consumer of an API, in their particular use case, but from a library's broader point of view it is conceptually infinite. User roles is one example we saw above; authors of a post is another. A given blog may have only three authors on it in practice, but to the CMS, there could just as easily be millions. That means the type space of Author is millions, not three. We could come up with an infinite list of similar cases, I'm sure.

So please, use enums to improve your code. But only use them for problem spaces that are truly finite, discrete, and known in advance. Your code will thank you.
