How the Ethereum Hard Fork Can Fail

The Ethereum fork will take place in a matter of days. I recently skimmed through the proposed changes, so here are the kinds of issues that I worry about with regard to the hard fork to come, as well as half-baked ideas related to the decentralized administration of the hard fork. This is a brain dump of sorts, so treat it as such. Not every potential problem turns into a real problem. I will scientifically lay everything I worry about on the table. If we can methodically prove to ourselves that they are not worth worrying about, then we are done. However, if you have a large open ETH position and cannot handle dispassionate discussion about possible problems, it's best to look elsewhere. OK, so let's delve into possible problems:

At the very highest level, the hard fork comprises the hard fork policy (which determines how people get remunerated), the client code (which implements the Ethereum protocol in multiple different languages for different clients), and the refund contract (which takes over the ether balance of the old DAO, but leaves the old DAO intact).
The community itself determined the hard fork policy via a discussion process. I will take this as a given.
The intent of the hard fork policy sounds quite reasonable to me. I will regard all discussion of mechanisms in a policy document as suggestions -- the masses care about the outcome, not the means.
The hard fork policy leaves unaddressed the issue of what happens to the coins that are lost, abandoned or left unallocated in the trustee multisig contract. Laying out the charter of the trustees, so the expectations from them are specified down to every wei, is necessary.
The client code changes (geth and parity) seem very straightforward. This is one of the big advantages of a clean hard fork. The core implementation is easy to check, and the geth code I skimmed seems to have the right general shape.
That brings us to the new refund contract. It has three components that I want to treat separately: the refund engine, the token mechanism, and the enumeration and classification of affected child DAOs.
I have not looked into the code for enumerating and classifying the child DAOs at all. This work involves a modest amount of blockchain forensics, and some manual verification of tools for parsing the Ethereum blockchain. I assume, based on second-hand reports of people who have checked its output, that this code works as intended.
The refund engine, in isolation from the token mechanism used to keep track of account balances, looks generally OK to me, but it is not written in a manner that I would consider best practice or defense in depth.

Here's the current code:

// Deployed on mainnet at 0xbf4ed7b27f1d666546e30d74d50d173d20bca754

contract DAO {
    function balanceOf(address addr) returns (uint);
    function transferFrom(address from, address to, uint balance) returns (bool);
    uint public totalSupply;
}

contract WithdrawDAO {
    DAO constant public mainDAO = DAO(0xbb9bc244d798123fde783fcc1c72d3bb8c189413);
    address public trustee = 0xda4a4626d3e16e094de3225a751aab7128e96526;

    function withdraw(){
        uint balance = mainDAO.balanceOf(msg.sender);

        if (!mainDAO.transferFrom(msg.sender, this, balance) || !msg.sender.send(balance))
            throw;
    }

    function trusteeWithdraw() {
        trustee.send((this.balance + mainDAO.balanceOf(this)) - mainDAO.totalSupply());
    }
}

In particular, from least to most serious:

I don't understand why the trustee address is not marked "constant". I realize that there is no code that can change the trustee address. But then why is the DAO address marked constant? Inconsistencies like this erode trust and telegraph that we have learned nothing from the DAO disaster. Document all intent and assumptions in the code, that's the right place to do it.
The trusteeWithdraw() function seems to be intended solely for the trustee multisig's use. So, it should have a function modifier that rejects calls unless they come from the trustee address. The lack of such modifiers is a cavalier coding practice that reminded me of the time when I sold my motorbike and bought a used car in grad school. I noticed that the air intake for the cabin was right behind the engine block, in a spot where any oil leak would turn into noxious fumes. I thought "wow, these German engineers are so good, so amazingly confident, that they purposefully put the intake in that spot, to show the world how convinced they are that the gaskets will never leak." So it was kind of funny (not) when my engine leaked oil and I ended up breathing noxious fumes until I had to give the car to a charity for the blind. Long story short, there is no room for this kind of cavalier behavior in a post-DAO world. Put the modifier there already.
The critical weakness of the refund engine is its dependence on the inviolability of the DAO's token management, discussed next.
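To make the accounting in the refund engine concrete, here is a minimal Python model of it. This is a sketch under stated assumptions, not the deployed contract: `ToyDAO`, `ToyWithdrawDAO`, the string addresses, and the `paid` ledger are hypothetical names made up for illustration, token allowances are ignored, and the trustee-only check in `trustee_withdraw` is the guard argued for above, which the deployed code lacks. The model shows that the quantity the trustee may claim, contract ether plus tokens held by the contract minus total supply, stays constant as honest holders withdraw, provided the token ledger itself can be trusted.

```python
class ToyDAO:
    """Toy stand-in for the old DAO's token ledger (not the real contract)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.total_supply = sum(self.balances.values())

    def balance_of(self, addr):
        return self.balances.get(addr, 0)

    def transfer_from(self, src, dst, amount):
        # Mirrors the bool-returning transferFrom; allowances are ignored here.
        if amount < 0 or self.balance_of(src) < amount:
            return False
        self.balances[src] = self.balance_of(src) - amount
        self.balances[dst] = self.balance_of(dst) + amount
        return True


class ToyWithdrawDAO:
    ADDRESS = "withdraw-contract"  # hypothetical label for the contract's own address

    def __init__(self, dao, trustee, ether):
        self.dao = dao
        self.trustee = trustee
        self.ether = ether  # plays the role of this.balance
        self.paid = {}      # ether sent out so far, per recipient

    def _send(self, to, amount):
        if amount < 0 or amount > self.ether:
            raise RuntimeError("send failed")
        self.ether -= amount
        self.paid[to] = self.paid.get(to, 0) + amount

    def withdraw(self, sender):
        balance = self.dao.balance_of(sender)
        # Check funds first so a failure leaves no partial state behind,
        # loosely imitating the all-or-nothing effect of `throw`.
        if balance > self.ether or not self.dao.transfer_from(sender, self.ADDRESS, balance):
            raise RuntimeError("withdraw failed")
        self._send(sender, balance)

    def surplus(self):
        # Same expression as in trusteeWithdraw().
        return self.ether + self.dao.balance_of(self.ADDRESS) - self.dao.total_supply

    def trustee_withdraw(self, sender):
        # Assumption: the access check proposed in the text, absent on-chain.
        if sender != self.trustee:
            raise PermissionError("only the trustee may call this")
        self._send(self.trustee, self.surplus())


dao = ToyDAO({"alice": 60, "bob": 40})
refund = ToyWithdrawDAO(dao, trustee="trustee", ether=110)
print(refund.surplus())   # 10
refund.withdraw("alice")
print(refund.surplus())   # still 10
```

The invariant holds only because every token the model knows about was created in the constructor. Anything that lets tokens appear elsewhere changes the arithmetic, which is why the refund engine inherits the old DAO's token accounting as part of its trusted base.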

The current implementation relies on the integrity of the old DAO's token management. I find this quite dangerous. Philosophically, if we were unrolling the hard fork solely because "a hacker took advantage of a small reentrancy bug," then it would make sense to rely on the old token accounting -- after all, the old DAO would be perfectly fine, save for one minor glitch, and we'd be fixing just that one minor glitch. But that's not true and that's not why I advocated a hard fork. In fact, this kind of case-by-case, narrow thinking is precisely what got us into trouble. I believe the hard fork is called for because the old DAO is bug-ridden at multiple levels. Consequently, my starting assumption, that we treat the entirety of the old DAO as untrustworthy, motivates a different strategy.
As a result, if I were devising the hard fork, I would freeze the DAO token balances at a certain block, and then build a list of refundable addresses based on the frozen balances. DAO tokens traded after the freeze point would have 0 redeemable value from the refund contract.
I made this suggestion to the folks working on the hard fork, but they favored the current approach instead. To be fair, the current approach is simple, requires no trust in a party who will perform the enumeration (it requires trust in the DAO token code instead, and a trustee for enumerating childDAOs, a simpler task), and permits trading of tokens until the final refund. But it is open to attacks that can manufacture DAO tokens. My suggestion does not rely on the old DAO at all, but its simplest implementation would probably rely on a trustee who will perform the enumeration.
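The freeze-and-enumerate alternative can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than a real tool: the `(block, sender, recipient, amount)` transfer tuples, `build_refund_table`, and `SnapshotRefund` are hypothetical names, and a real enumeration would have to be derived from the actual chain data by whoever is trusted to compute it.

```python
def build_refund_table(initial_balances, transfers, freeze_block):
    """Apply token transfers up to and including freeze_block; ignore the rest.

    transfers: iterable of (block, sender, recipient, amount) tuples.
    """
    balances = dict(initial_balances)
    # sorted() is stable, so transfers within one block keep their input order.
    for block, src, dst, amount in sorted(transfers, key=lambda t: t[0]):
        if block > freeze_block:
            break
        if amount < 0 or balances.get(src, 0) < amount:
            raise ValueError("inconsistent transfer history")
        balances[src] -= amount
        balances[dst] = balances.get(dst, 0) + amount
    return {addr: bal for addr, bal in balances.items() if bal > 0}


class SnapshotRefund:
    """Pays each frozen balance out exactly once; it never consults the old DAO."""

    def __init__(self, table):
        self.owed = dict(table)

    def withdraw(self, addr):
        return self.owed.pop(addr, 0)


table = build_refund_table(
    {"alice": 100, "bob": 50},
    [(10, "alice", "carol", 30), (20, "bob", "dave", 50)],
    freeze_block=15,
)
print(table)  # {'alice': 70, 'bob': 50, 'carol': 30}
```

In this toy run, dave bought tokens after the freeze point and is owed nothing, while bob is still refunded for tokens he has since sold. That is the trade-off described above: the table is immune to whatever the old token code does after the freeze, at the cost of ending meaningful trading and trusting whoever builds the table.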
I did not fight hard to get the hard fork folks to adopt my approach, for three reasons: first, I do not know of an exploit that can create tokens in the main DAO and exploit this refund contract. Second, the practical form of my suggestion ends up relying on a trusted party to compute the balances -- and it is difficult to find such a party that will take on the responsibility for no compensation. Finally, if a high-profile Ethereum Dapp using basic patterns is unable to handle simple token creation and transfer, then a wake-up call is necessary anyway -- a failed hard fork would be quite dire, but survivable, maybe. Bonus reason: I do this stuff just because it's fun, in between real research. I said it once to the right people -- repeating it would stop being fun.
If we are going to instead push ahead with the current fork strategy that relies on The DAO's accounting of its tokens, it would be prudent to construct a proof that the token management of the current DAO is correct, and cannot be abused to issue spurious tokens.
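To see what such a proof has to rule out, it helps to work through the arithmetic of a spurious-token attack. The two helpers below are hypothetical and purely illustrative. `supply_is_consistent` states one necessary condition a correctness argument would need to establish (it is not sufficient, since a bug could inflate a balance and the recorded supply together), and `honest_shortfall` assumes the worst case in which the forged tokens are redeemed one-for-one before any honest holder withdraws.

```python
def supply_is_consistent(balances, total_supply):
    """Necessary (not sufficient) condition: balances are non-negative and sum to the supply."""
    return all(b >= 0 for b in balances.values()) and sum(balances.values()) == total_supply


def honest_shortfall(pool, honest_claims, forged_tokens):
    """Ether that honest holders cannot recover if forged tokens are redeemed first."""
    remaining = max(pool - forged_tokens, 0)
    return max(honest_claims - remaining, 0)


print(honest_shortfall(pool=100, honest_claims=100, forged_tokens=0))   # 0
print(honest_shortfall(pool=100, honest_claims=100, forged_tokens=30))  # 30
print(honest_shortfall(pool=110, honest_claims=100, forged_tokens=30))  # 20
```

Because the refund contract pays claims in the order they arrive, every forged token that is redeemed comes out of the same pool the honest holders are counting on. Any surplus in the contract only cushions the loss.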
Approximately $115M is at stake. This immediately creates a lucrative bug bounty, albeit for illegal gains: everyone has an interest in looking carefully at this code to find its flaws. Yet no one has any incentive to reveal what they find. There is no one who can offer $115M or even $11.5M to a hacker in monetary compensation. This should be quite worrying -- a bright hacker who identifies a flaw might choose to exercise the flaw and collect the cash, instead of letting people know. (Incidentally, this is true for Bitcoin as well. A coin without a substantial bug bounty is a vulnerable coin.)
Regardless, it is possible to compete with even $115M in monetary compensation: people are generally nice and they value intangible assets much more than money. In particular, if the community commits to granting "uber-hero" status to anyone who identifies a bug in the hard fork, it might entice an idealistic person to avoid the hassle of laundering illegal gains. For instance, part of the uber-hero treatment would be an invite to Devcon2 to give a keynote. The nice thing about this is that it's almost free, and it also helps differentiate the ether community from others whose first reaction to any bug report is to deny and attack the researchers. And it's the only way of competing with a large pot of coins.
If the trustees do their jobs correctly, any bugs that affect the token accounting in the child DAOs ought to be inconsequential.
How likely is it that there are attacks against the token accounting mechanism in the main DAO? Not likely. But it's not impossible unless there is an impossibility proof.
Since DAO tokens are being traded right now, the attack could already have surfaced. But if I were the attacker, I'd wait to unleash the attack when the tokens are redeemable for ether, so I can extract much more cash, instead of tanking the thin DAO token market.
There may be bytecode/EVM-level attacks that might enable one to raid the refund contract. It seems unlikely that there exist such bugs, but there could well be some. Low level bugs would affect much more than the refund contract. But because of the implicit bug bounty in the refund contract, we might see such a bug surface now, through this hard fork episode.
I have written about replay attacks. Following a fork, one can interact with a smart contract on one chain, and replay the interaction on the other chain. For instance, I can play tic-tac-toe with you to a draw on one chain, replay your moves on the other chain, change my countermoves, and win. But these attacks extract only as much money as there exists on the minority chain, and help make it die out faster. In general, there may be vulnerable contracts out there that manage pre-fork money. If you are writing, or have written, a contract that handles large amounts of cash, talk to an expert about how to secure it against replay attacks.
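The mechanics of a replay can be shown with a toy model of two chains that share their pre-fork state. This is a sketch, not how Ethereum clients are implemented: `ToyChain`, `make_tx`, and the `chain_tag` field are hypothetical, an HMAC stands in for a real digital signature, and tagging messages with the intended chain is just one possible mitigation, assumed here for illustration.

```python
import hashlib
import hmac

FIELDS = ("sender", "recipient", "amount", "nonce", "chain_tag")


def sign(secret, payload):
    # HMAC is only a stand-in for a real digital signature.
    return hmac.new(secret, payload.encode(), hashlib.sha256).hexdigest()


def make_tx(secret, sender, recipient, amount, nonce, chain_tag=""):
    tx = {"sender": sender, "recipient": recipient, "amount": amount,
          "nonce": nonce, "chain_tag": chain_tag}
    tx["sig"] = sign(secret, "|".join(str(tx[k]) for k in FIELDS))
    return tx


class ToyChain:
    def __init__(self, name, balances, secrets, enforce_tag=False):
        self.name = name
        self.balances = dict(balances)  # copied: both forks start from the same state
        self.secrets = secrets          # toy substitute for public-key verification
        self.enforce_tag = enforce_tag
        self.used_nonces = set()

    def apply(self, tx):
        payload = "|".join(str(tx[k]) for k in FIELDS)
        expected = sign(self.secrets[tx["sender"]], payload)
        if not hmac.compare_digest(expected, tx["sig"]):
            return False
        if (tx["sender"], tx["nonce"]) in self.used_nonces:
            return False  # same-chain replay is already blocked by the nonce
        if self.enforce_tag and tx["chain_tag"] != self.name:
            return False  # cross-chain replay blocked only when tags are enforced
        if self.balances.get(tx["sender"], 0) < tx["amount"]:
            return False
        self.used_nonces.add((tx["sender"], tx["nonce"]))
        self.balances[tx["sender"]] -= tx["amount"]
        self.balances[tx["recipient"]] = self.balances.get(tx["recipient"], 0) + tx["amount"]
        return True


secrets = {"alice": b"alice-key"}
pre_fork = {"alice": 10}
majority = ToyChain("majority", pre_fork, secrets)
minority = ToyChain("minority", pre_fork, secrets)
tx = make_tx(b"alice-key", "alice", "bob", 7, nonce=0)
print(majority.apply(tx), minority.apply(tx))  # True True
```

The same signed message moves funds on both chains because nothing in it says which chain it was meant for. Constructing the chains with `enforce_tag=True` and signing with `chain_tag="majority"` makes the minority chain reject the copy.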
The incentive structures are lined up for people to coalesce onto the new chain, with the economic majority, fairly quickly. If all players were maximizing ether value, we'd see the minority chain wither and die quickly. However, there are a large number of people who see Ethereum as a threat to other cryptocurrencies (for entirely flawed reasons) and want it to fail. Further, some people specifically want Ethereum's hard fork to fail. These folks might well subsidize mining on the minority fork for some time. If this were to happen, sit tight and do not panic because the predictions, made for rational parties, do not hold in the presence of economically irrational players subsidized exogenously. Use the majority chain and ignore the minority chain's hashpower until the subsidizing party gives up or runs out of cash. Some people, especially in Bitcoin-land, play up the importance of mining power, mostly because they are thoroughly confused, as seen here and also here. The moment when there are no exchanges on the minority chain, it becomes a harmless testnet. Remember that miners are worthless without economic power, which comes from exchanges.

This is a comprehensive list of my concerns about the hard fork at the moment. Remember that not every concern points out a genuine problem: distributed systems researchers are, by nature, a paranoid bunch. As long as every concern is systematically evaluated, we will be able to ferret out all bugs and ensure an orderly fork. I expect most, if not all, of my concerns to be misplaced, and I would indeed be delighted if they were. I hope this glimpse into how a distributed systems researcher thinks about issues is received in the spirit it's intended. Note that I did something in this post I don't often do: I literally shared my fears, uncertainties and doubts. But uncertainty is precisely what drives good engineering, good system design and bug finding. And if the readers of this blog know one thing, it's this: distributed systems work only to the extent that they have a principled reason, a proof, for why they should do the things they should do. What they do on a sunny day, with the wind at their back and with nary an attacker in sight, is immaterial. I do not currently have a strong reason, a proof, to believe that this hard fork code would withstand an attacker as is, but I hope we can arrive at one before the fork. At a higher level, I am convinced that the hard fork is the simplest path forward, that the risks are manageable, and, above all, I have full faith in the Ethereum community's ability to engage in civil, technical discussions to address potential problems. Ultimately, it is the communities that provide value to ink on paper or numbers on a ledger.

Follow Up

I circulated the draft of this post among some friends and received some feedback, for which I'm grateful. Specifically, Phil Daian and Ittay Eyal provided invaluable feedback. Christoph Jentsch provided insightful commentary on a draft of this post, and pointed out the list of functions touched by the refund contract. Note, however, that even though the refund contract invokes only a subset of The DAO, it is implicitly reliant, through data-dependencies, on the correctness of the rest of the functions in The DAO that manipulate the data fields used in those functions. So the trusted computing base of the refund contract is substantially larger than the subset that Christoph has identified. To my delight, Vitalik responded with a well-thought-out proof sketch for why the DAO token creation cannot be abused to generate fake DAO tokens. I'll let him chime in below with his proof sketch, if he so chooses. That responds to my call for a proof above. I still prefer my enumerated address technique, but I feel much better about the impending hard fork. Coincidentally, a hard fork bug bounty was introduced today. And Christoph has pointed out that there is already a more general bug bounty in place. These are great developments.
