The Problem with Byzantine Generals

in #blockchain7 years ago

There are many different blockchain consensus algorithms being promoted these days. Academics like to evaluate these algorithms to see if they solve the byzantine generals problem. This particular problem is how to get a group of lieutenants to agree on whether to retreat or advance when up to ⅓ of them may be dishonest. In terms of blockchains, the challenge is to get the entire world to agree on a particular blockchain.

One of the aspects of byzantine fault tolerance (BFT) is that an algorithm always reaches a conclusion and never ends up in an indeterminate state. In other words, at some point people need to know with certainty that everyone is in agreement and that a minority of bad actors cannot halt the process completely. In all systems an assumption that ⅔ of the generals are honest is a mathematical requirement.

Proof of Work does not Solve Byzantine Generals

Bitcoin and other Proof of Work algorithms define the best chain as the one with the most difficult proof of work. A problem with proof of work, from byzantine perspective, is that it is always possible for the blockchain to be reversed, it just becomes increasingly unlikely and uneconomical. In order to consider Bitcoin and similar blockchains to have BFT you must assume that 99.999% certainty is the same as 100%. Someone with 10% of the hash power can pull off a double spend against someone waiting for 6 confirmations once per month or 0.02% of the time.

Cosmos White Paper

The cosmos white paper claims a working byzantine fault tolerance algorithm whereby a group of block producers can unanimously agree on the next block. They claim their process guarantees the network to advance as long as more than ⅔ are honest. The rest of the world never has to consider blockchain forks because they are “impossible” under their algorithm.

I was thoroughly impressed at the thoroughness of their approach and the robustness of their algorithm. I was also very appreciative that some of their team took the time to evaluate my own algorithm, Delegated Proof of Stake (DPOS).

Delegated Proof of Stake (DPOS)

DPOS is an algorithm whereby stakeholders elect an odd number of block producers. Each round the block producers are shuffled and assigned a time slot in which they should produce a block. With this process in place, the longest chain is considered the best chain. It is impossible for a minority of block producers to create a longer chain because they would be producing fewer blocks per minute than the majority blockchain.

Like Bitcoin, under this model alone you never have 100% certainty because the majority could suddenly switch to a minority fork and stop producing on their own. Eventually the minority fork would become longer and global consensus would necessarily change.

We then introduced a new concept known as the Last Irreversible Block (LIB). This is a block which has been confirmed by ⅔ or more of the elected block producers. No node will automatically switch to a fork that isn’t built on top of the LIB.

Members of Cosmos community recently discovered a particular sequence of events that could cause the LIB algorithm to break down by causing the network to split on two different LIB. This vulnerability in the LIB algorithm does not constitute a breakdown of DPOS because the LIB algorithm was an optional addition designed to identify the number of confirmations required before an exchange has sufficient guarantees that a block will not get orphaned. The LIB algorithm is similar to the 6 confirmation algorithm used by Bitcoin. It works 99.9999% of the time. In over 2 years of operation and 10’s of millions of blocks it has not yet experienced the kind of deadlock condition that is theoretically possible.

The Real Problem - Who are the Generals?

A problem with Cosmos is an in implicit assumption that the set of generals is fixed. At least for a given block. If at any time over ⅓ of the generals goes rogue, then the network halts and there is no process for resuming the network.

Under DPOS, if a large minority of generals goes bad then the network still operates in a “pending” state. Blocks are produced, votes are cast, and new generals can be elected. Eventually the stakeholders can reach a point where 100% of the generals are honest and the network resumes normal operation. In theory, if 95% of the generals fail the remaining 5% can still hold an election to replace them.

From this we can conclude that Cosmos has a different deadlock condition. DPOS is robust due to its ability to continue operating at a reduced capacity / confidence level even when the majority of the generals are down.

Trade Offs

Every algorithm makes certain tradeoffs. Cosmos has 100% guarantee of no forks and can produce blocks every 5 to 10 seconds on a distributed network. To achieve this with the same number of generals as Steem (21), they consume network bandwidth similar to a sustained 5 transactions per second.

On the other hand, Steem rarely has any missed blocks and even more rarely has an orphaned block. Statistically speaking, a block produced while there is 100% participation of generals (producers) has a 99.9999% chance of being the final block. After two blocks (6 seconds) the probability being orphaned falls further. Anyone relying on the last irreversible block algorithm (40 seconds) is protected against almost every conceivable scenario with a worst-case (purely theoretical) being stuck in a pending state on a minority fork until manual intervention. This means that it fails “safe”, in a state where you do not accept a bogus transactions.

There is one additional level of confirmation that could be implemented in DPOS: require 100% participation and 21 blocks. With 63 seconds of confirmation and 100% participation then you can be certain that you will never get “stuck” and that every transaction you receive is truly irreversible and globally accepted.

What you should notice with these numbers is that extra time buys insurance for the last 0.00001% of byzantine cases. This is a level of confidence that takes far more than 6 confirmations in Bitcoin to reach, especially with an attacker that has 33% of the hash power.

Conclusion

Due to the law of diminishing returns (economic marginal utility), the value of increased confidence from 99.9999% to 100% is unmeasurable for the vast majority of real world transactions. People lose more from transaction fees than they would if 1 out of every 1 million transactions were to be randomly reversed. Meanwhile, the cost of gaining that last 0.0001% of confidence is significant network overhead, slower per-block times, and complete failure if more than ⅓ of the nodes go down.

Both algorithms have their place, but I am very happy with the design tradeoffs made for DPOS compared to the other algorithms available.

Sort:  

We need a definition of participation rate.
If "100% participation and 21 blocks" means 21 blocks produced in a row with no block missed, when the time is in the middle of a round, due to witness shuffling, the 21 blocks can be produced by 11 witnesses only (one witness produced one block and the other 10 produced two blocks).

Good point, must be two rounds

so 126 seconds?

People lose more from transaction fees than they would if 1 out of every 1 million transactions were to be randomly reversed.

+5%

Nice read @dantheman - thanks

A very informative and powerful sharing for us on these platforms these days. Thank you so very much for the information/education I was not aware of as of yet.

All for one and one for all! Namaste :)

Will you do more videos explaining these issues in the future? I've noticed I often find it a lot easier to digest your ideas when expressed in the form of video interviews and the likes.

it is always nice to read your post @dantheman even if i don't understand everything, but i keep learning the blockchain , thank you for sharing !

@dantheman I'll take this opportunity to air a concern I'm having, apart from the need to separate the flag from downvoting content which I have detailed in comments many times before:

It seems a lot of people have been complaining in steemit.chat lately, saying that they can't sign up because their phone number is rejected. Here's one example from just recently in #general.

willibilli 2:21 PM
Hello, I am having a problem with signing up to Steemit. Everytime I enter my phone number I get this message: "Unable to verify your phone number. Please try a different phone number." Does anyone know how to remedy this? Thank you"

Thanks. I can tell you that we are working to separate moderation (flagging) from curating (voting). This issue will be improved in time.

With respect to phone numbers, we had to block certain classes of phone number due to abuse. If you can share the first 4 digits of the number I will have our team investigate if it is on the block list.

You may want a "Contact The Help Desk" button, next to the "Continue" button on the phone signup page, to retain legit new user signups. I've noticed a few real complaints about this.

I am now curious about the phone number issue. A few weeks ago I tried to open an account for my daughter for wallet use only, with a recently added a phone to our AT&T account. It wouldn't allow it in the sign up process. It starts with 503-875. Anyway, I was able to use a different number, I figured it was just some glitch.

Asking users to disclose personal mobile phone numbers while also at the same time denying access to those who don't have mobile phone subscriptions is in itself both abusive to end-user security and elitist. Not to mention the inequality that manifests when some states force mobile phone registration on their users and others do not.

This extreme hack (comparable to using a machine gun to kill a chicken) is motivated by a desire to easily control abusers, which implies that the whole system is inherently flawed-- flawed by the fact that useful content is cannot be separated from drivel by the rating system as it was designed.

It's also somewhat shocking that steemit does not recognize the important role anonymity plays with speech freedom that's critical to having a nanny-free community. This new direction makes steemit.com unsuitable for whistle-blowers and civil rights activists. Perhaps it was never intended as such, but it's a pity to see a good tool get downgraded to Facebook-quality blogging.

Selecting users who are not street-wise (and thus willing to connect personal info to their account) has the side-effect reducing the quality of posts to that of the intelligence of that crowd.

You can buy and account with anonymous tokens or you can mine an account.

If you want a free handout of $5 worth of Steem Power then expect some protections. Also, for security purposes (hacked account recovery) a means of communicating is necessary. For 99% of people this system works.

It's good to have the advanced user options you mention, but they sound cumbersome and impractical. Hundreds of thousands of participants are being forced into a position of having th buy their freedom in order to make the spam-fight a little more convenient for a few admins, no?

Some questions:

  • How does one get anonymous tokens? For existing users who need profile division, obviously anonymous tokens cannot draw from their existing account (it would defeat the purpose). It would be very cumbersome and also costly to deposit cash into a bitcoin ATM to seed this account, many of which require id anyway (a non-starter).
  • Can the proof-of-work from the mining be transmitted entirely over Tor?
  • The 5 dollar barrier-to-entry you mention is a bit excessive for contributors who are not participating for the money-centric aspects. That's a lot of money in some countries. 25 cents would be sufficient to make spam uninteresting. Why not lower the barrier-to-entry so users don't need to rent a server farm to get a libre account? (libre meaning freedom from privacy compromise)

The current model effectively encourages users to trade their privacy for time or money. That's also a problem because it victimizes the naïve -- those unaware of the compromise they're making, and those who won't discover they need privacy until it's too late and the information disclosure is used against them.

Seems like it might be worth just charging the $5 for the account creation. If blocking the spammers makes it that much harder for new users to get in it's hardly worth it.

$5 is extortionate. That's a weeks wage in some parts of the world.

A moderate user (say 1 or 2 blogs/week) needs ~5-10 accounts just to get a basic level of anonymous identity division (thus upwards of $25). Not to mention the time and effort of getting the right form of anonymous money. This overkill is being imposed on hundreds of thousands of users for the convenience of a few admins.

It's not worth it. The high price in time and money has made the system inadequate for anonymous blogging. Users will pay the price in freedom instead.

Good, you understood what I meant about the downvote..I wasn't all that clear. But the fact that you're working on this is great to hear!

I'll try to get back to you with the numbers. Both users are offline right now.

Just too bad for you if the 1 in 1 million transactions is YOURS. ;)

Don't forget that 1 million to 1 chances crop up 9 out of 10 times.

Fortunately, the real odds are not 1 in a million but much lower and can be reduced to 0% chance of failure with a 63 second wait.

Thank you dan for explaining and critically analysing your design.

Thought I was in for a history lesson, still learnt something though.

Thanks for the clarifications.

How a DPOS algorithm will work within a network of sidechains? I recently tried a use case for sidechains with specific data types (one for post/comment types, other for task types, other for geolocation types, etc) coordinated by a main chain with account and financial management. I don't know if it's on the same page with the "fabric" technology described in the roadmap, but I'm curious. If you have time, the post is here.

Thank you.

Side chains are part of fabric. More to come on that.

I'm really excited to read more about the fabric part! Can't wait!

Thanks. Looking forward to more details on this.

6 nines is really impressive uptime, especially given the block times. Got to love distributed consensus. For comparison, the SLA for Duo's SaaS offering guarantees 3 nines, but typically we keep it above 4 nines.

It seems like DPOS is very effective indeed.

There are several key aspects that I am sure to not understand fully but this type of posting solidifies my faith in Steemit, as well as helps me learn a thing or two about the creators.

You have several good point, arguments, and further understanding. You also brought to like huge details and importance of reading the red flags.

The below quote is a major reason why I don't trade in #Bitcoins.

I mean, I bought some when they were cheap, but that is about it. I saw it as a possible failure but also a decent risk at a long term investment. I see Steemit.com and STEEM in the same way.

Bitcoin and similar blockchains to have BFT you must assume that 99.999% certainty is the same as 100%. Someone with 10% of the hash power can pull off a double spend against someone waiting for 6 confirmations once per month or 0.02% of the time.

I would have to say you do have dedication,
~ @Timbo

Basically, what you're saying is, "The blockchain has stopped, but gossip still works, so we can gossip the conditions for starting a new chain" ? How does this have anything to do with the BCA?

The Generals are the people who make the Ideas happen.....

Great post @ dantheman.thank you.best regards

I'm still learning the economics of DPOS but overall, I am happy with the tradeoffs made as well. One of the best part is we have so much space to improve on top of what you have implemented.

Thank you for taking the time to explain in detailed posts.

It is entirely opt-in, service providers don't have to process your transactions or display your content. The provider who displays everyone else's posts will have the most readers and ultimately the most people using their API and therefore consenting to fees.