Steem recovered from the hf20 fork thanks to abit and his emergency patches

in #witness, 6 years ago (edited)

After the hf20 fork was activated, blocks produced by hf20 nodes were no longer accepted by hf19 nodes. This led to a fork, and as the exchanges are on hf19, it was decided to move to the hf19 chain.

Thanks to @abit and his four patches:

@gtg was able to start the steem blockchain again together with @abit. Then @roelandp could also jump in, and steem block production slowly started again.

I voted for @abit as witness now, to express my gratitude that he saved steem :).


After downgrading to 0.19.12 and applying the patches, my witness server is running again. Now, I'm waiting for official patches.


My config.ini (remaining parts as usual):

p2p-seed-node = seed-east.steemit.com:2001 seed-central.steemit.com:2001 seed-west.steemit.com:2001 steem-seed1.abit-more.com:2001 52.74.152.79:2001 seed.steemd.com:34191 anyx.co:2001 seed.xeldal.com:12150 seed.steemnodes.com:2001 seed.liondani.com:2016 gtg.steem.house:2001 seed.jesta.us:2001 steemd.pharesim.me:2001 5.9.18.213:2001 lafonasteem.com:2001 seed.rossco99.com:2001 steem-seed.altcap.io:40696 seed.roelandp.nl:2001 steem.global:2001 seed.esteem.ws:2001 seed.timcliff.com:2001 104.199.118.92:2001 seed.steemviz.com:2001 steem-seed.lukestokes.info:2001 seed.steemian.info:2001 seed.followbtcnews.com:2001 node.mahdiyari.info:2001 seed.curiesteem.com:2001 seed.riversteem.com:2001 148.251.237.104:2001 seed1.blockbrothers.io:2001 steemseed-fin.privex.io:2001 seed.jamzed.pl:2001 seed1.cryptobot.news:2001 seed.thecryptodrive.com:2001 seed.brandonfrye.us:2001 seed.firepower.ltd:2001

checkpoint = [26037575, "018d4d47225e6cada82b9aaabc8503ee318c547c"]
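
A quick sanity check for a checkpoint line like the one above: on Graphene-based chains such as Steem, the first 4 bytes (8 hex digits) of a block ID encode the block number, so the two values in a checkpoint can be cross-checked before a long replay. The following is only a minimal sketch; the function names are made up for illustration and are not part of steemd.

```python
import re

# Matches a config.ini line of the form: checkpoint = [<block_num>, "<40 hex digit block id>"]
CHECKPOINT_RE = re.compile(
    r'checkpoint\s*=\s*\[\s*(\d+)\s*,\s*"([0-9a-fA-F]{40})"\s*\]'
)


def parse_checkpoint(line):
    """Return (block_num, block_id) from a checkpoint line (hypothetical helper)."""
    match = CHECKPOINT_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a checkpoint line: {line!r}")
    return int(match.group(1)), match.group(2).lower()


def block_num_from_id(block_id):
    """Decode the block number stored in the first 8 hex digits of a block ID."""
    return int(block_id[:8], 16)


def checkpoint_is_consistent(line):
    """True if the block number matches the number embedded in the block ID."""
    block_num, block_id = parse_checkpoint(line)
    return block_num_from_id(block_id) == block_num


line = 'checkpoint = [26037575, "018d4d47225e6cada82b9aaabc8503ee318c547c"]'
print(checkpoint_is_consistent(line))  # True: 0x018d4d47 == 26037575
```

This only catches copy-and-paste mistakes in the block number or the ID prefix; it says nothing about whether the ID belongs to the chain you want to follow.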


This isn't the first time @abit stood up for the good of steem. When the n2 was being abused by the whales, he stood up and flagged them for it.
He showed that minnow votes count when whales stop sucking up the entire pool.
For that, he lost his top 20 spot.
Clearly that was in error.
Lucky for us, he doesn't hold a grudge.

@gtg is 1 - ok.
@roelandp is 3 - ok.
@abit is 28 - not ok.

Voted for abit as a witness and spread the word!

EDIT:
abit is 20 :)

@abit is going to get a ton of support after yesterday. First thing I did when I could get on to Steemit was to vote for @abit.

I guess it has something to do with his activity here. If @holger80 hadn't told us that it's all @abit's work, I wouldn't really know, would you? He is much more active on the BitShares blockchain, which is why he is a top block producer here. :)

Sincerely,
@Mysteor

@abit ... you earned yourself a witness vote. Thank @tobixen

Seriously?
Why were 0.20.0 software versions running?
Why didn't anybody review the code?
Why is everything hidden in secret chats?
I have no idea what version to check out.

This is utter bullshit.
Zero communication to us low-ranking witnesses.
No proper documentation or instructions given.

This case was another example that steem is not decentralised.

@isnochsys I understand you are frustrated. Imagine how I feel. HF20 has been in the works for so long. It was really a pity this bug was missed, most probably a simple testing error: seconds vs hours was used in the payout window (my guess is to test this in a speedier way). Then the bug occurred, causing the 19s to halt, and the obvious move was reverting to 19 as exchanges use it. Then a second bug on the 19s actually prevented rolling back to that, also because (some) HF20 nodes were still producing with the bug enabled, making the network dirty.

About your q's/remarks:

  • Of course people reviewed the code. They even ran testnets and simulated forks. This was really an unfortunate miss. We are all humans. See below.

  • 0.20 was running because a witness runs it to signal / vote for the hardfork to happen at the scheduled HF time. The software is built so that new features should only be activated after the "scheduled HF time", which in this case was Tuesday next week. Obviously, as a witness it is your task to keep older software running as a backup for a while until you are certain the new software is running ok. Many did this, but when switching back to 0.19 the bug in that software occurred. Network stuck.

  • Not "everything" is hidden in secret chats. It is the same as when disaster strikes: it's best not to start shouting and create useless panic. What happens when the chain halts: one (and only one) node with a patch should restart creating blocks, regardless of whether other nodes are running, and start broadcasting blocks with the patch enabled. Normally the software waits for "a recent block" before producing, so there must be only one producing node set to ignore this (which can be done in config.ini). Imagine 100s of witnesses producing these 'stale blocks'; that would not help a restart. As the chain state, when crashed, still keeps the current vote slate, it is essential for a "quick" relaunch that the witnesses who held a top20 position prior to the crash are on duty and ready to apply patches and start signing again as soon as possible. It is less important that nodes that only sign a block once in a while also know this immediately, until the code is verified and working correctly. You say you need 36 hours for a replay; imagine finding out during the replay that the patch you applied did not fix it, which means another replay. Hence stuff is kept in a smaller group when such an emergency arises.

  • Steem is decentralised. But all nodes crashed. The token distribution might be skewed, but that is something different than having a decentralised network. The chain is 100% decentralised. This fix was even done not by Steemit inc (the main developer team of the Steem chain) but by @abit, as Holger also mentioned. Obviously @abit deserves your witness vote; he has been working in Graphene for who knows how long and comes from the BTS era, where he is one of the core team (DAO) developers. Personally I've been voting for @abit since I learned about him when I started on this platform.
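
To make the restart rule from the list above concrete, here is a toy sketch of the "wait for a recent block unless told otherwise" check. It is not steemd code: the `enable_stale_production` field is named after what I believe is the `enable-stale-production` switch in config.ini (treat that as an assumption), and the 3-second "recent" window is a simplification based on Steem's block interval.

```python
from dataclasses import dataclass


@dataclass
class WitnessNode:
    name: str
    # Hypothetical mirror of the config.ini switch that lets a node ignore
    # the "wait for a recent block" rule.
    enable_stale_production: bool = False


def may_produce(node, head_block_age_s, recent_s=3.0):
    """A node produces if the head block is recent or stale production is on."""
    return node.enable_stale_production or head_block_age_s <= recent_s


nodes = [
    WitnessNode("restarter", enable_stale_production=True),
    WitnessNode("witness-b"),
    WitnessNode("witness-c"),
]

# Chain halted six hours ago: only the designated node may produce.
halted_for_s = 6 * 3600
print([n.name for n in nodes if may_produce(n, halted_for_s)])

# Once a fresh block exists, the head block is recent and everyone may sign again.
print([n.name for n in nodes if may_produce(n, 0.0)])
```

With exactly one node opted in, there is a single source of new blocks after the halt, which is the point made above about avoiding hundreds of competing stale blocks.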

Lastly: the patch was needed to get beyond the "rogue" HF20 chain, which was still producing blocks and moving the last irreversible block forward. As we agreed that that chain should be halted, it needed to be ignored by a patch. As the chain was restored and HF19 overtook HF20, all is good now and you can / should even be able to restart your HF19 node without any patches. Just give it time when you reach the "drama block sets". It won't harm to put a checkpoint in your config.ini

checkpoint = [26037575, "018d4d47225e6cada82b9aaabc8503ee318c547c"]

I find these moments kinda magical in a way. You have the chain. It stopped. It paused kinda. It stopped because something was going wrong. It saves itself by 'crashing'. A fix is applied (yes it took a while because of HF 20 nodes still running) and then voila - 'move along people, nothing happened'.

It was fixed without the help of steemit inc, so it's kind of decentralised.

The main problem is price manipulation, because of this, it is not a bad idea to hide this chat.

Check out 0.19.12 and apply the patches from abit, then replay. I pasted my changes to the config.ini.

replay will take me 36 hours or so. after that if it doesn't work, maybe there is a better routine already in place...
a real hotfix, maybe even a patch

There is a branch that applies the 2048 fix: https://github.com/steemit/steem/commits/20180917-increase-fork-buffer

but no official release yet.

Replaying 3 nodes took me 3-4 hours each.

What is your cat /proc/cpuinfo?
I would guess you have an i7 processor?
Those seem to run the replay way faster.

model : 79
model name : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
stepping : 1
microcode : 0x1
cpu MHz : 2199.998
cache size : 16384 KB

so it's kind of decentralised

There is no decentralisation if Steem is completely down because a few witnesses wanted to run new software.

The witnesses are elected by the steem users and have the power to vote for a new hardfork. So when HF20 was announced, every witness had to decide whether to install it or not. When a defined percentage of witnesses had moved to HF 20, it was activated. Normally, this should not be a problem and the new features are then activated on the scheduled date.
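
The activation rule described above can be sketched roughly as follows. This is a simplified illustration, not the actual chain logic: the threshold of 17 signalling witnesses is my assumption about Steem's supermajority rule, the dates are made up, and `hardfork_active` is a hypothetical helper.

```python
from datetime import datetime, timezone


def hardfork_active(signalled_versions, target, scheduled_time, now, required=17):
    """True once enough witnesses signal `target` AND the scheduled time has passed.

    `required` is an assumed supermajority threshold, not a confirmed constant.
    """
    votes = sum(1 for version in signalled_versions if version == target)
    return votes >= required and now >= scheduled_time


# Made-up example: 18 witnesses signal 0.20.0, 3 stay on 0.19.12.
versions = [(0, 20, 0)] * 18 + [(0, 19, 12)] * 3
scheduled = datetime(2030, 1, 8, 15, 0, tzinfo=timezone.utc)  # hypothetical date

before = datetime(2030, 1, 1, 12, 0, tzinfo=timezone.utc)
after = datetime(2030, 1, 8, 15, 0, 3, tzinfo=timezone.utc)

print(hardfork_active(versions, (0, 20, 0), scheduled, before))  # False
print(hardfork_active(versions, (0, 20, 0), scheduled, after))   # True
```

Under this rule, running the new version early is only a signal; the new behaviour is meant to stay switched off until both conditions hold.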

This time, there were two undiscovered bugs, one in HF 19 and one in HF 20.

After the crash, the elected witnesses have the duty and the power to recover and to start the steem chain again.

I do not see why this is not decentralized.

why this is not decentralized

Because the whole community has to decide who those 20 central witnesses are. Those twenty are a centralised power and decide what to run for the whole network.
We should rearrange votes on them if they fail like yesterday.
Vote for @abit!

That's b.s. Steem was frozen since it's how the blockchain works if there is a critical error.

Freezing the block production is the safest thing to do when nodes disagree on block validity, but it should be possible to freeze the block production without halting all the nodes and hence stopping all the user interfaces from working. Ideally it should still be possible to read old posts even if the blockchain is frozen.

It's also a setback that it took such a long time to resolve the issue.

I really don't understand

Thanks for the info. upvoted!

Yea, I learned several bits of information just by reading the comments.

Posted using Partiko Android

I am just glad there are some smart people around to fix this sort of thing. It's obviously not as easy as just rebooting something.

Newb question, are the source code diffs for upcoming HF’s publicly available in advance? On GitHub?

Posted using Partiko iOS

howdy sir holger80...thanks so much for explaining this to us, this I hadn't heard. I hope his ranking is restored.

Hi @holger80!

Your post was upvoted by @steem-ua, new Steem dApp, using UserAuthority for algorithmic post curation!
Your UA account score is currently 5.666 which ranks you at #438 across all Steem accounts.
Your rank has dropped 1 places in the last three days (old rank 437).

In our last Algorithmic Curation Round, consisting of 441 contributions, your post is ranked at #310.

Evaluation of your UA score:
  • You've built up a nice network.
  • The readers appreciate your great work!
  • Try to work on user engagement: the more people that interact with you via the comments, the higher your UA score!

Feel free to join our @steem-ua Discord server

So this is a good score?


I'm going to vote for you and these 2 guys as witness.

I had an idea that I shared in the UA chat, but you seem not to be able to check everything in there, which is totally understandable (there are only so many hours in the day).

I think you guys should work on some algorithm for UA that gives better rep for interaction with a variety of different kind of users, minnows and whales, high rep, low rep, new and old (best if they maintain their relationships as well). This will incentivize interaction with ALL kinds of users rather than encouraging everyone to go straight to the top of the rep list and compete for whale attention. It would be important to distinguish between maintaining relationships with old friends (positive, should be rewarded with rep) and circle jerking or interacting with old friends exclusively (not good, should not be rewarded with rep)

There's another idea which may be harder to implement but would also be awesome: Rewarding better rep to people who curate community. You'd have to figure out a way to identify who introduces steemit users to other steemit users. Like if I have a list of friends and I interact with a new user and thanks to me they discover a bunch of other users, maybe you can see that 60% of their interactions AFTER multiple interactions with me are with people who I interact with. A good example of what I mean would be how I like to look at @tarazk's comment section to find steemit users that I like and who I think I may have common ideals with. If I end up with a few new friends who I interact with a few times, I think he should be rewarded with better rep.

This is incredibly valuable information to learn, and I had not found this info anywhere. I already voted for @abit and now I have added @holger80 to my witness votes......I like this kind of information, thanks.

Good to know this! @holger80
I vote for @abit now for his great work to run the chain!