Witness Update: Added a Backup Node & Other News

in #witness-update, 5 months ago (edited)

A Rough Patch

My witness node crashed about a week and a half into the month and needed to be reindexed. I had chosen the server provider, Privex, because a server reset includes a recent snapshot of the blockchain, which promised to get me back up and running more quickly.

After resetting the server and spinning up the witness again, I was synced up to around August of this year. I had hoped the snapshot would have been more recent, but I thought everything would be fine and it would catch up quickly. Almost a week later, it was still not caught up to head. I contacted @rishi556 who does support for Privex to see if he had any suggestions, and he was very helpful as we attempted to troubleshoot the delay. During this process, it was determined that perhaps SSDs can no longer keep up with the needs of running a Hive node, so that tier of service was removed from their offerings.

Meanwhile, I went ahead and purchased another Privex server located here on the west coast with an NVMe drive configuration. The hope was that this would sync up within hours. So I got the new server configured and running, and the snapshot started from July of 2023. I wasn't worried because it seemed like it was going to catch up quickly. Unfortunately that was not the case, and 24 hours later it had barely advanced a week. I ended up cancelling the new server.

I Built My Own

I decided to go ahead and put together my own machine to run my witness node on. I have been a system builder for a local 3D house for years and have computer parts coming out of my ears, so it was not a big deal. I also have several rackmount server cases and power supplies from my crypto mining days. The server has a Threadripper processor, 128GB of RAM and a couple of SABRENT 2TB Rocket 4 Plus NVMe 4.0 Gen4 drives. I originally put 64GB in there, but found a bigger kit in a box of parts, so I went with it. I was not going to have any bottlenecks this time!

I think it can handle it...

I set up the node quickly using @someguy123's Hive Node in a Box and went from nothing to fully synced in about 26 hours. I went ahead and switched block production to this node and, a few hours later, my original node finally caught up.

Current Situation

As it stands right now, I have renewed my original Privex node for another month with the SSDs and will be using it as a backup for the one I am running in my work server closet. I don't want to be caught in a situation again where I am down for weeks! If there is a power or internet outage, I can be back up and running in minutes rather than days.

I have been running stable and producing blocks for well over a week now and can quickly switch to the backup in case of emergencies.

Ranking Dropped

Unfortunately, I did not gain any support this month - but didn't lose any either. However, I did drop from 89th to 91st place. This was not due to losing votes, but rather a result of a couple of witness nodes gaining more votes than me. On the upside, had I not been down for so long, I would have finally broken even on my costs!

It was a hard lesson learned, but I definitely feel in a better position to recover quickly the next time something like this happens. It is always better to have a backup node! I also learned that it isn't necessarily faster to download a block log to replay from. Even with a gigabit+ internet connection it can take days to download as the file transfer rate may be throttled.
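To put rough numbers on that, here is a minimal Python sketch of the download-time arithmetic. It assumes a block log of roughly 450 GB (the figure mentioned in the comments) and some made-up throttled rates; `download_hours` is a hypothetical helper, not part of any Hive tooling.

```python
def download_hours(size_gb: float, rate_mbit_per_s: float) -> float:
    """Hours needed to transfer size_gb gigabytes (10^9 bytes each)
    at a sustained rate given in megabits per second."""
    total_bits = size_gb * 8e9
    seconds = total_bits / (rate_mbit_per_s * 1e6)
    return seconds / 3600


# Assumed block log size (approximate) and illustrative transfer rates.
BLOCK_LOG_GB = 450
for rate in (1000, 100, 20):
    hours = download_hours(BLOCK_LOG_GB, rate)
    print(f"{rate:>5} Mbit/s -> {hours:.1f} h")
```

At a full gigabit the transfer would take about an hour, but at an assumed 20 Mbit/s throttle it stretches to about 50 hours, which is how a fast home connection can still end up waiting days.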


Please consider voting for me as a Hive witness.
I would really appreciate your support!

wildebeest.jpg


Great to see you back up and running. More home witnesses is something I’d love to see (though not at too high a rank due to stability at home vs dcs). Best of luck; knowing your past, I know you got this.

though not at too high a rank due to stability

I agree 100%, and that is why I wouldn't really do it if I were a high enough rank. At that point it would pay for itself, anyway.

I think it is better to not rank up too quickly as it gives you time to learn and experience all of the pitfalls.

It's great to hear that you are back up again! Can you share the costs of the (home) machine that you created?

Even with a gigabit+ internet connection it can take days to download as the file transfer rate may be throttled.

I hate when that happens and it happens a lot!! You plan your time and you see that you need double of that to wait... or even more

Can you share the costs of the (home) machine that you created?

Let me get back to you on that. I didn't buy anything specifically for this project since I already had a lot of computer parts lying around. My friend owns a 3D sculpting studio and frequently upgrades his employees' computers. So when I do the upgrades I end up with all the spare parts.

My biggest recommendation is having at least 64GB of RAM and a fast 2TB NVMe drive. Apparently running a node can be CPU intensive, but I think that the Threadripper might be a bit overkill.

Oh, well... That's the best... When you don't have to buy, but rather collect parts that are lying around... 😃 I remember those times when I was much more into hardware stuff! :)

Thanks for the suggestions!

Being a witness brings some responsibilities. It looks like you are learning more as you go along. Thanks for your service.

I am doing my best! Thanks a lot.

Thanks for your updates on the witness. As I mentioned, I want to run my own in 2024, but right now I'm just trying to build my account, and I know there will be a troubleshooting learning curve when it comes to running a node. Question: can you keep both nodes synced and just switch credentials to assign one as the running service? A backup is the first thing I think about for a service; after the backup is set up, test the restore process to make sure it works as designed. Thanks for all these updates again ✌️

Right, so you can sync both nodes, and create a different key for each one. That way, when you want to switch to the backup node, you just have to run the update_witness command with the public key of the backup node.

Make sure you use a different key for each node that is running the witness plugin or you will run into problems.
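As a rough illustration of that switch-over logic, here is a minimal Python sketch. Everything in it is hypothetical: the `WitnessNode` record, the helper names, and the placeholder key strings are made up for the example, and it does not talk to a real node or wallet. It only picks which public key you would then pass to update_witness.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class WitnessNode:
    """Hypothetical record for one node running the witness plugin."""
    label: str
    public_key: str  # placeholder strings in this sketch, not real keys
    synced: bool


def assert_unique_keys(nodes: Sequence[WitnessNode]) -> None:
    """Raise ValueError if two nodes share the same signing key."""
    seen: Dict[str, str] = {}
    for node in nodes:
        if node.public_key in seen:
            raise ValueError(
                f"{node.label} and {seen[node.public_key]} share a signing key"
            )
        seen[node.public_key] = node.label


def key_to_broadcast(nodes: Sequence[WitnessNode], active_key: str) -> str:
    """Return the public key to use in the next update_witness call.

    Keeps the active key while its node is synced; otherwise falls back
    to the first synced node in list order.
    """
    assert_unique_keys(nodes)
    synced = [n for n in nodes if n.synced]
    if not synced:
        raise RuntimeError("no synced node available to take over")
    if any(n.public_key == active_key for n in synced):
        return active_key
    return synced[0].public_key


# Example: the home node has fallen behind, so the backup's key is chosen.
home = WitnessNode("home", "PUBKEY_HOME_PLACEHOLDER", synced=False)
backup = WitnessNode("privex-backup", "PUBKEY_BACKUP_PLACEHOLDER", synced=True)
print(key_to_broadcast([home, backup], active_key=home.public_key))
```

The uniqueness check comes first on purpose, so a copied key is caught before any switch is attempted.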

Cool will bookmark this post for reference ✌️

That's too bad, but awesome that you were able to put something together. I have some HP servers that I am getting ready to shutter and I wish that I could set one up as a node at some point. I'd be interested to know what kind of traffic you get. I think that is where my issue would be. I think they are both DL 380 or 360 G6 servers.

You'd see high traffic usage during the initial sync as it has to pull in the 450ish gigs of the blocks, but after that, it's just smooth sailing with minimal usage. The values that @nuthman shared seem right.

Okay, thanks!

I actually didn't know that myself! I just had a look at the router for that machine:

It does go up and down slightly, but not much. Once in a while it barely spikes when it updates the price feed as well.

It certainly doesn't affect my internet speed or reach my monthly bandwidth limits.

I have some HP servers that I am getting ready to shutter and I wish that I could set one up as a node at some point

I'm sure you could. If you are using SSDs, I might recommend running a couple in RAID 0 to get the extra speed. Rishi was worried that SSDs are starting to not keep up. But I am not completely convinced of that theory. But since they are relatively cheap it wouldn't hurt to stripe them anyway.

I know SSDs have limits on read/writes, but it still seems like they would be able to keep up better than a traditional drive. The speeds are just so much better. It's really only recently that they even offer them in servers for RAID arrays, so I guess we are all still learning just how effective they can be. Those numbers don't look too bad, but I think it would probably be a conflict of interest for me to spin one up at work!

I did drop from 89th to 91st place.

Hopefully you will get the support you deserve and get back to improving your rank.

I wish you the best of luck.

I hope so too! I would be perfectly content sitting somewhere between 40th and 50th place at some point.

Your story is spectacular, and what you do is brilliant.

Glad you got it up and running! Too bad you had to buy the extra unit only to find it couldn't do the job. It's a good thing you know how to build your own, and now you have backup just in case! I'll have to give you another shoutout again sometime this next week to see if I can't help out to get you profitable!

Yeah, I did have to test the theory that the problem was coming from the SSDs, and it seems like it wasn't. So at least I learned something! The main thing I learned was that I have to have a backup, which I now do. Now here's hoping that both don't go down simultaneously! Merry Christmas!

You are doing great work, and this is cool to hear.

Thanks! Your support is much appreciated, @biyimi.

Hi mate, I just came across your post. I started my own witness about a week ago and went with my own server straight away. I have full control and don’t rely on a hosting service, which I think is the best way. Anyway, send me a vote if you have a spare slot, and I’ll do the same.

Hey, congrats! Yeah, my home server has been running stable for a while now and I am happy so far. It is good to have a backup node running in another location that you can quickly switch to, though, in case something happens (as I learned the hard way), but that is totally your choice.

send me a vote if you have a spare slot

I recently used up my last vote, but I will keep an eye out. People abandon their witness nodes all the time and I have to move votes around.

It makes good sense to have a backup in case your main server has a big failure like a dead motherboard. I will get a backup at some point, but for general patching, I think disabling it for a few minutes won't be a big deal.

I have a spare slot, stand by for an incoming vote.

Thanks for the vote! I will hit you up when I get a free vote.

The backup is great for catastrophic failures, of course. In my case there was an issue where the node froze or crashed in a way such that the block log was corrupted. I needed to do a full replay, which took days on the Privex server, even with a July 2023 snapshot.

On my home machine it can sync from scratch in 27 hours, but that is still a long time to be down.

But to be fair, it isn't as huge a deal when we are lower than 50th place in rank!

Well, that was fast. Turns out I got a vote for you. There ya go!

PIZZA!
The Hive.Pizza team manually curated this post.
