Plans For Mirroring API Node Snapshots

in HiveDevs · 17 days ago

I currently run a seed node constellation, and with the additional goal of helping Hive-related infra, I'll be mirroring the API node snapshots so that others can easily get up and running with them. Right now only one user hosts them: @blocktrades and team. I did see interest from one other user in grabbing them and setting up a torrent, and if that does get established, I'll link my copies to the torrent as well.

The basic way this will be run is that the content of the snapshot (a little over 2.5TB as of now) will be mirrored to my own servers and then served from there (and if @disregardfiat gets the torrent goal done, these will be added to the torrent). Right now the plan is to mirror to 2 locations, Chicago and San Francisco. I initially thought of using my own infrastructure in SF, but realized that I would need to upgrade power capacity to add another server, didn't really have a rented server that supports hot swap (bad planning on my part), and don't want to power anything down. I did manage to find a provider I can work with: I'll send them a drive and they'll rent me a VPS with the drive passed through to it for fairly cheap (I'll be doing some network shenanigans to route everything through my network and access their network via a shared IX to keep the network costs low for them). The timeline for their side is to get this up by late May, so my side should be ready by early June.

For Chicago, the plan is to buy a new server to colocate in a DC I'm working with to rent space. Having learned from my mistakes, I'll be getting a server that supports hot-swap drives this time. To keep costs low, I'll be getting some fairly old hardware, as there isn't a big need for a very powerful server here, just something that'll support a decent amount of disk space. Right now I've spec'd out a Supermicro server with an Intel Xeon E3-1230v6 CPU (this thing is old, but will do the job just fine), 64 gigs of DDR4 ECC RAM, a 10 TB HDD, a 500 GB SSD for boot, and a 10 gbit network card. Without the network card, the price comes to $1669 + tax + shipping, so I'm expecting this to end up costing about $1750 + tax + shipping in the end. There's more work to be done to get Chicago up and running, so this will probably be up around mid-July per my initial timelines.

For fetching the data, I'll use lftp (https://github.com/lavv17/lftp) to mirror it to one location, then copy from there to the other, though I might work with the source of the data (@blocktrades) to see if we can make it more efficient. Alongside the API node snapshot, I plan to put a copy of the block_log and related files there, as the only public copy we have right now is hosted in the EU and it's terribly slow to copy to the US. I'll also get my hive-engine snapshots up once again. Once the US-West mirror is done, I'll make a post.
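Roughly, the initial mirror step would look something like the sketch below. The source URL and local paths are placeholders, and the exact flags may need tuning for the real host.

```bash
# Mirror the snapshot directory over HTTP, resuming partial files and
# skipping anything that hasn't changed since the last run.
# (source URL and destination path are placeholders)
lftp -e "
  mirror --continue --only-newer --parallel=4 --verbose \
    /api-node-snapshots/ /data/snapshots/ ;
  quit
" https://snapshots.example.com/
```

The --only-newer flag is what makes re-runs cheap: files already present locally and unchanged are skipped, so only the incremental pieces move over the wire.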

All of this is self-funded, so if you would like to support me, please throw a witness vote for @rishi556.


We set up torrents for the API node snapshots a while back (implemented via a web seed). In the future, we'll add a new torrent roughly once a week with the latest incremental update (and at some point just put up a fully new one I suppose).
Torrent links can be found here: https://snapshots.hive.blog/

This page is probably going to be updated shortly: it was just two torrents, one with the "first snapshot" and the second with a set of the incremental snapshots, but we're going to move to one torrent per snapshot.

Currently we only have two seeds for the torrents, so we could use some more volunteers. If you're going to be a seed, just seed the full one for now and wait till we move to one torrent per snapshot to seed the rest.

Note that anyone can help us seed the torrents; you don't have to be an API node operator to help out, you just need 2.5TB of storage space (e.g. a hard drive) and a connection with a reasonable upload speed.
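If you want to volunteer as a seed, one simple option is to let aria2c handle it; it keeps seeding after the download completes when told to ignore the share ratio. The .torrent filename below is a placeholder, grab the real one from the link above.

```bash
# Download the snapshot torrent and keep seeding indefinitely
# (--seed-ratio=0.0 means "don't stop seeding based on ratio").
# The .torrent filename and directory are placeholders.
aria2c --seed-ratio=0.0 --dir=/data/snapshots hive-api-snapshot.torrent
```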

Didn’t know you already had the torrent up. I’ll let you know when my copy is up!

Actually, a bunch of hived nodes would be the best way to distribute block_log data, but we don't have a "just share the block data" mode there.
You could use aria2c to fetch from multiple sources, and I would advise using only the initial snapshot plus the weekly one (or the biggest incremental chunk you can get), as that's the most efficient approach; otherwise, moving around such big amounts of data with daily snapshots doesn't make sense, because it would be quicker to just replay.
(Unless the server is exceptionally slow OR network is exceptionally fast, YMMV)
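For example, aria2c can pull the same file from several mirrors at once, along these lines (both mirror URLs are made up):

```bash
# Fetch one file from two mirrors simultaneously, splitting it into
# segments and opening multiple connections per server.
# (both URLs are placeholders; they must point to the same file)
aria2c --split=8 --max-connection-per-server=4 \
  https://mirror-eu.example.com/block_log \
  https://mirror-us.example.com/block_log
```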

aria2c looks interesting. lftp is great about not redownloading files that haven't been modified, so once that fat 1.8TB download is done, there's no need to send it through the pipes anymore, just the incremental stuff.

You’re doing a great job and I’d vote for you as a witness.
Keep it up!