Hive Pressure 2: How to Answer Hive Questions?

in Blockchain Wizardry4 years ago

First, you need to learn how to ask good questions, and here are some of the resources that will help you to do so:
https://developers.hive.io/apidefinitions/
https://hive.hivesigner.com/
(Kudos to @inertia and @good-karma)

This set of API calls is far from perfect, but for now it has to be enough for general purpose API nodes.

The big, the slow, and the ugly.

Back in the days, we used to run a so-called “full node”, that is a single steemd (yeah, we haven’t renamed the binary yet) node that was built with LOW_MEMORY_NODE=OFF and CLEAR_VOTES=OFF and configured with all the plugins you can get.
It required a lot of RAM, it replayed for ages, and it was a huge pain to keep it running.
Our code is great for running blockchain. It’s not equally efficient when it has to answer complex questions.

Current architecture

The idea is to move the workload requiring complex queries out of our blockchain nodes.

                  +----------+                                                                
                  |          <-----------------+             @     @@@@@@    ,@@@@@%          
                  | Hivemind |                 |            @@@@    (@@@@@*    @@@@@@         
        +-------+ |          <-------+         |          %@@@@@@     @@@@@@    %@@@@@,       
        |       | +-----^----+       |         |         @@@@@@@@@@    @@@@@@     @@@@@@      
        | redis |       |            |         |       ,@@@@@@@@@@@@     @@@@@@    @@@@@@     
        |       <--+    |       +----v-----+   |      @@@@@@@@@@@@@@@&    @@@@@@     @@@@@@   
        +-------+  |  +-v-+     |          |   |     @@@@@@@@@@@@@@@@@@    .@@@@@%    @@@@@@  
                   |  |   <-----> AH node  |   |   @@@@@@@@@@@@@@@@@@@@@(              .@@@@@%
        +-------+  +--> j |     |          |   |    @@@@@@@@@@@@@@@@@@@@               @@@@@@ 
<------->       |     | u |     +----------+   |     *@@@@@@@@@@@@@@@@     @@@@@@    @@@@@@.  
<-------> nginx <-----> s |                    |       @@@@@@@@@@@@@@    &@@@@@.    @@@@@@    
<------->       |     | s |     +----------+   |        #@@@@@@@@@@     @@@@@@    #@@@@@/     
        +-------+     | i |     |          |   |          @@@@@@@@    /@@@@@/    @@@@@@       
                      |   <-----> FAT node <---+           @@@@@(    @@@@@@    .@@@@@&        
                      +---+     |          |                 @@     @@@@@&    @@@@@@          
                                +----------+                                                  

Sorry, lack of GIMP skills

Hivemind

For this purpose I use Hivemind (hats off to @roadscape) backed by PostgreSQL.

Hive is a "consensus interpretation" layer for the Hive blockchain, maintaining the state of social features such as post feeds, follows, and communities. Written in Python, it synchronizes an SQL database with chain state, providing developers with a more flexible/extensible alternative to the raw hived API.

FAT node

Also, instead of a single hived node with all the plugins, I chose to run two nodes, one of them is a “fat node” (LOW_MEMORY_NODE=OFF and CLEAR_VOTES=OFF) on a MIRA-enabled instance to feed the Hivemind.

Please note that I have NOT included market_history in my configuration, simply because it doesn’t require a “fat node”, but Hivemind requires it, so make sure that you have it somewhere.

AH node

Account history node is the other node I use in my setup. It serves not only account history, but it’s definitely the heaviest plugin here, hence the name.
I’m not using MIRA here, because I prefer the pre-MIRA implementation of the account history plugin and MIRA had some issues with it. Also, it’s way too slow for replay.

Jussi

Instead of one service, I now have three specialized ones, I need to route incoming calls to them.
So the get_account_history goes to the AH node, while the get_followers goes to Hivemind.
That’s what jussi does, but it also caches things.

Redis

Jussi uses Redis as in-memory data cache. This can very effectively take load off the nodes. Even though most of the entries quickly expire, it’s enough to effectively answer common questions such as “what’s in the head block?”

headblock.jpg

8 dApps asking for the latest block will result in 1 call to the node and 7 cache hits from Redis.

Nginx

That’s the world facing component - here you can have your SSL termination, rate limiting, load balancing, and all other fancy stuff related to serving your clients.

Resources

Now when you know all the components, let's take a look at what is required to run them all and (in the darkness) bind them.

Small stuff

There are no specific needs here. The more traffic you expect, the more resources you will need, but they can run on any reasonable server on instance:

  • Nginx needs what nginx usually needs - a bunch of cores and some RAM.
  • Jussi is no different than nginx when it comes to resources.
  • Redis needs what redis usually needs - a few GB of RAM to hold the data.

Big stuff

  • the AH node, which is non-MIRA in my setup, requires plenty of RAM for the shared_memory.bin file to either hold it on tmpfs or buffer/cache it, especially during replay. A machine with 32GB RAM will work, but I would rather suggest using a 64GB RAM machine these days. Of course, low latency storage such as SSD or NVMe is a must. You need 600GB of it.

  • the FAT node in my setup is running MIRA, so it’s not that memory hungry, but the more RAM you have, the more effective it can be. A machine with 16GB RAM might work, but I would go with 32GB or 64GB for it. Don’t even try without a very fast SSD or NVMe. You need 400GB of it.

  • Hivemind itself is a simple script, but it needs PostgreSQL as a database backend, and for that you need all the things that PostgreSQL usually needs. It can run on pretty much everything, as long as you have enough space to fit the data, currently 300GB. Of course, faster storage and more RAM will result in much better performance.

From zero to hero

Reference hardware configuration:

Intel(R) Xeon(R) E-2274G CPU @ 4.00GHz
64GB RAM, ECC, 2666MT/s
2x NVMe 960GB (SAMSUNG PM983)

When you are starting from scratch, it’s best to get a recent block_log
I’m providing one at https://gtg.openhive.network/get/blockchain/
How fast you can get it depends on your network and load on my server. The average downloading speed is around 30MB/s, so you should be able to get it in less than 3 hours.

replay-speed.png

replay-times.png

Node typeReplay Time
AH Node15 hours 42 minutes
Fat Node48 hours 53 minutes
Hivemind85 hours 50 minutes

Roughly you need 4 days and 9 hours to have it synced to the latest head block.

4days9hours.jpg

A lot can be improved at the backend, and we are working on that. To be developer-friendly, we also need to improve the set of available API calls, finally get rid of the deprecated ones such as get_state, and move away from the all in one wrapper condenser_api. But that's a different story for a different occasion.

Hive_Queen

Sort:  

First, you need to learn how to ask good questions

Is this a good question?

Not great, not terrible ;-)

That was an acceptable answer.

Hello Wizard.
How about private node for basic needs?
I need wallet functionality, create accounts, manage accounts, do transfers, check transfer history.
Is AH Node enough or do I need all connected components: Jussi, Hivemind?

Most likely you don't even need complete AH node for that, you can track only those accounts that matters to you (for transfer history). You don't need jussi or hivemind. Node that tracks only few accounts needs 260GB for block_log (as every other hived node) and 60GB for state file.

Please note that you have to know list of accounts for account_history tracking before you begin replay. There's also a way to track all accounts but explicitly whitelist / blacklist given history operations.

For optimal tuning I'd have to know all your use cases.

I am curious about how it all works.

What are the complex questions which only go into the Hivemeind layer? Follows, post feeds, communities, and what else? Doesn't this partly compromise censorship resistance?

What is the difference between the consensus rules for blockchain nodes and those for Hivemind nodes?

You can think of Hivemind as a database that collects the data from consensus nodes and organize them for a faster access. See my Steem Pressure #8: Power Up Your dApp. RethinkDB? KISS my ASAP. Ideally there would be a consensus node, and all other data effectively processed, stored and served by bunch of specialized microservices.
Hivemind fetches the data from Hive nodes and interprets them for API purposes.
That itself doesn't affect censorship resistance. Of course API node can be malicious and/or censoring(*) the content as it happens on Steem with Steemit Inc node, but to get the content you are looking for all it takes is to find other non-malicious node or get your own node running. To check consistency with the consensus... the consensus node is enough.

at the moment I am a little confused with the steem which now steemit is a little different from before maybe this is not a good question but I am. Very worried about steem and steem hive.blog what steem and steem hive.bog are different
after I read some of your posts maybe you, a little know about this I do not know who to ask.

Trying to keep up with these updates and will be sharing this link when some members we have here on hive asked.

You sir are pure genius, I would like to say thank you for creating or co creating hive, I love it, I lost all faith in steemit months ago, gave my account with this name on it to my daughter, she gave up on it day 1.

So again, thank you for making a refreshing new site, with new visions.

But the wizard has state files to boot the chain from, does he not?

I use state providers to quickly restore in case of a bigger failure, but it wasn't used here.
Here block_log was used, i.e. not state, but content of blocks.
To get the state, you need to replay blocks.
You can get blocks from other machine or public source (like in this post), or alternatively you can sync blocks from other peers in Hive network (but that's currently way slower than downloading blocks and replaying them).

Hello
I want to get the content of the blocks and analyze them on the chart. Can I download the block _log and use it for that?

Yes, but basing on your previous question you'll probably fail with understanding it's structure. Running local node and using get_block might be much easier to achieve that. Please read the docs.

Yes thats rigth, but here I have a number of restrictions on how to run a node and get a block, so I said it would be a great help if I could download block_log and analyze block information through it.

You don't even need to run a node, just get_block from remote public API (please remember about distinction between Steem / Hive, those are two separate blockchains). And be prepared for a lot of data, uncompressed JSON will take hundreds of gigabytes.

I've used get _block before, it took a long time for each block to take a second, so if block _log gives me the same get _block information, it will help me a lot. And I'm trying to do more research on the difference between Hive and Steem. Thank you.

Excuse me, if I want to get the information via get _block, what features (what kind of server or RAM and cpu) do I need? And what can be done to speed up the response to requests?

I didn't understand 100% because of my lack of English proficiency. However, I read a good article well.

Really great post @gtg ! Keep it up!

Hi. I see you voted up Mr King's Crown. He seems to be trolling Hive whilst milking Steem and he retaliates against anyone trying to reduce his rewards here. He's not exactly an asset to the community. But of course you should vote as you see fit.

Hope you are well.

here's how to ask a good question: Why is the blockchain working like shit?

nice

Gracias por compartir

this is so helpful. we hope to see many helpful tips like these always

You said you have not included market_history in your configuration but it is required by Hivemind. But you are running Hivemind too, so how did you do?

Congratulations @gtg! You have completed the following achievement on the Hive blockchain and have been rewarded with new badge(s) :

Your post got the highest payout of the day

You can view your badges on your board and compare to others on the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Do not miss the last post from @hivebuzz:

Revolution! Revolution!
Vote for us as a witness to get one more badge and upvotes from us with more power!

Well quite informative

Congratulations @gtg! You received a personal badge!

Welcome Comrade!

Thank you for joining our fellow revolutionaries to promote decentralization and censorship-free speech.
Hive is your new home and you bravely chose to definitely turn your back on Steem.

Invite your friends to join our fight by sharing this post.

¡Viva la revolucion!

You can view your badges on your board and compare to others on the Ranking

Do not miss the last post from @hivebuzz:

Hive Revolution - Mission 1 - Communication
Hive Revolution - Call for missions
Vote for us as a witness to get one more badge and upvotes from us with more power!

Hi @gtg, why did you downvote my post. did I make mistake. if I make a mistake. I am sorry. please delete downvote immediately.

Hello. It's nothing personal, it was just a disagreement on rewards. Downvotes are integral part of this system and are essential in a process of finding a consensus for evaluation of posts. For example there are users that are trying to bet on curation rewards, voting regardless of content and others who are countering such votes, and such thing happened in this case.

may I ask you to fix it in my post, upvote it again,

may I ask your discord to communicate

The new Star Wars spin-off series on Disney+, The Mandalorian, has been making huge waves on social media thanks to one tiny star: Baby Yoda.
This message was written by guest bxy_3344235, and is available at crypto.investarena.com

 4 years ago  Reveal Comment

Gaby and I are good and settled in a nice place on the beach in Thailand.

She was supposed to start writing! No excuses.

When it comes to trading I have no idea, definitely not my field of expertise :-)
Nonetheless I'll take a look :-)

We had some bad experience with self acclaimed TA expert who used those charts as a low effort way to self vote himself. However, from what I saw on chats, there are tons of people into that so you might find a good audience. Not a whales though. They usually already know when to buy or sell :-)

That's called optimization. Without Hivemind and MIRA, scalability would be extremely difficult and pricey. And remember, STEEM is running on a similar infrastructure, so it's bloated too, by your definition.