23rd update of 2021 on BlockTrades work on Hive software

Below is a list of Hive-related programming issues worked on by the BlockTrades team during the last week:

Hived work (blockchain node software)

We moved the hived tavern tests and improved the get_account_history test to allow testing results against a golden reference server. This will enable us to verify a HAF-based account history server against a known-good hived-based account history server.
https://gitlab.syncad.com/hive/hive/-/merge_requests/261
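
To give a concrete feel for that kind of check, here is a minimal sketch (not the actual tavern test) that fetches the same account_history_api call from a server under test and from a known-good reference server and asserts that the results match; the URLs and account name are placeholders:

```python
# Minimal sketch (not the actual tavern test) of checking one API call
# against a golden reference server. URLs and the account name are placeholders.
import requests

TEST_NODE = "http://127.0.0.1:8091"       # HAF-based server under test (placeholder)
REFERENCE_NODE = "http://127.0.0.1:8090"  # known-good hived account history server (placeholder)

def get_account_history(url, account, start=-1, limit=100):
    payload = {
        "jsonrpc": "2.0",
        "method": "account_history_api.get_account_history",
        "params": {"account": account, "start": start, "limit": limit},
        "id": 1,
    }
    response = requests.post(url, json=payload)
    response.raise_for_status()
    return response.json()["result"]

tested = get_account_history(TEST_NODE, "blocktrades")
golden = get_account_history(REFERENCE_NODE, "blocktrades")
assert tested == golden, "results differ from the golden reference"
```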

TestTools work (python-based test system for hived)

https://gitlab.syncad.com/hive/hive/-/merge_requests/259

Visible changes for test developers

  • Added support for test parallelization (tests run in separate processes via pytest-xdist; see the sketch after this list)

    • The following jobs now run in parallel:
      • cli_wallet_extended_tests
      • test_tools_tests
    • Rewrote the logger system
    • Rewrote management of directories for data generated in tests
  • Improved interaction with snapshots

    • Added support for loading snapshots from a custom path
    • Added snapshot comparison
    • Generated snapshot files are now removed (after automated tests only)
  • Added remote node support (a remote node is used for the main_net implementation)
  • Extended documentation
    • Added tutorials about replays and snapshots
    • Documented cleanup policies
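
As a rough illustration of the parallelization item above: with pytest-xdist installed, ordinary test files need no changes; the run is simply distributed across worker processes from the command line (e.g. `pytest -n auto`). The tests below are purely illustrative:

```python
# content of test_example.py -- an ordinary pytest file; pytest-xdist changes
# only how tests are scheduled (across worker processes), not how they are written.
#
#   run serially:     pytest test_example.py
#   run in parallel:  pytest -n auto test_example.py
def test_block_interval():
    assert 20 * 3 == 60        # 3-second blocks -> 20 blocks per minute

def test_symbol_name():
    assert "HIVE".lower() == "hive"
```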

Internal changes

  • Checked TestTools sources and tests with pylint
    • Fixed or temporarily suppressed all reported problems
    • Added CI job running linter
  • Introduced scopes, which allow easier resource management; they are used for loggers and for directories holding data generated in tests (sketched below).
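
Purely as an illustration of the scope idea (this is not the actual TestTools API), a scope can be thought of as a context manager that owns a logger and a data directory and guarantees cleanup when the scope exits:

```python
# Illustrative only (not the actual TestTools API): a "scope" as a context
# manager that owns a logger and a data directory and cleans up on exit.
import logging
import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def scope(name):
    logger = logging.getLogger(name)
    data_dir = tempfile.mkdtemp(prefix=f"{name}_")    # directory for test-generated data
    logger.info("entering scope, data dir: %s", data_dir)
    try:
        yield data_dir
    finally:
        shutil.rmtree(data_dir, ignore_errors=True)   # cleanup tied to scope lifetime
        logger.info("left scope, data dir removed")

with scope("cli_wallet_extended_tests") as data_dir:
    pass  # test logic that writes into data_dir goes here
```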

Hivemind (2nd layer applications + social media middleware)

We’re planning to move to Ubuntu 20 (from Ubuntu 18) as the recommended Hive development platform soon, and this also entails a move to Postgres 12 (from Postgres 10) because that’s the default version of Postgres that ships with Ubuntu 20. So we’re working through performance regressions associated with differences in the way the Postgres 12 query planner works.

As reported last week, we changed the query for update_post_rshares to fix a performance killer and changed when we executed some vacuum analyze calls. One of the fixes involved temporarily disabling just-in-time compilation for this query (https://gitlab.syncad.com/hive/hivemind/-/merge_requests/513).
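
For reference, here is a minimal sketch of what disabling just-in-time compilation for a single transaction looks like on Postgres 12; the connection string and query are placeholders, not hivemind's actual code:

```python
# Sketch only (placeholder connection string and query, not hivemind's code):
# disabling Postgres 12's JIT compilation for a single transaction.
import psycopg2

conn = psycopg2.connect("dbname=hivemind")
with conn, conn.cursor() as cur:
    cur.execute("SET LOCAL jit = off;")   # affects only the current transaction
    cur.execute("SELECT 1;")              # stand-in for the expensive update_post_rshares query
    print(cur.fetchone())
conn.close()
```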

We now have benchmarks for the overall performance improvement for update_post_rshares after massive sync (where this performance regression was particularly noticeable, although the fix is also an important enhancement for live sync performance):

  • Old performance: 6.9 hours
  • New performance: 38 minutes
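
That works out to roughly an 11x speedup (6.9 hours ≈ 414 minutes; 414 / 38 ≈ 10.9).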

Hive Application Framework: framework for building robust and scalable Hive apps

Fixing/Optimizing HAF-based account history app (Hafah)

We’re currently optimizing and testing our first HAF-based app (codenamed Hafah), which emulates the functionality of hived’s account history plugin (and will ultimately replace it). Our initial benchmarks at 5M blocks showed good performance, but we saw a slowdown in indexing performance when operating with a fully populated database (i.e. 50M blocks).

Benchmarking time to fill a Hafah database

This week we’ve fixed this slowdown and have some preliminary performance benchmarks for filling up a Hafah database from scratch. As reported previously, a full replay to fill a HAF database with 57M blocks (i.e. the entire Hive blockchain) for Hafah usage takes 5.5 hours. Next, Hafah’s indexing step creates all of its tables in 4.3 hours, for a total of 5.5 + 4.3 = 9.8 hours for the entire process. This compares very favorably against the ~17 hours required for a hived account history node to replay. Also, we haven’t yet tried to run both of these tasks concurrently, but there’s reason to believe that doing so will further reduce the time required to fill up a Hafah database.

Benchmarking API performance for Hafah

We also need to benchmark the API performance of a Hafah server. We’ve created a script that uses jmeter to measure how quickly Hafah can process the various account_history API calls under heavy load. The script currently compares the performance of three types of servers: a) direct queries to the postgres server holding Hafah data, b) json-rpc calls to Hafah’s python-based jsonrpc server, and c) a hived node.
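
The real benchmark is driven by jmeter, but the idea can be sketched in a few lines of Python: fire many concurrent get_account_history calls at a chosen endpoint and summarize the latencies (the endpoint URL and account list below are placeholders):

```python
# Rough Python-only illustration of the load test (the real benchmark uses jmeter):
# fire many concurrent get_account_history calls at one endpoint and report latencies.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://127.0.0.1:8095"   # placeholder: Hafah jsonrpc server or hived node

def timed_call(account):
    payload = {"jsonrpc": "2.0", "id": 1,
               "method": "account_history_api.get_account_history",
               "params": {"account": account, "start": -1, "limit": 1000}}
    start = time.perf_counter()
    requests.post(ENDPOINT, json=payload).raise_for_status()
    return time.perf_counter() - start

accounts = ["blocktrades", "gtg", "hiveio"] * 100   # illustrative workload
with ThreadPoolExecutor(max_workers=32) as pool:
    latencies = sorted(pool.map(timed_call, accounts))
print(f"median: {statistics.median(latencies):.3f}s, "
      f"p95: {latencies[int(0.95 * len(latencies))]:.3f}s")
```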

Preliminary benchmarks show that the Hafah queries are very fast when served directly from postgres itself, but under load we have observed that the python-based jsonrpc server is restricted to one cpu and becomes a performance bottleneck. It is also worth noting that this is essentially the same code used by hivemind to handle jsonrpc calls, so this bottleneck probably exists in hivemind as well, but just wasn’t noticed because the query time for a typical hivemind API call is much longer than the query time for a Hafah API call. In any event, we’ll be investigating ways to eliminate this bottleneck in the coming week, and hopefully this will allow for further scaling of hivemind API performance as well.
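
To clarify what "restricted to one cpu" means, and one generic way around it (not necessarily the fix we'll end up choosing): a single Python process can saturate only one core on CPU-bound work, so a common pattern is to run several identical server processes and spread requests across them, for example:

```python
# Generic illustration only (not necessarily the fix we'll pick): a single
# Python process can saturate only one core, so one common workaround is to
# run several identical server processes and spread requests across them.
import multiprocessing

def serve(port):
    # placeholder for starting one jsonrpc server instance bound to `port`
    print(f"worker listening on port {port}")

if __name__ == "__main__":
    ports = [8095, 8096, 8097, 8098]   # illustrative: one worker per core
    workers = [multiprocessing.Process(target=serve, args=(p,)) for p in ports]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```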

Conversion of hivemind to HAF-based app

We’ve completed the first step in converting Hivemind to a HAF-based app (converted hivemind’s massive sync code to use HAF methods). I’ve been told massive sync indexing time is already faster than old-style hivemind, but I don’t have firm numbers yet to report. Also, I expect further improvements as we restructure hivemind’s massive sync procedures to take better advantage of the new way it is being fed data.

Upcoming work for next week

For hived, we’re adding a command-line option to allow a hived node to wait during a replay if it loses contact with the HAF database that it is filling (this issue arose when one of our devs restarted our HAF postgres server in mid-replay, but it seems like a generally useful feature).
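
Conceptually, the new behavior is just a retry loop around the database connection; sketched in Python for brevity (hived itself is C++, and the retry interval below is only illustrative):

```python
# The planned behavior is conceptually just a retry loop around the database
# connection: instead of aborting the replay, keep retrying until the HAF
# database is reachable again.
import time
import psycopg2

def wait_for_haf_db(dsn, retry_seconds=10):
    while True:
        try:
            return psycopg2.connect(dsn)
        except psycopg2.OperationalError:
            print(f"HAF database unreachable, retrying in {retry_seconds}s ...")
            time.sleep(retry_seconds)
```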

For HAF testing, we’ll be using the hived fork generator to verify that Hafah functions robustly under heavy forking activity on the blockchain. Once we’re further along with HAF-based hivemind, we’ll likely test it the same way.

For Hafah, we’ll be 1) investigating the jsonrpc bottleneck, 2) further benchmarking API performance, 3) verifying results against a hived account history node, 4) benchmarking concurrent hived replay and Hafah massive sync, and 5) setting up continuous integration testing for Hafah.

For HAF-based hivemind, we plan to restructure its massive sync process to simplify it and optimize performance by taking advantage of the HAF-based design. Next, we’ll modify live sync to use only HAF data (currently it still makes some calls to hived during live sync).


HIVE!D


Well, that was all over my head, but I am studying and one day that will change.
Thanks for the update and expansion.

Yes, a lot of the details of these posts are aimed at other Hive programmers, so most readers are going to find it hard going.

But to summarize it a bit, we're mostly testing/fixing/benchmarking the first HAF-based applications. A lot of work, but so far it's going well. The potential benefits of HAF include 1) lower cost nodes (as we move to the goal of allowing anyone to cheaply operate their own Hive server) that can handle more users and 2) making Hive-based apps faster and easier to develop.

Maybe it's worth adding these few sentences at the beginning of the post 😁

Great work as always 👍

I think people should always read the comments :-)

ADD TLOS DAMNIT


@blocktrades! This post has been manually curated by the $PIZZA Token team!

Learn more about $PIZZA Token at hive.pizza. Enjoy a slice of $PIZZA on us!

Congratulations @blocktrades! Your post has been a top performer on the Hive blockchain and you have been rewarded with the following badge:

Post with the highest payout of the day.

You can view your badges on your board and compare yourself to others in the Ranking
If you no longer want to receive notifications, reply to this comment with the word STOP

Check out the last post from @hivebuzz:

Hive Power Up Month - Feedback from Day 15

wen next hardfork ? :P

Typically, we try to schedule them 6 months apart.

Once we've got a better idea what features we will be including, we'll figure out a timeline that makes sense for the release. In any event, I expect the release of HAF to be just as significant as the next hardfork, when it comes to benefits to the Hive ecosystem, and we can do that as soon as it is ready.

Here’s my 5 cents: Spammers should pay more for advertising -> negative reputation accounts should pay more RC for comments, something like regular rate x 100.

That's super awesome to hear; since the last hard fork contained some big changes, I imagined the next one would be more "tame". But I'm very excited to see new and innovative stuff being implemented :)

Very informative post. Thanks

Hi,
Please report any abuses in our form:
https://hivewatchers.com/reports/new

Are we working on documentation around this work so that it helps onboard new developers to build on Hive?

Here is the documentation for hive fork manager, a part of the HAF system: https://gitlab.syncad.com/hive/psql_tools/-/blob/master/src/hive_fork_manager/Readme.md

Yes, this week we added docs for TestTools (those docs are mostly for Hive core devs). I think there are probably already enough docs to use HAF, but we'll be adding more after we finish up testing the HAF example apps.


The rewards earned on this comment will go directly to the person sharing the post on Twitter as long as they are registered with @poshtoken. Sign up at https://hiveposh.com.

I don't understand a damn thing, but it's very interesting!

Is there a place to propose changes to be included in the next hardfork?

I was thinking it would be a good idea to make the cost of hive accounts and the cost of creating a community variable. I think it's better if it depends on witness consensus similar to HBD interest rates.

Cost of hive accounts is already set by witness consensus.

Congratulations on your great work in developing and updating the platform.

20x better performance. Nice. :D


I am not familiar with Blocktrades so thanks for all info. Interesting to read about something new.

Old performance: 6.9 hours
New performance: 38 minutes


What about disk usage? Can you compare the current AH node with hafAH? I'm willing to run a full API node soon and wondering if 2x 2TB will be enough.

Hafah uses around 2TB of disk space, so it definitely uses more space than a hived account history node. But note that most of that is for the HAF tables, which can also be used by other HAF applications (such as hivemind).