5th update of 2024: Lite “hived” nodes, 2nd layer plans, and lots more (accidental repost, don't vote)

in HiveDevs, 25 days ago (edited)



Below are a few highlights of the Hive-related programming issues worked on by the BlockTrades team since my last report.

Hived: core blockchain node software

Split block logs (enables “lite” hived nodes for services)

We are adding support for “lite” hived nodes that don’t retain the entire blockchain. Most of the work is done at this point and the remaining portion will likely be completed in the next week or so.

Ideally, hived nodes should retain the entire blockchain history, because this provides redundant storage for any node that is not yet in sync with the current Hive head block: such nodes can get the old blocks they lack from their peers.

But at some point, such storage becomes overkill, and it also raises the cost of running services that require a hived node because of the associated storage costs. Currently, even with the new block-compression features that cut storage costs in half, the block_log file is 486GB in size (almost ½ a terabyte).

Introducing the block-log-split option

To let hived nodes run with less storage, there’s a new hived command-line option called block-log-split that allows a node to operate while retaining fewer blocks. By default, this value is set to -1, which means operating in the standard way (with a single, full-size block log). Setting this value to 0 tells hived not to maintain any block history (no block_log file).

Setting the block-log-split option to any larger value tells hived to maintain at least that many million recent blocks. For example, setting `block-log-split=2` means keep at least the last 2 million blocks. In this mode, the block log is stored in multiple files, each 1 million blocks long and split at 1-million-block boundaries (except for the current “top” file, which only contains blocks up to the current head block). For example, block_log_part.0084 contains blocks 83,000,001 to 84,000,000, block_log_part.0085 contains blocks 84,000,001 to 85,000,000, and the “top” file block_log_part.0086 contains blocks from 85,000,001 to the current head block.
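The part numbering above boils down to simple arithmetic. Here is a minimal Python sketch of it, assuming (from the example) that part N holds blocks (N-1)·1M+1 through N·1M, and assuming that a block-log-split value of N keeps the top part plus the N completed parts before it. hived's actual retention rule may differ in edge cases, and the function names are hypothetical, not hived internals.

```python
BLOCKS_PER_PART = 1_000_000  # each block_log_part file covers 1 million blocks


def part_number(block_num: int) -> int:
    """Return the 1-based index of the part file that holds block_num."""
    return (block_num - 1) // BLOCKS_PER_PART + 1


def part_filename(part: int) -> str:
    """Format a part index like the example names, e.g. block_log_part.0086."""
    return f"block_log_part.{part:04d}"


def retained_files(head_block: int, block_log_split: int) -> list[str]:
    """List the block log files a node would keep (assumed retention rule)."""
    if block_log_split < 0:
        return ["block_log"]  # -1: the standard single, full-size block log
    if block_log_split == 0:
        return []  # 0: no block history at all
    top = part_number(head_block)
    # Assumption: keep the top part plus block_log_split completed parts before it,
    # which guarantees at least block_log_split million recent blocks.
    first = max(1, top - block_log_split)
    return [part_filename(p) for p in range(first, top + 1)]


print(retained_files(85_500_000, 2))
```

Called with a split value of 10000 and the same head block, the function returns 86 names starting with block_log_part.0001, which matches the full-history case described in this post.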

In split block mode, only the “top” file will be written to. The other files will only be read from when the node is supplying blocks to other peer nodes that need to sync to the head block.
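To illustrate the read/write split, here is a hypothetical Python helper that routes block access in split mode: new blocks are appended only to the top file, while a peer's request for an older block is answered from whichever retained part holds it, or not at all if that part has been pruned. It assumes the 1-million-block part layout described above and interprets block-log-split N as "top part plus N completed parts". The names are illustrative, not hived's code.

```python
from typing import Optional

BLOCKS_PER_PART = 1_000_000


def part_number(block_num: int) -> int:
    """1-based index of the part file holding block_num."""
    return (block_num - 1) // BLOCKS_PER_PART + 1


def write_target(head_block: int) -> str:
    """Only the "top" part (the one containing the head block) is ever written to."""
    return f"block_log_part.{part_number(head_block):04d}"


def read_source(block_num: int, head_block: int, block_log_split: int) -> Optional[str]:
    """File to read when a syncing peer asks for block_num, or None if not retained."""
    if block_log_split <= 0:
        raise ValueError("this sketch only covers split mode (block_log_split > 0)")
    if block_num < 1 or block_num > head_block:
        return None  # invalid number, or the block does not exist yet
    first_kept = max(1, part_number(head_block) - block_log_split)  # assumed rule
    part = part_number(block_num)
    if part < first_kept:
        return None  # pruned: the peer must get it from a node with more history
    return f"block_log_part.{part:04d}"
```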

If you want your node to maintain the entire blockchain but store the blocks in split-mode format (instead of one big file), you can simply set block-log-split to a large value like 10000. In this case, at the current time, you would have 86 block_log_part files, starting with block_log_part.0001. Note that each block_log_part file also has an associated artifacts file (e.g. block_log_part.0001.artifacts).

Switching an existing node to split log mode

We’re also making it easy to switch a currently running node to split block_log mode without requiring a resync of the entire blockchain. All you will need to do is shut down your node, set block-log-split to the desired number of block_log parts to maintain, and restart it. Your node will read your existing full-size block log, split it as necessary, then sync to get any blocks it missed while it was shut down. After the node has finished splitting the block log, you can delete the original full-size block log (or move it to slower storage for backup purposes). Work on this issue is being tracked here: https://gitlab.syncad.com/hive/hive/-/issues/686
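Before deleting or moving the original block_log, an operator may want to confirm that the split actually produced every expected part along with its artifacts file. This is a minimal Python sketch of such a check. It assumes the part and artifacts files sit side by side in the directory holding the block log files, and that you already know the first and top part numbers you expect; the helper names are hypothetical, not a hived tool.

```python
from pathlib import Path


def missing_split_files(data_dir: Path, first_part: int, top_part: int) -> list[str]:
    """Return the names of expected part/artifacts files that are not present."""
    missing = []
    for part in range(first_part, top_part + 1):
        name = f"block_log_part.{part:04d}"
        # Each part is expected to have a companion .artifacts file.
        for candidate in (name, name + ".artifacts"):
            if not (data_dir / candidate).is_file():
                missing.append(candidate)
    return missing


def safe_to_remove_original(data_dir: Path, first_part: int, top_part: int) -> bool:
    """True only if every expected split file exists."""
    return not missing_split_files(data_dir, first_part, top_part)
```

For the 2-million-block example in this post, the call would be `safe_to_remove_original(data_dir, 84, 86)`, where `data_dir` is a placeholder for your own block log directory.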

If you are especially space-constrained, you can start by putting the full-size block_log on a different, potentially slower, storage device, then create a symbolic link to it in your hived data directory. For node operators who plan to replay often (e.g. developers testing new versions of hived or HAF), this is probably a handy configuration, since such nodes will still need all the early blocks to perform replays.
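From a shell, this setup is a single `ln -s` command. The same idea is shown below as a small Python sketch that refuses to clobber an existing file. Both paths are placeholders. It assumes hived is stopped and that the directory where hived looks for block_log (assumed here to be the blockchain subdirectory of the data directory) has no block_log of its own yet.

```python
import os
from pathlib import Path


def link_full_block_log(slow_storage_log: Path, blockchain_dir: Path) -> Path:
    """Symlink a full-size block_log kept on slower storage into hived's directory."""
    if not slow_storage_log.is_file():
        raise FileNotFoundError(slow_storage_log)
    link = blockchain_dir / "block_log"
    if link.exists() or link.is_symlink():
        raise FileExistsError(f"{link} already exists; refusing to overwrite it")
    os.symlink(slow_storage_log.resolve(), link)  # link -> real file on slow device
    return link
```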

Other hived changes

Beekeeper (cryptographic key management software)

Beekeeper now has buffer encryption. This was needed to support Wax in Clive (Clive is the new console-based wallet) and other frontends (e.g. Denser, the upcoming replacement for Condenser).

There were also various bugfixes and API changes in Beekeeper to improve security.

Python Beekeeper (a Python wrapper for Beekeeper)

  • Implementation of an object-specific interface to easily perform Beekeeper actions. This is the first step toward an object-based Wax implementation for Python, like the one already done in the TypeScript wrapper.
  • API call performance optimizations

HAF API node

Improved haf_api_node dataset structure

The haf_api_node directory structure has been changed to store the shared_memory file in a separate dataset from the blockchain. This offers three benefits: 1) block_logs can be stored on a slower storage device, 2) you can set different ZFS properties, such as compression level, for each dataset (block_logs are already compressed, so there is no point in compressing them again, but the state file and WAL files may benefit from compression), and 3) you can reuse the current blockchain snapshot when you upgrade to a new version of hived with an incompatible shared memory format (similarly, you can make a shallow clone of just the blockchain directory to run two hived instances on the same system with reduced storage, which we are now doing on our own servers).
When you pull the latest changes, your existing stack should still work, but it will not follow the new suggested layout. To transform your existing dataset to the new layout:
• docker compose down
• git pull
• edit your environment to use HIVE_API_NODE_VERSION=1.27.5 and set HAF_SHM_DIRECTORY="${TOP_LEVEL_DATASET_MOUNTPOINT}/shared_memory"
• sudo zfs create -o atime=off -o compression=off haf-pool/haf-datadir/shared_memory
• sudo chown 1000:100 /haf-pool/haf-datadir/shared_memory
• sudo mv /haf-pool/haf-datadir/blockchain/shared_memory.bin /haf-pool/haf-datadir/blockchain/haf_wal /haf-pool/haf-datadir/shared_memory
• docker compose up -d
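After the `mv` step, a quick sanity check is to confirm that the files landed where the new layout expects them. This Python sketch checks only shared_memory.bin and haf_wal and assumes the top-level mountpoint used in the commands above (/haf-pool/haf-datadir). It is an illustrative helper, not part of haf_api_node.

```python
from pathlib import Path

MOVED_ITEMS = ("shared_memory.bin", "haf_wal")  # items relocated by the mv step


def layout_problems(top_level_mountpoint: Path) -> list[str]:
    """Describe anything still following the old layout (empty list = looks good)."""
    blockchain = top_level_mountpoint / "blockchain"
    shm_dir = top_level_mountpoint / "shared_memory"
    problems = []
    for item in MOVED_ITEMS:
        if (blockchain / item).exists():
            problems.append(f"{item} is still under {blockchain}")
        if not (shm_dir / item).exists():
            problems.append(f"{item} is missing from {shm_dir}")
    return problems
```

Once the migration is complete, `layout_problems(Path("/haf-pool/haf-datadir"))` should return an empty list.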

Added documentation on how to compress API responses:

Most of the data the Hive API serves up compresses well. Calls like get_block() generate a lot of data and will typically compress 3x or better. You can decrease your bandwidth (and your users' bandwidth) by enabling compression, at the expense of higher CPU usage on your server. To do this, drop code like this in a file called, say, compression.snippet:

encode {
  zstd
  gzip
  minimum_length 1024
}
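To get a feel for the 3x figure and for why the snippet sets minimum_length, this Python sketch gzips a synthetic, get_block-like JSON payload and a tiny response. The payload is made up (real ratios depend on actual responses), and only gzip is measured because it is available in every Python version's standard library.

```python
import gzip
import json

MINIMUM_LENGTH = 1024  # mirrors `minimum_length 1024` in the Caddy snippet


def compressed_size(body: bytes) -> int:
    """Size of body after gzip compression at the default level."""
    return len(gzip.compress(body))


def worth_compressing(body: bytes) -> bool:
    """Like the snippet's threshold: skip small bodies where savings are negligible."""
    return len(body) >= MINIMUM_LENGTH


# Synthetic stand-in for a block: many operations with repetitive field names.
fake_block = {
    "block_id": "0" * 40,
    "transactions": [
        {"operations": [["vote", {"voter": f"user{i}", "author": "blocktrades",
                                  "permlink": "5th-update-of-2024", "weight": 10000}]]}
        for i in range(200)
    ],
}
body = json.dumps(fake_block).encode()
ratio = len(body) / compressed_size(body)
print(f"{len(body)} bytes -> {compressed_size(body)} bytes ({ratio:.1f}x)")

# A tiny response actually grows when gzipped, due to fixed header/trailer overhead.
tiny = b'{"id":1,"result":true}'
print(worth_compressing(tiny), compressed_size(tiny) > len(tiny))
```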

HAF (Hive Application Framework)

Documenting HAF's new REST-based API (transitioning away from JSON-based API)

We’ve created a methodology for documenting the new REST-based APIs for Hive that keeps the documentation and the actual APIs synchronized, so the documentation doesn’t become out of date as changes are made over time.

Under this methodology, both the OpenAPI docs and the top-level API functions written in SQL are stored in the same file. The OpenAPI documentation is used to create Swagger-based interactive documentation that can be directly hosted on Hive API nodes in a docker container.

An API developer will first write OpenAPI function specifications for the API calls they plan to create, then run a new tool which processes the OpenAPI function specification into a SQL function prototype. The tool can also be run in-place on a file containing existing SQL API functions to update the function signatures whenever the specification of an API call needs to be changed.
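This is not the tool described above; it is only a rough Python illustration of the idea of turning a spec into a SQL function prototype. The simplified OpenAPI-style parameter objects, the type mapping, the hypothetical `example_api` schema and `get_block_ops` function, and the `RETURNS JSONB` choice are all assumptions. The output is a signature only; the body is left to the developer.

```python
# Assumed mapping from OpenAPI schema types to PostgreSQL types.
PG_TYPES = {"integer": "INT", "number": "NUMERIC", "string": "TEXT", "boolean": "BOOLEAN"}


def sql_literal(value: object) -> str:
    """Render an OpenAPI default value as a SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def sql_prototype(schema: str, operation_id: str, parameters: list[dict]) -> str:
    """Build a function signature (no body) from OpenAPI-style parameter objects."""
    args = []
    for param in parameters:
        param_schema = param["schema"]
        # Quote names so hyphenated REST parameter names are valid SQL identifiers.
        arg = f'"{param["name"]}" {PG_TYPES[param_schema["type"]]}'
        if "default" in param_schema:
            arg += f" = {sql_literal(param_schema['default'])}"
        args.append(arg)
    arg_list = ",\n    ".join(args)
    return f"CREATE OR REPLACE FUNCTION {schema}.{operation_id}(\n    {arg_list}\n)\nRETURNS JSONB"


# Hypothetical spec fragment for a block-operations endpoint.
spec_parameters = [
    {"name": "block-num", "in": "path", "required": True, "schema": {"type": "integer"}},
    {"name": "page-size", "in": "query", "schema": {"type": "integer", "default": 100}},
]
print(sql_prototype("example_api", "get_block_ops", spec_parameters))
```

Re-running a generator like this over an existing file is what keeps signatures and docs in step: the spec is the single source, and the SQL header is derived from it.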

The OpenAPI specification can also be used to create rewrite rules for caddy/nginx to simplify the creation of more “standard” REST APIs. Some work still needs to be done to figure out how the rewrite rules will be updated in the rewrite-processing container (e.g. caddy/nginx/varnish) when a new API container with changed specs is launched.
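A path template from the spec can be turned into a rewrite rule mechanically. This Python sketch emits an nginx `rewrite` directive that maps a REST-style path onto a PostgREST-style `/rpc/<function>` call. The `/rpc/` target, passing path parameters as query arguments, and the example path and function name are assumptions for illustration; Caddy or Varnish would need their own syntax.

```python
import re

# A path parameter in an OpenAPI path template, e.g. "{block-num}".
PLACEHOLDER = re.compile(r"\{([^}/]+)\}")


def nginx_rewrite(path_template: str, function: str) -> str:
    """Map a REST path template onto an assumed /rpc/<function> URL."""
    pieces = PLACEHOLDER.split(path_template)  # literal, name, literal, name, ...
    literals, names = pieces[0::2], pieces[1::2]
    # Escape the literal parts; each parameter captures exactly one path segment.
    regex = "^" + "([^/]+)".join(re.escape(text) for text in literals) + "$"
    # nginx exposes capture groups as $1, $2, ...; pass them on as query arguments.
    query = "&".join(f"{name}=${i}" for i, name in enumerate(names, start=1))
    target = f"/rpc/{function}" + (f"?{query}" if query else "")
    return f"rewrite {regex} {target} break;"


print(nginx_rewrite("/blocks/{block-num}/operations", "get_ops_by_block"))
```

When the replacement contains request arguments, nginx appends the original request's arguments after them, so extra query parameters still reach the backend.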

More information about the new API documentation process can be found here: https://gitlab.syncad.com/hive/haf_block_explorer/-/merge_requests/178

Miscellaneous improvements to HAF

2nd layer “Lite accounts” that are transportable across the Hive ecosystem

We are creating a new HAF app to manage the creation and maintenance of “Lite” accounts that can be used by any 2nd layer app to sign 2nd layer transactions. The specification for this also includes documentation for how we plan to support 2nd layer transactions.

This is a fairly complex topic, so parts of the design are still being worked out and I’ll have much more to say about it in future reports. For anyone interested in creating 2nd layer apps that require users to generate custom_json operations, I recommend reading the following link for more details on how the design is developing so far: https://gitlab.syncad.com/hive/haf/-/issues/214.

Hivemind API (social media API)

HAF Block Explorer and Balance Tracker APIs

Block explorer UI:

  • Filter dialog improvements
  • Block search result page allows op-type filtering (also URL has embedded filter)
  • Bugfixes specific to time/date display and UI tweaks

wax (New multi-language Hive API library)

  • The TypeScript-based transaction builder now supports encryption in these operations: transfer (encrypted memo), comment, custom_json (only the inner json part is encrypted), and transfer to/from savings. For custom_json encryption, the original json contents are encrypted and wrapped into a sub-object with the key name “encrypted” and a string value like #xxxxx (the same format used for encrypted transfer memos). This way we can recognize whether a given custom_json requires decryption during processing. An example of an encrypted transfer can be found here: https://gitlab.syncad.com/hive/wax/-/blob/develop/wasm/tests/detailed/hive_base.ts?ref_type=heads#L240. You can use at most 2 public keys to perform the encryption (e.g. for the sender and intended receiver). For example, if you were sending “secret” commands to a game app that other players shouldn’t be able to read, you could encrypt the commands so that only the app can read them.
  • Bugfixes to protobuf serialization
  • Transaction builder interface improved to make it more intuitive
  • New tests and Playwright test fixture improvements
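The custom_json wrapping convention described above can be sketched independently of the cryptography. The Python below is not wax's API: `encrypt` stands in for real memo-style encryption (which wax performs with up to two public keys), and `toy_encrypt` is a base64 placeholder that exists only to exercise the wrapping and detection logic. It also assumes the "#" prefix plus the "encrypted" key is enough to recognize an encrypted payload.

```python
import base64
import json
from typing import Callable


def wrap_encrypted(payload: dict, encrypt: Callable[[str], str]) -> dict:
    """Encrypt the whole inner JSON and wrap it as {"encrypted": "#<ciphertext>"}."""
    ciphertext = encrypt(json.dumps(payload, separators=(",", ":")))
    return {"encrypted": "#" + ciphertext}


def is_encrypted(custom_json: object) -> bool:
    """Detect the wrapper so a processor knows the payload must be decrypted first."""
    return (
        isinstance(custom_json, dict)
        and isinstance(custom_json.get("encrypted"), str)
        and custom_json["encrypted"].startswith("#")
    )


def toy_encrypt(text: str) -> str:
    """Placeholder only: base64 is an encoding, NOT encryption."""
    return base64.b64encode(text.encode()).decode()


wrapped = wrap_encrypted({"action": "move", "to": "b4"}, toy_encrypt)
print(json.dumps(wrapped))
print(is_encrypted(wrapped), is_encrypted({"action": "move"}))
```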

Wax work in progress:

  • API call health-checker component for apps to aid users in endpoint URL selection.
  • Prerequisite steps for publication to the official npm registry: https://registry.npmjs.org (Important note: the package scope has changed from hive to hiveio).
  • Benchmarks to verify library performance after its size optimizations (to be done)
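As a rough idea of what endpoint selection involves, this Python sketch ranks candidate API URLs by measured latency and drops failing ones. It is not the wax component: `probe` is a placeholder for whatever request a checker would issue (for example, a lightweight JSON-RPC call), and the injectable `timer` exists only so the ranking can be tested deterministically.

```python
import time
from typing import Callable, Iterable


def rank_endpoints(urls: Iterable[str], probe: Callable[[str], None],
                   timer: Callable[[], float] = time.perf_counter) -> list[tuple[str, float]]:
    """Return (url, seconds) for endpoints whose probe succeeded, fastest first."""
    healthy = []
    for url in urls:
        start = timer()
        try:
            probe(url)  # expected to raise on timeout, HTTP error, bad payload, etc.
        except Exception:
            continue  # unhealthy endpoints are simply excluded
        healthy.append((url, timer() - start))
    return sorted(healthy, key=lambda item: item[1])
```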

Clive: command-line and TUI wallet for Hive

New version v1.27.5.10 released: https://gitlab.syncad.com/hive/clive/-/merge_requests/361 (New features detailed in the link)

Some of our work in progress (or planned for near future)

  • Creation of HAF-based Lite Accounts application (implementation in progress)
  • Developing spec for 2nd layer smart contract processing engine (most of docs are here: https://gitlab.syncad.com/hive/smarc/-/tree/abw_smarc_design/doc?ref_type=heads )
  • Many hived improvements: https://gitlab.syncad.com/hive/hive/-/issues/675
  • Official release of the wax and workerbee npm packages
  • Finish new OpenAPI documentation of existing REST APIs (in particular block_explorer and balance_tracker)
  • Create a release candidate for Denser (replacement for Condenser)
  • Continue adding new commands to Clive
  • Release new reputation tracker app
  • More hivemind performance improvements (continued replacement of Python code)
  • Integrate reputation_tracker as a sub-app inside hivemind. This should reduce storage requirements (currently hivemind must collect all votes to recalculate reputation) and improve sync speed.
  • Eliminate timestamp in HAF operations table to reduce database size
  • Redesign HAF main loop to make it less error-prone

The approach to lite nodes seems pretty nice and low-effort, and it opens up more possibilities for different workflows. Does not storing the whole block_log also mean the required RAM can be reduced, by any chance? Or is the RAM truly required just for storing the current state? I'm surprised how much space it takes to store the current state.

Even more reduction? Have mercy ;-) The shared_memory.bin file is already 3x smaller than it used to be (60-ish GB -> 20-ish GB).
And no, block_log isn't kept in memory, it's on disk. But it makes a huge difference in storage requirement for example for a simple broadcaster node.

Oh, this is all extremely good work, and I am very grateful you have achieved this much reduction. :) I was just wondering about that one, and it looks like blocktrades answered.

Normal nodes have no need to store the 28GB "shared_memory" file in memory now. All our nodes store it on SSD storage (which is actually advantageous in some ways since the storage is persistent that way). In practice, a typical node uses around 4GB of resident memory nowadays.


It's excellent work. I don't understand much of the technical side, but I think it's about saving space and being more efficient.