I was asking myself: how long does it take to stream the entire Steem blockchain to my computer? In order to find out, I wrote the following script:
```python
from beem.blockchain import Blockchain
from beem.nodelist import NodeList
from beem import Steem
from beem.instance import set_shared_steem_instance
import time

if __name__ == "__main__":
    # Update current node list from @fullnodeupdate
    nodes = NodeList()
    # Each dict is one test setup: node type (appbase or not), transport
    # (https or wss), threading on/off and batched block requests on/off
    setup = [{"appbase": True, "https": True, "threading": False, "max_batch_size": 50},
             {"appbase": True, "https": False, "threading": False, "max_batch_size": 50},
             {"appbase": False, "https": False, "threading": True, "max_batch_size": None},
             {"appbase": False, "https": False, "threading": False, "max_batch_size": None},
             {"appbase": True, "https": False, "threading": True, "max_batch_size": None},
             {"appbase": True, "https": False, "threading": False, "max_batch_size": None},
             {"appbase": False, "https": True, "threading": True, "max_batch_size": None},
             {"appbase": False, "https": True, "threading": False, "max_batch_size": None},
             {"appbase": True, "https": True, "threading": True, "max_batch_size": None},
             {"appbase": True, "https": True, "threading": False, "max_batch_size": None}]
    result_setup_days = []
    start_block_list = [1000, 10e6, 20e6]
    for s in setup:
        print(s)
        nodes.update_nodes(weights={"hist": 1})
        stm = Steem(node=nodes.get_nodes(appbase=s["appbase"], https=s["https"],
                                         wss=not s["https"]))
        set_shared_steem_instance(stm)
        b = Blockchain()
        end_block = b.get_current_block_num()
        result_days = []  # one estimate per start block for this setup
        for start_block in start_block_list:
            print(start_block)
            last_block_num = None
            block_diff = 1000
            print("Starting to stream at block %d." % start_block)
            for op in b.stream(start=int(start_block), max_batch_size=s["max_batch_size"],
                               threading=s["threading"], thread_num=8):
                block_num = op["block_num"]
                if last_block_num is None:
                    # First received block: start the timer
                    start_time = time.time()
                    last_block_num = block_num
                if (block_num - last_block_num) > block_diff:
                    # block_diff blocks were streamed: stop the timer and
                    # extrapolate the duration for the whole blockchain
                    time_for_blocks = time.time() - start_time
                    print("\n---------------------\n")
                    running_days = (end_block - 1) * time_for_blocks / block_diff / 60 / 60 / 24
                    print("Duration for %d blocks: %.2f s (%.3f s per block) -- %.2f days to go"
                          % (block_diff, time_for_blocks, time_for_blocks / block_diff,
                             running_days))
                    result_days.append(running_days)
                    break
        result_setup_days.append(result_days)
```
For each setup, I measure the time to stream 1000 blocks three times (at three different start blocks) and estimate from these measurements how long streaming the entire blockchain would take.
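To make the extrapolation concrete, here is the same calculation as in the script with made-up numbers (35 s for 1000 blocks and a head block of roughly 22 million are assumptions for illustration, not measured values):

```python
# Illustration of the extrapolation used in the script above (assumed numbers)
time_for_blocks = 35.0       # assumed: measured seconds for block_diff blocks
block_diff = 1000            # number of blocks per measurement
end_block = 22000000         # assumed head block number at the time of the test

seconds_per_block = time_for_blocks / block_diff
running_days = (end_block - 1) * seconds_per_block / 60 / 60 / 24
print("%.3f s per block -> %.1f days for the whole chain"
      % (seconds_per_block, running_days))
# 0.035 s per block -> 8.9 days for the whole chain
```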
## Setups
1. https with 0.19.10 nodes, max_batch_size=50
2. wss with 0.19.10 nodes, max_batch_size=50
3. wss with 0.19.5 nodes, 8 threads
4. wss with 0.19.5 nodes
5. wss with 0.19.10 nodes, 8 threads
6. wss with 0.19.10 nodes
7. https with 0.19.5 nodes, 8 threads
8. https with 0.19.5 nodes
9. https with 0.19.10 nodes, 8 threads
10. https with 0.19.10 nodes
## Results
This table shows the mean estimated duration in days, calculated from the three block-streaming measurements for each setup.
| # | appbase | https | threading | max_batch_size | days |
|---|---|---|---|---|---|
| 1 | True | True | False | 50 | 1.837 |
| 2 | True | False | False | 50 | - |
| 3 | False | False | True | None | 4.973 |
| 4 | False | False | False | None | 11.583 |
| 5 | True | False | True | None | 7.54 |
| 6 | True | False | False | None | 11.99 |
| 7 | False | True | True | None | 193.966 |
| 8 | False | True | False | None | 433.633 |
| 9 | True | True | True | None | 14.67 |
| 10 | True | True | False | None | 85.63 |
In the following table, the estimated duration (in days) to stream the entire blockchain was calculated from the time needed to stream 1000 blocks starting at the given block number.
| # | Start block 1000 (days) | Start block 10000000 (days) | Start block 20000000 (days) |
|---|---|---|---|
| 1 | 0.96 | 1.32 | 3.23 |
| 2 | - | - | - |
| 3 | 3.36 | 1.74 | 9.82 |
| 4 | 7.96 | 8.65 | 18.14 |
| 5 | 3.67 | 1.52 | 9.89 |
| 6 | 9.27 | 8.90 | 17.80 |
| 7 | 276.56 | 145.90 | 159.44 |
| 8 | 434.91 | 430.40 | 435.59 |
| 9 | 31.49 | 5.82 | 6.72 |
| 10 | 84.87 | 85.05 | 86.98 |
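As a cross-check of the first table, the mean of row 1 here, (0.96 + 1.32 + 3.23) / 3 ≈ 1.84 days, matches the 1.837 days reported for setup #1 above.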
## Conclusions
The fastest way to stream the entire blockchain with beem is to use batched calls on a 0.19.10 https node; it then takes only around 2 days. The result could be improved even further by also adding threading to the batched block streaming.
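Based on these results, here is a minimal sketch of the fastest configuration found in this test (appbase/https nodes with batched calls of 50 blocks). Note that enabling threading on top of batching was not measured above, so treating it as a further improvement is an assumption:

```python
from beem import Steem
from beem.blockchain import Blockchain
from beem.instance import set_shared_steem_instance
from beem.nodelist import NodeList

# Select current appbase (0.19.10) https nodes, weighted for history performance
nodes = NodeList()
nodes.update_nodes(weights={"hist": 1})
stm = Steem(node=nodes.get_nodes(appbase=True, https=True, wss=False))
set_shared_steem_instance(stm)

b = Blockchain()
# Batched requests (50 blocks per call) were the fastest setup in this benchmark;
# threading=True could be tried in addition, but that combination was not measured here.
for op in b.stream(start=1, max_batch_size=50, threading=False):
    pass  # process each operation here
```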
Using https nodes without batch calls is not recommended; it takes at least 14.6 days (0.19.10 node with threading).
Using Websocket nodes is also an alternative: with threading, it takes around 5 days on a 0.19.5 node and around 7.5 days on a 0.19.10 node.
Batch calls on a 0.19.10 Websocket node did not work; fetching 1000 blocks took several hours, so I aborted the test run for this configuration.
Streaming 1000 blocks around block 10 million takes less time, and streaming around block 20 million takes longer. The reason for this is the increasing block size over time. Sometimes, streaming blocks starting at block 1000 takes longer; this could be because these old blocks were not cached by the nodes.
Not sure if it is outside the scope of your experiment, but there is also an option to just download the full block_log file from one of these sites:
That would be a nice Python project, to read blocks from the block_log...
There was only one 0.19.10 node with Websocket support available, and it had some performance issues during the test.
It would be nice to have a streaming-only node.
It would not do anything more than give you blocks.
No account info, no posts, no nothing.
Only streaming blocks.
Interesting. Thanks for sharing.
Very interesting results, thanks!