Getting the commenter list of a root post. How hard can it be?


Lighthive documentation

Today's requirement was getting the commenter list of a Hive Comment object. The get_content RPC call returns a field named children, which counts the comments made under the related Comment object (replies, replies to replies, and so on).

Then there is the get_content_replies RPC call, which returns only the first level of comments (the direct replies) under the given author/permlink.
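To make the difference between the two concrete, here is a small in-memory model with hypothetical data and no RPC calls. `direct_replies` mimics what get_content_replies returns (one level only), while `descendant_count` mimics the children field, assuming it counts replies at every depth. Both helper names are made up for illustration.

```python
# Hypothetical comment tree: each "author/permlink" key maps to its direct replies.
TREE = {
    "alice/post": ["bob/re-post", "carol/re-post"],
    "bob/re-post": ["alice/re-bob"],
    "carol/re-post": [],
    "alice/re-bob": ["dave/re-alice"],
    "dave/re-alice": [],
}


def direct_replies(tree, key):
    """First-level replies only, like get_content_replies."""
    return list(tree.get(key, []))


def descendant_count(tree, key):
    """Replies at every depth, like the children field (assumption)."""
    return sum(1 + descendant_count(tree, child) for child in tree.get(key, []))


print(direct_replies(TREE, "alice/post"))    # ['bob/re-post', 'carol/re-post']
print(descendant_count(TREE, "alice/post"))  # 4
```

A comment whose count is 0 has no replies at all, which is why a recursive walk can safely skip the extra request for it.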

So, basically, I can find all commenters of a post with this logic:

```python
from lighthive.client import Client


def get_commenters(author, permlink, c=None):
    commenters = set()
    client = c or Client()
    replies = client.get_content_replies(author, permlink)
    for reply in replies:
        # every direct reply's author is a commenter
        commenters.add(reply["author"])
        if reply["children"] > 0:
            # only send a new request and fetch sub comments
            # if there is a child comment
            commenters |= get_commenters(
                reply["author"], reply["permlink"], c=client)
    # an empty reply list simply yields an empty set
    return commenters
```

The code looks great and works like a charm. Naaah, not really. It sends one HTTP request for the root post plus one for every comment that has replies, and those requests aren't cheap: they directly add to the calculation time, block the process, slow down our app, and put stress on the public RPC nodes.
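To see how the request count grows, here is a sketch that runs the same recursive traversal against a hypothetical `FakeClient`, an in-memory stand-in (not part of lighthive) that only counts calls. Each reply dict carries just the `author`, `permlink`, and `children` fields the function reads; the tree is made-up sample data.

```python
class FakeClient:
    """Hypothetical in-memory stand-in for lighthive's Client; counts calls."""

    def __init__(self, tree):
        # tree maps (author, permlink) -> list of (author, permlink) replies
        self.tree = tree
        self.calls = 0

    def _children(self, key):
        # total replies at every depth, mimicking the children field
        return sum(1 + self._children(c) for c in self.tree.get(key, []))

    def get_content_replies(self, author, permlink):
        self.calls += 1  # one simulated HTTP request per call
        return [
            {"author": a, "permlink": p, "children": self._children((a, p))}
            for a, p in self.tree.get((author, permlink), [])
        ]


def get_commenters(author, permlink, client):
    commenters = set()
    for reply in client.get_content_replies(author, permlink):
        commenters.add(reply["author"])
        if reply["children"] > 0:
            commenters |= get_commenters(reply["author"], reply["permlink"], client)
    return commenters


tree = {
    ("alice", "post"): [("bob", "re-1"), ("carol", "re-2"), ("dave", "re-3")],
    ("bob", "re-1"): [("alice", "re-bob")],
    ("alice", "re-bob"): [("erin", "re-alice")],
}
client = FakeClient(tree)
print(sorted(get_commenters("alice", "post", client)))  # ['alice', 'bob', 'carol', 'dave', 'erin']
print(client.calls)  # 3: the root, bob/re-1 and alice/re-bob
```

The request count is one for the root plus one per comment that has at least one reply, so a deep, busy thread quickly adds up to many round trips.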

At this point, I remembered the get_state call, but also remembered that it was (or will be) deprecated. I was about to handle this with an SQL query on HiveSQL, but then I thought: PeakD shows the whole comment tree just fine in a matter of seconds. So I took a peek (no pun intended) at how PeakD gets the comment tree for each post.

BAM! It was there. The endpoint I had missed was bridge_api.get_discussion, which returns the given comment together with all of its child comments.


Debugger screen of a typical get_discussion response.
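Because get_discussion returns a flat dictionary keyed by "author/permlink", you may want the tree shape back, for example to display the thread. A minimal sketch, assuming each post object carries `parent_author` and `parent_permlink` fields and that a top-level post has an empty or missing `parent_author` (verify both against a real response in the debugger). The helper names and the sample data are hypothetical, trimmed to the fields used.

```python
from collections import defaultdict


def build_reply_index(discussion):
    """Map each "author/permlink" key to the keys of its direct replies."""
    index = defaultdict(list)
    for key, post in discussion.items():
        # assumption: top-level posts have no (or an empty) parent_author
        parent_author = post.get("parent_author")
        if parent_author:
            index[f"{parent_author}/{post['parent_permlink']}"].append(key)
    return index


def print_tree(index, key, depth=0):
    """Print the thread under key, indenting two spaces per level."""
    print("  " * depth + key)
    for child in index.get(key, []):
        print_tree(index, child, depth + 1)


discussion = {
    "alice/post": {"author": "alice", "permlink": "post"},
    "bob/re-1": {"author": "bob", "permlink": "re-1",
                 "parent_author": "alice", "parent_permlink": "post"},
    "alice/re-bob": {"author": "alice", "permlink": "re-bob",
                     "parent_author": "bob", "parent_permlink": "re-1"},
    "carol/re-2": {"author": "carol", "permlink": "re-2",
                   "parent_author": "alice", "parent_permlink": "post"},
}
print_tree(build_reply_index(discussion), "alice/post")
# alice/post
#   bob/re-1
#     alice/re-bob
#   carol/re-2
```

Using `index.get` while printing keeps leaf comments from being added to the `defaultdict` as empty entries.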

Let's update the code to use this endpoint:

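```python
from lighthive.client import Client

def get_commenters(author, permlink, c=None):
    client = c or Client()
    discussion = client('bridge').get_discussion({"author": author, "permlink": permlink})
    # keys are "author/permlink" identifiers; skip the root post itself
    root = f"{author}/{permlink}"
    commenters = {identifier.split("/")[0] for identifier in discussion if identifier != root}
    return list(commenters)
```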

This version sends just one request, no matter how big the comment tree is. In terms of network round trips, that's the good old O(1). A quick benchmark shows the difference well. For the same Comment object, here are the results of each function (the recursive version first, then the get_discussion version):

```
Commenters of emrebeyler/the-battle-of-the-hive-interfaces-apps {'emrebeyler', 'hugo1954', 'chekohler', 'moeknows', 'oneshot', 'codingdefined', 'bashadow', 'guruvaj', 'pcste', 'deathwing', 'steevc', 'reazuliqbal', 'pouchon', 'wandrnrose7', 'leprechaun', 'marki99', 'zaibkang', 'arynews196', 'denmarkguy', 'steemitri', 'peerzadazeeshan', 'bluerobo'}
It took 9.728495836257935 seconds
Commenters of emrebeyler/the-battle-of-the-hive-interfaces-apps ['emrebeyler', 'hugo1954', 'moeknows', 'bashadow', 'codingdefined', 'guruvaj', 'pcste', 'pouchon', 'leprechaun', 'steemitri', 'bluerobo', 'chekohler', 'oneshot', 'deathwing', 'steevc', 'reazuliqbal', 'wandrnrose7', 'marki99', 'zaibkang', 'denmarkguy', 'peerzadazeeshan']
It took 0.8311231136322021 seconds
```
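If you want to reproduce this kind of comparison, a small timing helper is enough. A sketch using `time.perf_counter`; `timed` and `compare` are hypothetical helper names, and the two stand-in lookups replace the real network calls so the snippet runs offline. Comparing the results as sets is worth doing too: the two printed lists above differ by one account (arynews196 appears only in the first), and a set difference makes that easy to spot.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare(result_a, result_b):
    """Accounts found by only one of two commenter lookups."""
    a, b = set(result_a), set(result_b)
    return {"only_a": sorted(a - b), "only_b": sorted(b - a)}


# Hypothetical offline stand-ins for the two get_commenters versions.
def slow_lookup(author, permlink):
    time.sleep(0.05)  # pretend to wait on many HTTP round trips
    return {"alice", "bob", "carol"}


def fast_lookup(author, permlink):
    return ["alice", "bob"]


slow, slow_s = timed(slow_lookup, "alice", "post")
fast, fast_s = timed(fast_lookup, "alice", "post")
print(f"{slow_s:.3f}s vs {fast_s:.3f}s")
print(compare(slow, fast))  # {'only_a': ['carol'], 'only_b': []}
```

A single run against a public node is noisy, so repeating each call a few times gives a fairer picture.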

Can’t remember for sure but I think this is a pain in hiveSQL as the Comments table needs joining back on itself.

That looks much faster!

Yes, to be honest, I'm not sure how to fetch the data with an SQL query. It might need recursive queries. Hivemind also uses multiple queries and processing on the Python side to solve the problem.
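For reference, the recursive-query idea can be sketched with a recursive common table expression. This uses Python's built-in sqlite3 with a hypothetical, simplified `comments` table (author, permlink, parent_author, parent_permlink); it is not the HiveSQL schema, and HiveSQL runs on Microsoft SQL Server, where the same pattern is written without the RECURSIVE keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (author TEXT, permlink TEXT,"
    " parent_author TEXT, parent_permlink TEXT)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        ("alice", "post", "", "hive"),
        ("bob", "re-1", "alice", "post"),
        ("alice", "re-bob", "bob", "re-1"),
        ("erin", "re-alice", "alice", "re-bob"),
        ("carol", "re-2", "alice", "post"),
        ("zed", "other", "", "hive"),  # unrelated post, must not show up
    ],
)

# Start from the direct replies, then join each level back onto the
# comments table (the self-join) until no new rows are produced.
QUERY = """
WITH RECURSIVE thread(author, permlink) AS (
    SELECT author, permlink FROM comments
    WHERE parent_author = ? AND parent_permlink = ?
    UNION ALL
    SELECT c.author, c.permlink
    FROM comments AS c
    JOIN thread AS t
      ON c.parent_author = t.author AND c.parent_permlink = t.permlink
)
SELECT DISTINCT author FROM thread ORDER BY author
"""
commenters = [row[0] for row in conn.execute(QUERY, ("alice", "post"))]
print(commenters)  # ['alice', 'bob', 'carol', 'erin']
```

The whole walk happens inside the database in one query, which is the SQL counterpart of what get_discussion does in a single RPC call.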

Yes, I knew this endpoint; it was much faster than looping through the children.

Well, I wish I had been aware of this earlier. :)

This is an interesting, short, and clear tutorial to read. We really appreciate your work and would like to feature this post in our Gitplait-elite publication. Kudos!