how to scrape data from Hive UIs in Python?

in Programming & Dev · 3 years ago

I'm trying with

import urllib.request

get_data = urllib.request.urlopen('https://leofinance.io/@leofinance/introducing-cub-finance-or-leofinance-expands-into-defi-on-the-binance-smart-chain')

post = get_data.read()

but I only get tons of errors... probably has something to do with how data is shown here, but I can't seem to figure it out.

anyone?


OMG, why would you do it that way?
If you need data, just ask an API node for it:
https://developers.hive.io/tutorials-python/get_post_details.html
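
For reference, a bare-bones version with nothing but the standard library could look something like this (a rough sketch: api.hive.blog is just one public node, and bridge.get_post is the call the frontends themselves use, while the tutorial above does the same thing with the beem library):

import json
import urllib.request

# ask a public Hive API node for the post directly, no HTML involved
payload = json.dumps({
    "jsonrpc": "2.0",
    "method": "bridge.get_post",
    "params": {"author": "leofinance",
               "permlink": "introducing-cub-finance-or-leofinance-expands-into-defi-on-the-binance-smart-chain"},
    "id": 1,
}).encode()

req = urllib.request.Request(
    "https://api.hive.blog",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    post = json.loads(resp.read())["result"]

print(post["title"])
print(post["body"][:200])  # the markdown body, not rendered HTML

The result comes back as JSON, so there's nothing to pick apart at all.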

because I want to use the code for Medium and other sources, which all work fine except Hive :)

Hive is like no other. Doh.

this is what I've got so far, but I can't figure out how to pull the correct author and permlink out of the list.

yes, I'm learning python :)

from beem.comment import Comment  # pip install beem

path = "leofinance.io/@ash/bitcoin-experiment-financial-freedom-with-faucets-week-39"
path_list = path.split("/")
print(path_list)

# drop the scheme/domain parts so only "@author" and "permlink" are left
if path_list[0] == "https:":
    del path_list[0:3]
else:
    del path_list[0]

author = path_list[0].lstrip("@")
permlink = path_list[1]
# beem's Comment wants an "@author/permlink" string (an authorperm)
details = Comment("@{}/{}".format(author, permlink))
print(details)
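
Side note, going from memory here so double-check the docs: beem has a helper, beem.utils.resolve_authorperm, that splits author and permlink out of a link-style string, which would save the manual splitting (its docstring shows it handling full post URLs):

from beem.comment import Comment
from beem.utils import resolve_authorperm

# resolve_authorperm returns an (author, permlink) tuple
author, permlink = resolve_authorperm("https://leofinance.io/@ash/bitcoin-experiment-financial-freedom-with-faucets-week-39")
details = Comment("@{}/{}".format(author, permlink))
print(details)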

I ran it and got a 403. Leo probably blocks requests with malformed headers. Just set your headers to match what a regular browser sends.
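
Something along these lines, i.e. the original urllib call but with a browser-looking User-Agent on it (no guarantee Leo won't still block it further along):

import urllib.request

url = "https://leofinance.io/@leofinance/introducing-cub-finance-or-leofinance-expands-into-defi-on-the-binance-smart-chain"
# the default urllib User-Agent is an easy thing to block, so pretend to be a normal browser
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req) as resp:
    post = resp.read()
print(post[:200])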

yep, I added fake-useragent and it progressed to "please enable JavaScript", which means I'm out of options. I'd need to run headless Chrome or whatevs to get this working, but since I use the script for other services as well, it's not really worth it at the moment.
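
For what it's worth, the headless route would look roughly like this with Selenium (just a sketch: you still need Chrome installed, you'd probably want an explicit wait for the post to actually render, and it's heavy compared to the API route):

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# run Chrome without a window so the page's JavaScript actually executes
opts = Options()
opts.add_argument("--headless")
driver = webdriver.Chrome(options=opts)
driver.get("https://leofinance.io/@leofinance/introducing-cub-finance-or-leofinance-expands-into-defi-on-the-binance-smart-chain")
html = driver.page_source  # rendered HTML after the JS has run
driver.quit()
print(html[:200])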

Alternatively, I'll need to pull the Hive data from the API.

Follow gtg's advice here. You'll end up so much better off pulling data right from a Hive node.

Nice presentation