RE: [GUIDE] Optimise your RPC server for better performance

in #witness-category, 3 years ago

Upvoting for the effort of making things better, but I can't agree with the tips. The name lookup time, while part of the benchmark, is crazy long and irrelevant. The Nginx config is a kind of generic use-case optimization. We are using it as a reverse proxy, so sendfile on is irrelevant.
Network-level optimizations are what the devices in front of your RPC node are for.

time_namelookup:    0.004
time_connect:       0.004
time_appconnect:    0.087
time_pretransfer:   0.087
time_redirect:      0.000
time_starttransfer: 0.088
time_total:         0.088

That's my endpoint under load, but queried from a VIP zone. No TCP tweaking on the host. 128GB RAM with RAID0 NVMe.
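For anyone reading numbers like the ones above: curl's `-w` timers are cumulative from the start of the request, so the interesting figure is usually the difference between neighbouring timers. Here is a minimal sketch that turns such output into per-phase durations. The function names and phase labels are made up for illustration; only the `time_*` keys come from curl.

```python
def parse_timings(text):
    """Parse 'time_xxx:  0.004' lines (curl -w output) into a dict of floats."""
    timings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().startswith("time_"):
            timings[name.strip()] = float(value)
    return timings


def phase_durations(cumulative):
    """Convert curl's cumulative timers into per-phase durations in seconds."""
    # Hypothetical labels; the order follows the order in which curl hits each mark.
    order = [
        ("dns", "time_namelookup"),
        ("tcp_connect", "time_connect"),
        ("tls_handshake", "time_appconnect"),
        ("request_sent", "time_pretransfer"),
        ("server_wait", "time_starttransfer"),
        ("body_transfer", "time_total"),
    ]
    phases = {}
    previous = 0.0
    for label, key in order:
        value = cumulative.get(key, 0.0)
        if value < previous:
            # e.g. time_appconnect is 0 for plain HTTP: the phase did not happen
            phases[label] = 0.0
            continue
        phases[label] = round(value - previous, 3)
        previous = value
    return phases
```

Fed the numbers above, nearly all of the 88 ms lands in the TLS handshake phase, with DNS at 4 ms and the server answering about 1 ms after the request is sent.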

BTW, yeah, I know that my node's performance for the general public is currently not as good as it used to be, but I'm still trying to serve some high-frequency requests to service providers (despite multiple notices of deprecation / making that endpoint obsolete).
I will switch the endpoint sometime in May to what I currently have under test. Same hardware, new software, you will see the difference :-)


The name lookup part is strange, yes, but it stayed between 400ms and 600ms regardless until I made the optimisations.

I found that one of the most common reasons for RPCs being slow was too many connections. I saw some IPs making 100s of connections regardless of the nginx rate limiting. I also saw an ungodly amount of TIME_WAIT (waiting to close) connections that were not being cleaned up.
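If you want to check this on your own node, a rough sketch like the one below counts connections per TCP state and per remote IP from `/proc/net/tcp`-style lines. Assumptions: IPv4 only, a little-endian Linux host (e.g. x86), and the helper name `summarize_connections` is just something picked for this example.

```python
import socket
import struct
from collections import Counter

# State codes used by the Linux kernel in /proc/net/tcp
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def summarize_connections(lines):
    """Return (states, peers): Counters of TCP states and of remote IPv4 addresses."""
    states = Counter()
    peers = Counter()
    for line in lines:
        fields = line.split()
        # Data rows start with an index such as "12:"; this skips the header row
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        remote, state = fields[2], fields[3].upper()
        states[TCP_STATES.get(state, state)] += 1
        ip_hex = remote.split(":")[0]
        if state != "0A" and len(ip_hex) == 8:
            # Assumes a little-endian host, where 127.0.0.1 is shown as 0100007F
            ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
            peers[ip] += 1
    return states, peers


def read_proc_net_tcp(path="/proc/net/tcp"):
    """Read the kernel's IPv4 TCP table (Linux only)."""
    with open(path) as handle:
        return handle.readlines()
```

Something like `states, peers = summarize_connections(read_proc_net_tcp())` followed by `peers.most_common(10)` should show which IPs hold the most connections, and `states["TIME_WAIT"]` shows how many sockets are just waiting to be cleaned up.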

I copied the sendfile on part from our Privex config. I was a little confused as to why that was there too, but I just left it there as it didn't seem to be hurting anything.

If you take a look at the graphs, you can see the insane number of open connections that my RPC node and the minnowsupport one were suffering from. This is partly due to asshats using get_block over HTTP rather than websockets, thus opening 100s of connections (each of which Linux keeps in TIME_WAIT for about a minute after it closes by default, which is why time_wait optimisation is needed). This slows down public RPC nodes because the network scheduler has to deal with 2000 connections even though fewer than 300 of those are actually active.
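On the client side, the cheap fix for the HTTP case is to reuse one keep-alive connection instead of opening a new one per block. A minimal sketch using only the Python standard library follows. The `BlockClient` class is hypothetical, and the `condenser_api.get_block` method name and the `/` path are assumptions about how the node exposes get_block, so adjust them to your endpoint.

```python
import http.client
import json


def build_payload(block_num, request_id):
    """Build a JSON-RPC 2.0 request body for a single block (method name assumed)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "condenser_api.get_block",  # assumption: appbase-style method name
        "params": [block_num],
        "id": request_id,
    }).encode("utf-8")


class BlockClient:
    """Fetch many blocks over one persistent HTTP/1.1 connection."""

    def __init__(self, host, port=443, use_tls=True, timeout=10):
        conn_cls = http.client.HTTPSConnection if use_tls else http.client.HTTPConnection
        # No socket is opened here; http.client connects on the first request
        self._conn = conn_cls(host, port, timeout=timeout)
        self._next_id = 0

    def get_block(self, block_num):
        self._next_id += 1
        self._conn.request(
            "POST", "/",
            body=build_payload(block_num, self._next_id),
            headers={"Content-Type": "application/json"},
        )
        response = self._conn.getresponse()
        # The body must be read completely before the connection can be reused
        return json.loads(response.read())

    def close(self):
        self._conn.close()
```

With this, a loop such as `for n in range(start, end): client.get_block(n)` goes over one socket as long as the server keeps it alive, rather than leaving one TIME_WAIT entry on the node per block.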

Yeah... I was considering disabling get_block entirely and using a separate, smaller and much more robust instance for that (pre-jussi times), but there were also troubles with vops. I'm planning improvements for June; there's no point in wasting time on temporary solutions.

Hi, will GraphQL at least save those poor RPC nodes from too many requests? Maybe websocket + GraphQL? Even Facebook serves clients using GraphQL. It's way better than a REST API.

Well, yes, GraphQL would be a perfect way to interact with various microservices; moreover, it can live alongside standard REST routes.
Oh, by the way, June turned out to be July or even August ;-( Time flies.

Well yeah :) time flies. I learned a lot from coding. Dude, you should totally have GraphQL set up :P So I can query :D I'm not technical enough to set one up. Do I need to be a witness? I thought of relaying RPC nodes to a GraphQL server, but that's just redundant, no? So is it better to be a witness?

No need to be a witness to run your own API endpoints. However, because witnesses are compensated for block production, they are expected to provide infrastructure / services for the community.
If you have a good idea for a Steem-related service that needs to run its own high-performance API server, I can help you set that up.

Well, right now my quick fix is keeping all the RPC nodes in a JSON file and rotating in case of failure. Should be good for now. If I come up with another project or my app does scale, I'll remember your offer ;)
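For readers wondering what that kind of rotation could look like, here is a rough sketch. It assumes the JSON file holds a plain list of endpoint URLs; the `EndpointRotator` class and the `send` callback are invented for this example and are not the commenter's actual code.

```python
import json


class EndpointRotator:
    """Stick with the last working endpoint; move to the next one on failure."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = list(endpoints)
        self._current = 0

    @classmethod
    def from_json_file(cls, path):
        # Assumes the file looks like ["https://node-a.example", "https://node-b.example"]
        with open(path) as handle:
            return cls(json.load(handle))

    def call(self, send, *args):
        """Run send(endpoint, *args), trying each endpoint at most once."""
        last_error = None
        for _ in range(len(self._endpoints)):
            endpoint = self._endpoints[self._current]
            try:
                return send(endpoint, *args)
            except OSError as exc:
                # OSError covers refused connections, timeouts and DNS failures
                last_error = exc
                self._current = (self._current + 1) % len(self._endpoints)
        raise RuntimeError("all endpoints failed") from last_error
```

Here `send` is whatever function actually performs the request against one endpoint. Staying on the node that last worked, instead of round-robining on every call, keeps the client from repeatedly hitting a node that is known to be down.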

This is partly due to asshats using get_block over HTTP rather than websockets

@someguy123 As a witness you have a strong knowledge base about the stuff working behind the scenes, so perhaps you have some preferences / advice aimed at API users?

I mean, people using the API do not worry about performance, but perhaps your hints (such as the one quoted above) would make a difference? I'm thinking of a simple list of Do's and Don'ts.

Agreed. For people using an api, what would be helpful?

Upvoted myself because those comments from voting bots are super annoying. Nobody is scrolling below that trash.

So what's the tl;dr?

A hardware firewall in front to deal with network-level hassle, nginx with SSL termination, jussi for caching, then specialized nodes; appbase+rocksdb, enterprise NVMes, and 640kB should be enough for everybody ;-)
Soon in your blockchain (June, after my vacation)

Thanks. Unfortunately unable to upvote at this moment.