A Day in the Life of a Steem API

in #steem, 5 years ago (edited)



While many people simply use Steemit.com to access Steem, this isn't the full story. You might not know it, but every single time you interact with the Steem blockchain, you go through an API node: this is how your application retrieves and submits data.

Steemit provides its own public API servers, but to reduce reliance on that single point of failure, my publicly provided service at https://anyx.io offers a high-performance alternative. Over the past few months it has proven itself a worthy substitute, and it is the default choice for many applications such as Busy, Partiko, Steemworld, and more.
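For readers curious what "going through an API node" looks like in practice, here is a minimal Python sketch using only the standard library. It assumes the node accepts JSON-RPC 2.0 requests via HTTP POST at its root URL and exposes the `condenser_api` namespace, as Steem nodes of this era did; the helper names `build_request` and `call` are illustrative, and the endpoint may no longer be reachable.

```python
import json
import urllib.request

API_URL = "https://anyx.io"  # public endpoint named in this post (may be offline today)


def build_request(method, params, request_id=1):
    """Build a JSON-RPC 2.0 request body (assumed format for Steem API nodes)."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def call(method, params, url=API_URL, timeout=10):
    """POST one JSON-RPC request to the node and return its "result" field."""
    body = json.dumps(build_request(method, params)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        reply = json.loads(resp.read().decode("utf-8"))
    if "error" in reply:
        # JSON-RPC reports failures in an "error" member instead of "result"
        raise RuntimeError(reply["error"])
    return reply["result"]
```

For example, `call("condenser_api.get_dynamic_global_properties", [])` would return the chain's current global state, and `call("condenser_api.get_block", [1])` would fetch its first block.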

Over the past month, the API has been collecting aggregated statistics on how it is used. A large part of the goal is to drive research into which API calls are being made and how often, to guide further optimization of the API. However, I found the data itself pretty interesting and worth sharing.

A Note on Privacy

When people hear "data collection", they often equate it with bad practices, mishandling, and so on. Since the charts below present aggregated statistics, some data collection was indeed involved.

To hopefully ease your mind if you're concerned about privacy: the data collected is only aggregated statistics, e.g. sums. The data behind the charts below is aggregated over a month, yet logs are purged every day, so the data couldn't be regenerated even if one wanted to. In addition, the only data used is what the requestor provides (which could even be spoofed), so it must be taken with a grain of salt. Nothing personally identifiable is kept (nor is there really any to keep; the apps themselves are more likely to see personally identifiable data). At a high level, an API only sees which requests are being made to it and, if the app provides it, which app made the request on someone's behalf.
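A minimal sketch of the "aggregate, then purge" approach described above. It assumes a hypothetical log format (one dict per request with `ip`, `method`, and `bytes` keys) and is an illustration, not the actual pipeline behind anyx.io: only the returned totals would be stored, and the raw entries, IP addresses included, are discarded along with the daily logs.

```python
from collections import Counter


def summarize_day(log_entries):
    """Reduce one day's raw log entries to aggregate totals only.

    Each entry is a hypothetical dict with "ip", "method" and "bytes" keys.
    The returned dict contains no IPs; only counts and sums survive.
    """
    requests_by_method = Counter()
    total_bytes = 0
    visitors = set()  # held in memory only while the day is summarized
    for entry in log_entries:
        requests_by_method[entry["method"]] += 1
        total_bytes += entry["bytes"]
        visitors.add(entry["ip"])
    return {
        "requests": sum(requests_by_method.values()),
        "requests_by_method": dict(requests_by_method),
        "bytes": total_bytes,
        "unique_visitors": len(visitors),
    }
```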



Hint: click a chart to view the interactive version! Steemit doesn't support embedding the interactive ones, but they are cool to look at.

Average Daily Usage

The first interesting metric to present is "how well used is the API". For that, let's investigate the number of requests made in a given day, as well as the bandwidth (total data transferred), and how many unique visitors made requests.



Looking at these daily usage metrics over a snapshot week shows a fairly even distribution of unique visitors, bandwidth, and API requests over time. On average, the API serves approximately 39 million requests per day (about 450 requests per second), transfers 470 GB of data per day, and sees 52,000 new unique visitors per day.
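These averages can be sanity-checked with a little arithmetic. This sketch assumes 470 GB means decimal gigabytes (10^9 bytes) and uses the rounded figure of 39 million daily requests, so the results are approximate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

requests_per_day = 39_000_000   # approximate daily average quoted above
bytes_per_day = 470 * 10**9     # 470 GB, assuming decimal gigabytes

requests_per_second = requests_per_day / SECONDS_PER_DAY
bytes_per_request = bytes_per_day / requests_per_day
average_mbit_per_second = bytes_per_day * 8 / SECONDS_PER_DAY / 10**6

print(f"{requests_per_second:.0f} requests/s")           # 451 requests/s
print(f"{bytes_per_request / 1000:.1f} kB per request")   # 12.1 kB per request
print(f"{average_mbit_per_second:.1f} Mbit/s average")    # 43.5 Mbit/s average
```

In other words, the average response is on the order of 12 kB, and the sustained outbound traffic is roughly 43.5 Mbit/s before accounting for peaks.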

While not plotted, another interesting metric is that during the past month of observing API usage, the API serviced over one million unique visitors (1,195,422 to be precise)!

Usage by Region

Each request comes from an IP address, which can be approximately geolocated to the country it most likely belongs to. This gives a high-level estimate of where traffic is coming from. The data is not always accurate and is easily circumvented with tools like VPNs, but it is an interesting metric regardless.
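To illustrate the idea, here is a sketch that maps IPs to countries and counts unique visitors per country. The prefix table is entirely hypothetical: real geolocation relies on an IP-to-location database, and the ranges below are the reserved documentation networks from RFC 5737, chosen so that no real address is implied.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-country table using RFC 5737 documentation ranges.
# A real service would consult a geolocation database instead.
PREFIX_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "CA"),
    (ipaddress.ip_network("198.51.100.0/24"), "US"),
    (ipaddress.ip_network("203.0.113.0/24"), "CN"),
]


def country_of(ip):
    """Return the country code for an IP string, or "??" if no prefix matches."""
    addr = ipaddress.ip_address(ip)
    for network, country in PREFIX_TABLE:
        if addr in network:  # False for IPv6 addresses against IPv4 networks
            return country
    return "??"


def visitors_by_country(ips):
    """Count unique visitors per country from an iterable of IP strings."""
    return Counter(country_of(ip) for ip in set(ips))
```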

(Charts: bandwidth in MiB and unique visitors, by region.)

Looking at usage by region, there is a really interesting difference between Asia and North America. While North America leads in traffic and bandwidth, Asia leads in unique visitors. Some may jump to conclusions about why (bots), but as the usage-by-app section shows, one of the leaders in usage is steemcn, a website that alone saw 150k unique users in the past month. Indeed, as Steemit and its servers are blocked by the firewall in China, Steemit.com does not work for users in mainland China; but decentralization, uh, finds a way, and it seems a large portal to the Steem blockchain from China is through this API.

A possible explanation for why the US is heavy on traffic and bandwidth is that many services run on AWS. I've noticed high-volume traffic coming from servers making requests such as fetching every block, and operators serving the US market will likely provision their servers in the US. As this API is hosted in Canada, US traffic sees very low latency, which serves those services quite well.

Usage by App

When you use an app, especially a web application (which most are), the app makes API requests "on your behalf". This is, for example, how a site like Busy or Steemworld gets the data it presents to you: when you load the web page, your web browser makes the relevant API requests to fetch the right data. In the process, the request can optionally carry a little note telling the API which app made it. This is known as the referrer (sent in the HTTP Referer header). The following chart shows some interesting metrics for the apps that provide referrer data.
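On the server side, tallying requests per app can be as simple as reading the `Referer` header (the standard HTTP spelling) and keeping only its hostname. This sketch assumes request headers arrive as a plain dict; the hostnames in the test data are illustrative, and requests without a referrer are counted as `unknown`, which is why a chart built this way is necessarily incomplete.

```python
from collections import Counter
from urllib.parse import urlparse


def app_of(headers):
    """Map a request's Referer header to its hostname, or "unknown" if absent."""
    referer = headers.get("Referer")
    if not referer:
        return "unknown"
    return urlparse(referer).hostname or "unknown"


def requests_by_app(all_headers):
    """Tally requests per referring hostname from an iterable of header dicts."""
    return Counter(app_of(h) for h in all_headers)
```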



Not all apps provide referrer data, so this chart is far from complete. For example, the highest on the chart, at about 5 million requests per day, is Steemworld, yet it represents only about 13% of total request traffic. Many apps that use the API, including big ones like Partiko, may opt not to share the referrer for various reasons (most likely they simply don't bother to set it). In addition, as this data is self-reported, take the results with a grain of salt.

While viewing the chart, note that the Y-axis is a log scale. This is because the app breakdown follows quite a power-law distribution! Steemworld makes over 10x as many requests per day as Busy (its scripts fetch a lot of account data), Busy makes about 3x as many as the next app on the chart, Steemhunt, and so on.

API Request Breakdown

Finally, the raison d'être of this post. This is the most interesting metric (for me at least, as the one serving the requests) as it really shows exactly how the API is being used. This breakdown -- a sample of a single day's worth of activity -- shows what API data is being requested, and is thus an important indicator of which API calls should be optimized for future performance.



Unsurprisingly, get_block and get_dynamic_global_properties lead, together accounting for almost half the traffic. Many services use these two requests together to "watch" the chain, building their own internal state. Indeed, with just these two requests, you can basically generate any other data yourself; it's just a matter of how much data you want to store within your service.
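The "watch the chain" pattern can be sketched as a polling function: read the current block height from `get_dynamic_global_properties`, then fetch every block not yet processed with `get_block`. The `rpc` callable is a hypothetical stand-in for any JSON-RPC client, and the method names assume the `condenser_api` namespace. Following `last_irreversible_block_num` rather than `head_block_number` avoids processing blocks that could still be dropped in a fork, at the cost of a short delay.

```python
def watch_chain(rpc, last_processed, handle_block, irreversible_only=True):
    """Process every block after `last_processed` up to the current chain tip.

    `rpc(method, params)` is a hypothetical function that performs a Steem
    JSON-RPC call and returns its result. `handle_block(num, block)` is your
    own callback that updates the service's internal state.
    Returns the number of the last block handled, to pass to the next poll.
    """
    props = rpc("condenser_api.get_dynamic_global_properties", [])
    key = "last_irreversible_block_num" if irreversible_only else "head_block_number"
    tip = props[key]
    for num in range(last_processed + 1, tip + 1):
        block = rpc("condenser_api.get_block", [num])
        handle_block(num, block)
    # Never move backwards if the tip is somehow behind what we processed.
    return max(last_processed, tip)
```

A service would call this in a loop, sleeping a few seconds between calls (Steem produces a block every 3 seconds), and persist the returned block number so it can resume after a restart.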

The next most dominant call is get_content, which is typically called by a service like Busy or Steempeak whenever you view a post. get_accounts is likely used mostly by services like Steemworld to fetch your account data.

An interesting entry here is OPTIONS. It is not actually an API call: it is a CORS "preflight" request that web browsers send implicitly before making a cross-origin request, to ask the API which origins, methods, and headers it will accept. You can think of it as a "Hello, nice to meet you" from a web browser when a web page like Busy or Steempeak first talks to the API.
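For the curious, this is roughly what the API has to answer. A cross-origin `POST` with a JSON body is not a "simple" request, so the browser first sends `OPTIONS` with `Origin` and `Access-Control-Request-*` headers, and proceeds only if the response allows it. The helper below is a hypothetical sketch, not anyx.io's actual configuration; allowing every origin (`*`) is an assumption that suits a public API any web page may call.

```python
def preflight_response(request_headers):
    """Build (status, headers) for a CORS preflight (OPTIONS) request.

    Hypothetical helper: allows any origin, the methods a JSON-RPC API needs,
    and whatever request headers the browser says it intends to send.
    """
    headers = {
        "Access-Control-Allow-Origin": "*",
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Access-Control-Allow-Headers": request_headers.get(
            "Access-Control-Request-Headers", "Content-Type"
        ),
        # Lets the browser cache this answer (browsers may cap the value).
        "Access-Control-Max-Age": "86400",
    }
    return 204, headers  # 204 No Content: a preflight needs no body
```

The `Access-Control-Max-Age` header is the main design lever here: the longer browsers may cache the answer, the less often they need to repeat the "hello" before their real requests.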

Account history requests (get_account_history, get_ops_in_block, and get_transaction, which is hardly visible) are actually not that dominant, though they are still bandwidth heavy. Hivemind requests tell a similar story, which really highlights that not all API usage has "full node" requirements.

Please Support the API!

In my proposal post, I outlined the details of my proposal to fund this public API infrastructure. The summary is as follows:

This proposal seeks to help reimburse the recurrent costs of the public, free-to-use anyx.io Steem API infrastructure. If you use sites or services such as Busy.org, Splinterlands (SteemMonsters), Partiko, or many more — these services rely on this infrastructure for both uptime and performance.

Please consider voting for this proposal, at one of the links below, to help it achieve funding and keep the API alive. It targets 15 SBD per day from the Steem DAO to cover hosting costs, making it one of the lowest-cost proposals. Voting for a proposal takes only a second and costs you nothing.

https://steemitwallet.com/proposals
https://steempeak.com/proposals


Now these are stats you don't see every day. Thanks for sharing.


Really interesting stats.

I guess it would be interesting to perform a similar analysis on SteemSQL


Could you please help me find out how many IP addresses there are here in El Salvador? I am alone here and I always wonder whether there are more Steemians; with this information I could get an idea of the other users nearby.
Thanks a lot for this great information.

You can see unique visitors in the other chart! However, note that the data is not perfect. It might not be accurate, and it might also just be people who clicked a link to a Steem website once. I don't collect any kind of "personal activity" data, so I would have no way of knowing more.

@anyx I have voted for you to do my best to help with the current status quo. Now, after reading why the number 4 has been put in the witness numbers, I'm starting to doubt my decision. You have set yours slightly differently from the others, though. Some clarification/honesty, please. I will just change my vote to someone else if required.

I had initially unvoted all witnesses, as I would have done the same as Justin S. Now it seems a great place to start :)

Thanks for the vote. I'm not fully sure what all the 4's are about -- I think because we lasted for 4 years with Steem? Or it means "death" of Steem?

I picked 404 as it means "not found". It was a sad day to see the hostile takeover, so I felt like protesting with "decentralization not found".

I'm currently not running any code differences other than that number, but we'll see how the community decides moving forward.

After reading this, I felt your proposal must be supported, and I voted for it (in fact, my first vote for a proposal). Good to see you maintaining this high volume and helping Steem grow.

Supported your SPS. Didn't know steemcn had that many visitors :D
Thanks for your api, it really helps users from mainland China.




Very interesting stuff. Thanks for providing this service. I select your node in Steemnow as it gives better data than some others that return duplicates. Will there be issues maintaining it if Steem scales up to millions of users? There need to be ways to cover the costs as that happens.


Hi! @anyx, your work sounds a bit foreign to me, like another planet, but I do understand from this article how important your API is. Your proposal really needs more support from Steemians. I personally use most of the dapps you mentioned daily. I have supported your proposal and will resteem this article on my profile to give it more exposure. Thank you for your hard work, and keep up the great work for Steem.


Hello
I want to use http://anyx.io to get the operations that were recorded in the blocks from 2016 to 2018. Could you please show me how I can get this information with very little delay?