[STEEMSQL.COM] A public SQL server database with all Steemit blockchain data

in #steemit8 years ago

Did you ever wanted to easily access data contained in the Steem blockchain and perform analysis or find valuable information. But not everybody has programming skills to gather those data and compute the wanted result.

Therefore, I created a publicly available SQL database with all the blockchain data in it.

Why use a SQL database?

The main advantage having such a database is the fact data are structured and easily accessible from any application able to connect to a SQL Server database. Having a SQL Server database makes it possible to produce quick answers to queries.

Simply put, a query is a question. You ask the server form something and it sends back an answer (called the query result set).
For example, when dealing with large amounts of data as Steem blockchain data, you might want quick answers to questions (queries) such as:

  • What was the Steem power down volume during the past six weeks?
  • Which are the top 10 most rewarded post ever?
  • Did I get, me or my posts, mentioned in any post or comment?
  • How many posts are talking about ants?

Browsing the blockchain over and over to retrieve and compute such information is time and resource consuming.
If you don’t have a local copy of the blockchain, instead of downloading the whole data from some external public node to process it, you will send your query to SteemSQL server and get only the requested information, saving tons of bandwidth.

Let’s have a look at some technical details

Database diagram

The Blocks table contains bare block information (timestamp, witness …)
Each block can contains Transactions
Depending on each transaction type, the associated transaction’s data is stored in the related table.

Full text search

The database has been full text search enabled. This allow fast search of information within post and replies.

Let say I want to know if anyone mentioned me in a post or comment, the following simple query will do the trick

SELECT author, title, body, url FROM TxComments WHERE CONTAINS(body, '@arcange')

Database Connection information:

Here the information to connect and query the database:

Server: sql.steemsql.com
User: steemit
Password: steemit

Database name: DBSteem

How to retrieve query the database

  • Using Microsoft Excel

Check this tutorial to see how to create an analysis report with Excel

  • If you’re a python programmer
import pypyodbc
connection = pypyodbc.connect('Driver={SQL Server};'
                                'Server=sql.steemsql.com;'
                                'Database=DBSteem;'
                                'uid=steemit;pwd=steemit')
cursor = connection.cursor() 
SQLCommand = ("YOUR SQL QUERY”)
cursor.execute(SQLCommand)
# do whatever you want with the retrieved data
connection.close()

Support

If you need help, have any comment or request, join steemsql channel on steemit.chat

Availability and performance

The SQL server is hosted in a datacenter with 24/7/365 availability.
Available output bandwidth is up to 500Mb/s

New data from the latest blocks are injected in the database every 10 seconds.

The server is currently hosted in a shared infrastructure.
I will monitor server load and if it requires more resources, I will allocate any reward to this post to a dedicated infrastructure.

You like it, please upvote or follow me

Sort:  

the steemit account looks suspended, the service is no longer available?

it has moved to a monthly subscription model.

hi @arcange,
Is this database still working ?
If yes Then I am not able to connect to server using Sql server management studio.

I am able to connect just fine.

The database here is not responding!! I'm trying to connect to it from Java code and it says "Communication link failure". Also from comand line I run a ping and none of the packets come back!

Why MS SQL server, and not something more open like MySQL or PostgreSQL?

The fact the server behind is open or not does not matter to me.

The server itself is not open, only the data in it.
What's important is to have the required skills to manage it (including its security)and have it available to users's requests.

You're free to build your own if you really want an open source server.

How did you populate the database to begin with? If you're using the RPC mechanism, after 30 days some data is no longer returned. For example, active_votes is empty for all posts after their second payout. You can even see this on steemit.com where old posts have zero upvotes.

All votes are in transactions and stored in the TxVotes table, even first votes ever :

@arcange

this is severely underrated. gave you my full blown 100%. we need this.

Thank you !

the differences between the graph database and the monolithinc, intransient relational database takes some compilation of data, and new blocks alter records, in ways that can be difficult to adapt, such as if a poster makes a really big edit, and suddenly the datatype has been overflowed.

But I think this is a cool idea. It might make a model for moving from a memory heavy graphene database to a storage heavy sql, maybe some sort of hybrid to cache data for faster retrieval or so.

SteemSQL database is a replica of the whole blockchain, not a compilation!
This mean if a user edit a post, you will find 2 or more transactions in TxComments with the same author/permlink but with different bodies/titles.
Compilation of data is made at query time.

Ah. So it's just another implementation of the graph database protocol, layered on top of a relational database.

This isn't a graph database at all, unless I am somehow mistaken.

No, it however has to implement one to allow the diff-updates that the platform allows, which essentially is a graph database (like CVS or Git). It just gives you the ability to search using SQL queries which are very concise and neat and easy to write, and with that pull all kinds of higher level data out of it. I think that was the purpose of building it - to allow more people to tinker with analytics.

Seriously, Doesn't this defeat the purpose of decentralization?

Why its defeat decentralization?

No. The essence of data analysis is collecting (aggregating ) data and making meaningful insights from it, which requires centralization by rule. The decentralized nature of the data isn't in jeopardy here because there are still unlimited copies of the Steemit blockchain out there ensuring that there is no vulnerable, mutable, single-source of the truth. This particular copy just happens to facilitate reporting.

As soon as I have some time I will use your steemit database because I am a data fanatic. Thank you for making this available. I am following you now.

Thanks!
I will closely monitor server as I made it a bit quick and dirty. I will now work on tweaking it for best performances

Thankyou heaps for doing this!
I have a heap of ideas for queries but I've been trying to avoid the overhead of running my own full node - this is a far more efficient way to do it :)
I'll be following, great work

That's exactly the purpose of it: having an efficient alternative to local nodes.
Thanks!

That's nice of you to make this available. hopefully you won't get flooded with requests that the resources requirements will require too much of an overhead :)

That could be, as for every service made public. But worth to experiment as it will be a good test for the infrastructure. This one is scaled for a normal usage.
The risk is indeed have bad response time if the load is to heavy. Then people writing poor queries might go away as they won't get correct response time.
Anyway, I will do my best to educate users on how to perform efficient queries.
Future will tell.

SQL Server is great - thank you very much for providing this!

Argh!!!! I was just trying to build one of these!
I'm not sure whether to be upset I spent 2 days trying to make it work or ecstatic that whatever guided me, guided you there first.

Eitherway, I'm glad you did this, but super jealous you managed to pull it off faster than I could :)

Congratulations though, you earned my upvote. I'm following you now too!

I would have been super jealous too if you did it before me ;)
Everything was ready 2 weeks ago, but I wanted to fine tune and check every bit and byte of the infrastructure before releasing it. No worth to publish something if user experience is disappointing.
Many thanks for your support !

This is a really great thing you have going here! Kudos for setting up a public node and service for the community. I sincerely believe this post should be upvoted like crazy and end with 3 or 4 figures. This does deserve more notice and appreciation. But from me, you've my upvote and follow. Thank you.

Thanks for your support!
The load is low at the moment but I already received lot of positive feedback of people wanting to use it.
I also wish I could get a few reward for this, not to become rich but to be able to put the server on a dedicated infrastructure with HA and DRS and ensure the best service to the community.

Hi, I can't connect using SSMS, do you use default port? (1433)

Ping to server is not responding, I'm not sure if i'm doing something wrong :(

Thanks for sharing!

SteemSQL has moved to a monthly subscription model and the default free account/password “steemit/steemit” has been disabled.

You will find more information about the subscription process here

Please visit steemsql.com for more information.

This is good. Although I like something like PhpMyAdmin more.
What hardware is your service running on? I'm curious about the scalability.

The server on a Intel Xeon E5-1650 v2 3.5Ghz 6 cores with 64GB memory, 8TB storage with MegaRAID 9271 Cache 1Go + CacheVault. It can be moved to a VMware HA cluster if required.

Sounds great.

I'm late to the party but I am excited about this as I am learning SQL at work and would love to use that knowledge to understand Steemit better and contribute to the community with that knowledge.

Do not hesitate to contact me on steemit.chat if you need any help to create amazing SQL queries.

Thank you, kindly.

This is way freaking cool. Man, I fall behind on #dev in steemit.chat by just a couple days and miss amazing gems like this. Well done!

Thanks ! Hopefully it will be useful to you.

I'm at RESTfest today, and I'm contemplating building a REST interface for the db as part of the hack day today... Hmmmm...

Best of luck! Waiting for any feedback.

Unfortunately, I wasn't able to get my local php connecting to mssql. :(

hey, I'm playing with it right now. how I can I see the full architecture of the database?

All tables and their relation are shown in the database diagram in this post.

I thought there was another way as well

If you need help or have any request, please join steemsql channel on steemit.chat

http://steemsql.com/ is not working?

Haven't noticed any problem.
Please use steemsql channel on steemit.chat for help or support

Just clicked on the link: http://steemsql.com/ , I got this:

This site can’t be reached

steemsql.com’s server DNS address could not be found.

The link in your first reply works.
The link in your last reply is broken (redirect to "http://steemsql.com/," with a coma at the end)
It is useless to check "http://steemsql.com" website ATM as it redirect to this post

Still not working for me. No, it doesn't redirect to this post. Anyone else please check? http://steemsql.com/

By the way the comma is a known old issue of steemit.com related to link parsing. It's off-topic though.

This is awesome! Thank you for your effort.

Doesn't this defeat the purpose of decentralization? If there is a central database with all the information on the blockchain, then wouldn't that create a vulnerability?

This would be better if the database itself was decentralized... I think I'll code one like this. Thank you for your information! :)

Thank you so much for hosting this! You are my hero! Successfully connected and doing some poking around in the data for fun tonight :)

Here’s a sad scene for you. About 4 people have read this post I made about 3 days ago: “Can you supply some block-chain data for payment?”. Your post suggests that there must be or than 20 people in the community who might have been alerted to my post, and some would have promptly sent me to your service, which clearly needs to be well supported.

New Steemit posts disappear into ‘ether’ 20 minutes, and will turn up on peoples’ Feeds only if they are Following the author!

Anyway, I will be trying to use your service soon, and pay suitably. I am experienced programmer; but have not bothered to learn Python and only occasionally use SQL (inside SAS). I will come back when I have done the needed studying.

Cheers!

Here’s a sad scene for you. About 4 people have read this post I made about 3 days ago: “Can you supply some block-chain data for payment?”. Your post suggests that there must be or than 20 people in the community who might have been alerted to my post, and some would have promptly sent me to your service, which clearly needs to be well supported.

New Steemit posts disappear into ‘ether’ 20 minutes, and will turn up on peoples’ Feeds only if they are Following the author!

Anyway, I will be trying to use your service soon, and pay suitably. I am experienced programmer; but have not bothered to learn Python and only occasionally use SQL (inside SAS). I will come back when I have done the needed studying.

Cheers!

Amazing. Glad to found this post. Thank you for providing the infrastructure. I have accessed it and it works well

Hi @arcange. Is this sever offline or no longer available?

Post you queries in steemsql channel
I will analyze them and see how to improve DB design/performances