August Hive Data - Word Count, Effort, Reward, Rambling

in Hive Statistics · 18 days ago (edited)

Intro

August is now over. I have been collecting data on chain about word count, the number of images in posts, and several other metrics. There have been no new features built into the data set this month, as I've been pursuing other endeavours: getting work together for a history symposium, getting my photographic art ready for a solo exhibition, and writing fiction, towards the goal of completing an anthology of science fiction stories.

I've been writing a lot, posting every day about random topics that interest me - because if I practise what I preach, which is that content on Hive should not be content about Hive, perhaps some other people can follow my lead. My recent and future posts will be at 100% power up until I'm an Orca. That's my goal for the time being :)

Meanwhile - this post is about HIVE, because I am trying to keep my skills serviceable when it comes to data and data manipulation. I've been applying for meat space jobs, in the analysis / data field, as I have now been unemployed for almost five months after taking a redundancy from my prior employer! I miss disposable income! :D

Features Built This Month

No new features in the data set this month. You can look at the previous month's data here if you wish to do a month on month comparison.

So, for the month of August, what have we achieved as a blockchain?

image.png

If we were writing novels on HIVE, we published three more this month, as a collective. I don't think we actually did, though, as more and more short-form content makes up this data.

There were about 150 more unique users publishing posts on HIVE in August, a small uptick in activity. Still, having 2% more unique authors than the month prior is nothing to sneeze at. That's good growth, and we must retain it. If we did that month on month - (which I am not dumb enough to anticipate, or expect as a realistic situation) - it would still take SIXTEEN months to reach 10,000 monthly unique authors of top-level posts.
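
As a rough check on that estimate, the round figures above are enough to do the arithmetic. This is only a sketch: it assumes that "about 150" really is "2%" of the prior month, which implies roughly 7,500 authors in July and 7,650 in August. Those two counts are back-derived assumptions, not numbers taken from the chart, and the function names are made up for illustration.

```python
import math


def months_linear(current, target, gain_per_month):
    """Months to reach target when a fixed number of authors is added each month."""
    return math.ceil((target - current) / gain_per_month)


def months_compound(current, target, monthly_rate):
    """Months to reach target when the author count grows by a fixed percentage."""
    return math.ceil(math.log(target / current) / math.log(1 + monthly_rate))


# Assumed figures, back-derived from "about 150 more" being "2% more".
july_authors = round(150 / 0.02)      # roughly 7,500
august_authors = july_authors + 150   # roughly 7,650

print(months_linear(august_authors, 10_000, 150))     # 16
print(months_compound(august_authors, 10_000, 0.02))  # 14
```

Under these assumed figures, adding a flat 150 authors a month gives sixteen months, while compounding at 2% a month gives fourteen.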

Less value was given to authors this month, with the average payout being down as well.

But as I said at the start of this post, I want to focus on the features that I engineered into the data, namely the ability to look at it by word count.

I will try to let the data speak for itself here:


image.png

image.png

image.png


image.png

Word Count and Image Insights

The table below is in markdown and is a bit wide, so you might need to scroll sideways. It breaks down posts by their word count, then by a number of other metrics.

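| Word Count | Posts | % of Content | Total Pay | Max Pay | Avg Pay | Median Pay | Authors | Avg Words Per Picture | Images Found | Avg Images |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| < 50 Words | 12864 | 13.01% | $53,529 | $160 | $4.16 | $0 | 1816 | 5.70 | 36653 | 2.85 |
| < 100 Words | 8987 | 9.09% | $7,306 | $57 | $0.81 | $0 | 1557 | 25.60 | 39268 | 4.37 |
| < 250 Words | 36100 | 36.50% | $14,972 | $54 | $0.41 | $0 | 2720 | 29.30 | 93892 | 2.6 |
| < 500 Words | 16338 | 16.52% | $30,900 | $61 | $1.89 | $1 | 3281 | 100.60 | 158129 | 9.68 |
| < 750 Words | 11356 | 11.48% | $33,291 | $69 | $2.93 | $2 | 2538 | 137.8 | 147067 | 12.95 |
| < 1000 Words | 5213 | 5.27% | $17,552 | $91 | $3.37 | $2 | 1670 | 157.3 | 85744 | 16.45 |
| < 1500 Words | 5182 | 5.24% | $19,349 | $85 | $3.73 | $2 | 1386 | 159.6 | 112690 | 21.75 |
| < 2000 Words | 1446 | 1.46% | $5,938 | $60 | $4.11 | $3 | 540 | 189.5 | 37083 | 25.65 |
| < 2500 Words | 558 | 0.56% | $2,105 | $49 | $3.77 | $2 | 214 | 357.1 | 16332 | 29.27 |
| > 2501 Words | 860 | 0.87% | $1,719 | $61 | $2.00 | $0 | 188 | 1044.8 | 14179 | 16.49 |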
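
For anyone wanting to build a similar breakdown, the sketch below shows one way to bucket posts by word count and compute most of the columns above. It is not the query used for this post. The field names (`author`, `words`, `images`, `payout`), the function names, and the sample rows are all invented for illustration, and the handling of posts that land exactly on a boundary is an assumption. "Avg Words Per Picture" is left out because its exact definition is not stated here.

```python
from bisect import bisect_right
from statistics import mean, median

# Upper bounds taken from the table's row labels. A post with exactly 50 words
# lands in "< 100 Words", and 2,500 words lands in the last bucket
# (both are assumptions about the boundary handling).
BOUNDS = [50, 100, 250, 500, 750, 1000, 1500, 2000, 2500]
LABELS = [f"< {b} Words" for b in BOUNDS] + ["> 2501 Words"]


def bucket_label(word_count):
    """Return the table row label that a post of this length falls into."""
    return LABELS[bisect_right(BOUNDS, word_count)]


def summarise(posts):
    """Group posts by word count bucket and compute per-bucket metrics."""
    groups = {}
    for post in posts:
        groups.setdefault(bucket_label(post["words"]), []).append(post)

    summary = {}
    for label in LABELS:  # keep the table's row order
        rows = groups.get(label, [])
        if not rows:
            continue
        pays = [r["payout"] for r in rows]
        images = sum(r["images"] for r in rows)
        summary[label] = {
            "posts": len(rows),
            "pct_of_content": round(100 * len(rows) / len(posts), 2),
            "total_pay": sum(pays),
            "max_pay": max(pays),
            "avg_pay": round(mean(pays), 2),
            "median_pay": median(pays),
            "authors": len({r["author"] for r in rows}),
            "images_found": images,
            "avg_images": round(images / len(rows), 2),
        }
    return summary


# Invented sample rows, purely to exercise the function.
sample = [
    {"author": "alice", "words": 30, "images": 1, "payout": 4.0},
    {"author": "alice", "words": 45, "images": 3, "payout": 0.0},
    {"author": "bob", "words": 600, "images": 10, "payout": 3.0},
    {"author": "carol", "words": 3000, "images": 2, "payout": 1.5},
]
result = summarise(sample)
for label, metrics in result.items():
    print(label, metrics)
```

Counting authors with a set per bucket means one author can appear in several rows, so the Authors column should not be summed across buckets.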

Top Authors by Various Metrics

Again, here, I am letting the data speak for itself. If you want to know a particular author's stats, let me know, and if I get time, you might get a reply - as I will not be able to upload the whole data set here, it is enormous and I don't want to crash your browser. :)

Most Posts

image.png

Most Replies

image.png

Most Pay

image.png

Highest Max Pay

image.png

Highest Average Pay

image.png

Most Words Published

image.png

Most Swearing

image.png

Highest Average Word Count

image.png

Most Images Posted

image.png

Highest Average Images

image.png

Highest Pay Per Word

image.png

What I Have Learned

What hasn't changed:

When I started looking into this data, my hypothesis was that longer posts should get more rewards. What I have seen by looking at this data over the last ten weeks or so is that this is not the case.

With creative content such as blogs, travel logs, photography, art, music, fiction, philosophy, science, homesteading, code snippets, or the vast range of other content that people post on the platform, there is no single indicator of quality that can be programmatically determined.
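
One way to test the hypothesis against this month's numbers is to compare each bucket's share of posts with its share of total payout. The sketch below uses only the Posts and Total Pay columns from the word count table. The variable names are illustrative, and the totals are simply the sums of those columns rather than figures reported separately.

```python
# (label, posts, total pay in USD) copied from the word count table.
BUCKETS = [
    ("< 50 Words", 12864, 53529),
    ("< 100 Words", 8987, 7306),
    ("< 250 Words", 36100, 14972),
    ("< 500 Words", 16338, 30900),
    ("< 750 Words", 11356, 33291),
    ("< 1000 Words", 5213, 17552),
    ("< 1500 Words", 5182, 19349),
    ("< 2000 Words", 1446, 5938),
    ("< 2500 Words", 558, 2105),
    ("> 2501 Words", 860, 1719),
]

total_posts = sum(posts for _, posts, _ in BUCKETS)
total_pay = sum(pay for _, _, pay in BUCKETS)

# Percentage of all posts and of all payout that each bucket accounts for.
post_share = {label: 100 * posts / total_posts for label, posts, _ in BUCKETS}
pay_share = {label: 100 * pay / total_pay for label, _, pay in BUCKETS}

for label, _, _ in BUCKETS:
    print(f"{label:<13} {post_share[label]:5.1f}% of posts  {pay_share[label]:5.1f}% of pay")
```

On these figures the under-50-word bucket holds about 13% of posts but about 29% of the payout.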

What's gotten worse?

The share of payout going to shorter posts seems to be increasing relative to the share going to longer posts. This, as per the comments above, isn't to discourage anyone from doing what they do. People are free to create whatever they want, and curators are free to vote however they want, but I want everyone to look at every post with intent and thought before they vote.

Comments on Purpose

Again, everyone has a different purpose on HIVE, and it is not appropriate for anyone to determine the purpose or intent of another - but we can look at the past, and from it, perhaps learn something new or improve the way we do things in the future. The last month saw me hit my nine-year milestone of publishing content on chain, and I don't see myself stopping anytime soon, because I have oh so much trapped in my head that I must simply get it out.

What do you have trapped in YOUR head?


Thanks for sharing -- some interesting stats, would be cool to get some different stats on top - weird stuff.. things like title word count vs. upvote and what communities the top ones hang out in.. or what the 10 most common words are in the most commented on posts.. haha- maybe for a future post.

I did some of that in some of the past Hive Data posts I published - looking for the count of most popular words in comments.

Community payouts...

image.png

It's funny that 4 of the Most Swearing I know are from Down Under. And congrats on second place for highest payout! And you're also in the highest total pay, not bad.

That ben.haase dude is quite the outlier with over 20k posts, so probably a bot - who can make 700 posts a day? That's basically a post every 2 minutes of non-sleep time. And considering the amount of posts, 275 swear words is not much. Galenkp has a better ratio 😅

Thanks for putting these out! I find them very interesting.

> It's funny that 4 of the Most Swearing I know are from Down Under.

I hadn't made that connection! 🤣 Galen I'm used to in that regard, but Riverflows took me by surprise. I've never really noticed it that much from her, but now I think about it...🤔

Yes, I was surprised by that, too. I haven't followed her for that long, so I wasn't sure; maybe I'd have skipped those posts.

I don't think it is human :)

On the payout side, I worked my arse off this last month in terms of writing. I did have my outlier post about 9 years of being on chain though.

Which one was it? And it's great to see that the work you put in is paying off.

Ah, the Nine Years of Loss - Reflections on HIVE / STEEM post was uncharacteristically well rewarded when compared to all posts, let alone my own.

Oh, I remember that one. Yes, that was a really good one, definitely deserved. And it shows that some people became nostalgic with your writing :-)

> Less value was given to authors this month, with the average payout being down as well.

Before reading on, I actually noticed this lately. A lot of people barely breaking $3. The few autovoted people performing just as well as ever, but manual curation looks a bit like a wasteland.

With the growth of my own account (which has been significant this month - I've posted every day and have been setting posts to 100% power up until I'm an Orca), I am always going to curate manually. I currently do this from a combination of the new feed (the first place I look), my own feed, then the communities feed.

I also tend to reward genuine engagement in the comments. This only looks at post data, but I am suspecting that a bit more of the reward pool is now going to comments with the increase in use of threads, snaps, and waves. (And the use of commentrewarder)

I've had a bit of a kick between the legs with business related costs over the past month but I'm back with the 100% rewards too. Also want to build a bit more HBD savings again. Post rewards over the past week have been quite low for me but that's another reason to push up the HBD and hedge against the lack of attention. ;^)

> commentrewarder

I actually just remembered about it the other day and have as of today started to use it again. Curious as to whether it makes any difference in that aforementioned attention. Will be an interesting test in whether the lack of comments/curation comes from attention spans or just genuine disinterest.

Or gets people flocking to your content to leave meaningless statements in the hopes that they'll farm some additional reward.

I have seen both sides of the coin, me, I'm going to reply with intent and detail whether comment rewarder is turned on or not, because that is what I enjoy about hive!

> Or gets people flocking to your content to leave meaningless statements in the hopes that they'll farm some additional reward.

That's pretty much what stopped me from using it way back. I noticed a lot of the comments were quite effortless, and I just didn't end up voting on any and would end up getting refunded the percentage. Maybe now some of that commentrewarder 'hype' has stabilised though and people pay less attention to it. Or, at least I'd hope haha

> I have seen both sides of the coin, me, I'm going to reply with intent and detail whether comment rewarder is turned on or not, because that is what I enjoy about hive!

Yeah. I think the @topcomment initiative has been quite good for that side of things. Finding the more authentic engagements to reward and which appear to be less surrounded by some sort of incentive (commentrewarder, whale attention if they're the poster).

I still have eyes on the top spot of that topcomment leaderboard :P I'll get there one day!

Cool data! Congrats on the history symposium & solo exhibition prep! 🖼️ Always interesting to see the on-chain metrics.

Thank you, this might just be the last time I present this data - the trend doesn't change much - and many other authors look at the payout data - looking at it by word count doesn't seem to add much value when the distribution of author habits doesn't change majorly month on month.

Whoah! That’s a lot of data! I suspect the rise of sub-50 word posts and their pay has to do with the growing popularity of InLeo threads, since they are very Twitter-like, and also the rising price of the LEO token. No?

And congrats on crushing it in the "Most Pay" section!

The threads stuff is "comments" - I looked at "posts" in the data set, for the most part - but... there's a lot of spam and a lot... more spam - and my methodology is probably not 100% perfect.

But yeah, the TLDR is longer posts don't get rewarded as well as shorter posts, even if we don't consider threads, snaps, waves et al.

Oh, ok, interesting... Then I’m back to my original theory: OG accounts have curation trails longer than the Silk Road leading to them. Hence posts like "This is what I had for breakfast today, here’s a pic" end up getting ungodly amounts of upvotes. Any thoughts? Because I got nothing after that :D

Yeah, seems to be the likely cause. :)

Neat way to express the data. I love this kind of stuff. If you do scale these back, I'd be really interested to see maybe quarterly data sets instead of monthly, if that's not a hassle and you think it's worth exploring. Of course, if that doesn't interest you then no worries!

The two months of data I've got stored locally so far come to about 5.7GB - it's an enormous data set. I have to chunk the queries to avoid overloading HiveSQL at the moment, so quarterly would be another chunking exercise ;P

I know there are other solutions to grabbing the data - but at the moment, I really want to focus on my stuff, and not productionising data. If I manage to get my own hive witness node up and running and have the data available locally, then that might be another story altogether :)

Looks like the entire history of the chain is something like 500-600GB :D Gonna take a while to sync!

Oh dang haha yeah that totally makes sense! Following your own motivation is definitely better than dealing with that!

Solid powerup goals, and lots of data. Payout by platform would be an interesting metric. Longer comments to follow when I have more time to work back through the data.

I have that as a view in my underlying data model, but ultimately, it doesn't matter which platform an author uses to publish, as content is viewable across each of them. I made a post about that in the past.

"Most swearing" gave me a chuckle, especially seeing that @riverflows got on the honours list. 🤣

It's interesting to note that the most payout is going to the <50 word posts, despite the fact that the majority of posts are in the <250 word region, where I would have expected it to correlate. Plenty to speculate on there. 🤔

> What do you have trapped in YOUR head?

Probably a few things that got lost in there somewhere. I might find them again if I clear enough out. I'm not sure what condition they'll be in, though. 😱😅

No one reads longer content, sadly. Attention spans are far too short. They need more creatine in their life.

So what you're saying is that I need to intersperse my long form content with short form content promoting the virtues of creatine in order to wean people onto the long form. 😜

Oh shit, I didn't make it to the "Most Swearing" list (again).

I'm not surprised that the average rewards are down... Hive is down, so the value will drop too.

I maintain that staying between 500-1000 words is the most efficient work-to-pay interval to be in.

This will probably be the last of such data; I've tested my hypothesis and published the data. It is up to curators to appropriately distribute rewards.

Well, fuck me sideways, I have been good this month.