Unlocking Hive's potential for social media - pre-announcement of DeepHive

Over the last 6 years I tried to onboard a huge number of people. Unfortunately, most didn't stay around for long. I think the reasons all boil down to the same thing.

Not everyone is a content creator. While everyone likes to share some things once in a while, most time spent on social media usually is consuming.
But it's very hard for a new user to find content that interests them. While communities help (if there is one for the specific interests of the user), there's still too much posted in general-purpose (vote farming) communities, or on a personal blog only using tags.

The success of mainstream social media comes from their ability to analyze posts and users. When you sign up, you're asked about your interests, and immediately get immersed by being offered content that aligns with them.

And I think that's one of our main problems. We don't know anything about our users, although we have quite a big amount of data about them available.

So I started working on a new project recently to fix that. Working name is DeepHive, because I'm not creative with that kind of stuff and it gives an idea about what it's supposed to be doing.

I prepared a database to fill with posts and analyze them. There are several challenges to be solved here already: automatic language detection isn't too good with open source tools, and preparing the data to train AI models will require a lot of manual tagging. Luckily I'm well connected with curators, and I hope I'll be able to get them to help with this without adding too much to their workload.

But it's not finished after that. A lot of the metadata other platforms have readily available is lost for us. Where does a user click, how much time do they spend there, did they vote manually or automatically? All these points are valuable information. Of course, for some people the appeal of Hive is that no data is stored about them. When the goal is mass adoption, I think that's a niche group though. Most people are happy to have their interests analyzed and be offered content (and ads) that might be relevant to them.

So on top of the data we can harvest on-chain, in the long run we will need integration with interfaces. Whether this will be a new platform, or whether we can integrate with @peakd or @ecency, remains to be seen. Of course it'll have to be strictly opt-in, and users should always have full control over what's being stored about them and what not.
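To make the "strictly opt-in" idea a bit more concrete, here is a minimal sketch of how an interface could drop every event a user has not explicitly agreed to share. All names in it (`ConsentSettings`, `filter_events`, the event kinds) are made up for illustration and are not part of any existing Hive, PeakD or Ecency API.

```python
from dataclasses import dataclass, field

# Hypothetical event kinds a frontend might record; not an existing API.
TRACKABLE_KINDS = {"click", "read_time", "vote_source"}


@dataclass
class ConsentSettings:
    """Per-user opt-in switches. Everything is off until the user enables it."""
    allowed_kinds: set[str] = field(default_factory=set)

    def opt_in(self, kind: str) -> None:
        if kind not in TRACKABLE_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        self.allowed_kinds.add(kind)

    def opt_out(self, kind: str) -> None:
        self.allowed_kinds.discard(kind)


def filter_events(events, consents):
    """Keep only events whose user has opted in to that kind of tracking."""
    kept = []
    for event in events:
        settings = consents.get(event["user"])
        # Users without any consent record are never tracked.
        if settings is not None and event["kind"] in settings.allowed_kinds:
            kept.append(event)
    return kept


consents = {"alice": ConsentSettings()}
consents["alice"].opt_in("read_time")

events = [
    {"user": "alice", "kind": "read_time", "seconds": 95},
    {"user": "alice", "kind": "click", "target": "post-1"},
    {"user": "bob", "kind": "read_time", "seconds": 40},
]
stored = filter_events(events, consents)
print(stored)
```

In this sketch only alice's reading time would be stored: she never opted in to click tracking, and bob has no consent record at all. Defaulting to "nothing allowed" is the design choice that keeps privacy the default.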

Obviously, this project is way beyond what I as a single dev could handle, and my witness pay isn't enough for even 1 additional full-timer. That's why I decided to post about it now already, without having anything to present yet. If you're familiar with machine learning, text analysis and/or user profiling, and you're interested in using these skills to bring Hive to the mainstream, please hit me up (best on Discord).

If not, tell me in the comments here what you think about it. Do you agree with the evaluation, and do you think a more tailor-made experience would help with retention? Would you be fine with your data being used to create a better experience for everyone? What's a red line you wouldn't want to see crossed?


I feel most creators do a poor job at making it easy for someone to find more related content of their own once someone finishes consuming a piece of content. They usually have to go digging around on that person’s blog to find the next thing. So anything that helps them on that end is a big deal.

I’ve attempted to solve this in several ways over the years in my content. I’ll include links at the bottom to something else as related as I can. I’ll try and provide links throughout the post if I’m referring to something else I’ve written about before.

In the past, I’ve attempted to make different indexes of my content based on a game's genre. Recently I have been playing around with using CCC for topics I tend to have a decent amount of posts around.

Empowering the content consumer to find more content to consume is huge. Even more so for all the content that is more than just a couple of hours old. Some of my higher viewed content I wrote years ago, and it is performing just well enough in search results to stay relevant and bring in views even today.

Hive has a massive amount of content. Most of it after a few hours never gets seen again unless a lot of effort is put in by the content creator. Just having more tools to help combat that issue would be a big win for the ecosystem in general.

You should not even have to consider spending your witness earnings. It should be something that gets further funding. I would be in favor of supporting a proposal for the things you describe.

I feel most creators do a poor job at making it easy for someone to find more related content of their own once someone finishes consuming a piece of content.

And it really shouldn't be their responsibility to guide the readers...

Thank you for the feedback, I feel better about tapping into the dhf with it now - once there's something to show =)

I always thought it would be great to track post view counts on the RPC end... possibly even generate an evergreen reward mechanism from that... it's probably awfully easy to cheat the system and maybe it would require a fundamentally different architecture... but yeah... just throwing that thought into the conversation.

I like where you're going with this!!

Yeah, while it would be nice to be able to tie the reward mechanism to views, this could really be gamed way too easily. Counting views on a frontend can be secured better, and while it doesn't help the authors of evergreen content to get more rewards, maybe we could run an ad model later where creators and consumers get a share.

If content selection worked very well, it could be implemented in one front end first (centralized control), selling ad space with some rewards going to authors.

I think the front end that solves this could be a billion $ one. The reason is people would share content more and earnings are not capped. So it would be possible to earn massively, like on YouTube (in theory).

Not a solution for hive in total, but something that could be built on hive.

But I expect that if it is a success, we'd see no additional onboarding, only mass cannibalism from other sites that copy and paste and take all rewards, or something like that.

Between you and me, people are really lazy, they want to stare at a screen that keeps moving posts in front of them. There's no such DApp yet, nowhere, but Apps do that. Maybe it'll be possible after the HAF release. But don’t tell anyone, this opinion might shine a bad light on me. Btw, I‘d also like to have a possibility for encrypted comments, that should be relatively easy to implement.

I had a short talk via comments with smooth.

I would love to have anon HBD/ Anon hive tokens.

Think about anon stable coin. HOLY shit.

Encrypted comments and so on. That would be huge, and a real standalone feature!

Yeah and we can do that via the MEMO Key system. Anon stable coin - that would be a category on p***hub.

run an ad model later where creators and consumers get a share

i hear you! that's a good idea! actually, in all honesty, if a centralized frontend provider runs ads on decentralized user content, distributing the shares accordingly should be a given?!

I remember back in the day steemit.com used to provide a public view-counter. I was quite disappointed when they took that off... it was at least kind of interesting to see those posts with barely any views and massive rewards.

We're going a bit off topic, though... back to your original concept, datamining content topics and user browsing habits to provide a custom recommendation feed to find interesting authors and posts would greatly help user retention. Content discovery in the frontends as is takes real effort and time... which is why I have always been a lousy "curator"

The issue is, how would that data mining be paid, by whom? And would it push certain content over others to be the biggest paid on hive?

How would it handle people who dv (downvote) content just because?

Because people sometimes dv content for no good reason, just because they don't like certain content...

The dv adds another layer of difficulty to the machine learning in data mining. In my opinion :S

People also upvote for no good reason. This post has >200 upvotes after an hour. How many do you think read it?

Both up- and downvotes are only useful data points when connected with others like reading time.
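As a rough illustration of connecting votes with reading time, the sketch below scales each vote by how much of the post the voter plausibly read, so trail and auto votes with no reading time contribute nothing. The function names, the linear scaling and the sample numbers are all assumptions for the sake of the example, not how DeepHive works or will work.

```python
def vote_signal(vote_weight, read_seconds, expected_read_seconds):
    """Scale a vote by the fraction of the post the voter plausibly read.

    vote_weight: +1.0 for a full upvote, -1.0 for a full downvote.
    read_seconds: time the voter spent on the post (0 for trail/auto votes).
    expected_read_seconds: rough time needed to read the whole post.
    """
    if expected_read_seconds <= 0:
        return 0.0
    # Clamp to the range 0..1 so skimming counts a little and lingering caps out.
    read_fraction = min(max(read_seconds, 0) / expected_read_seconds, 1.0)
    return vote_weight * read_fraction


def post_score(votes, expected_read_seconds):
    """Sum the reading-time-adjusted signals of all (weight, seconds) votes."""
    return sum(vote_signal(w, s, expected_read_seconds) for w, s in votes)


# Two trail upvotes with no reading, one upvote after a full read,
# and one downvote after reading a quarter of the post.
votes = [(1.0, 0), (1.0, 0), (1.0, 120), (-1.0, 30)]
score = post_score(votes, expected_read_seconds=120)
print(score)
```

With these made-up numbers the two unread upvotes count for nothing, the fully read upvote counts in full, and the quick downvote counts a quarter. A real model would have to be tuned and re-tuned against actual data.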

Financing might become an issue. At some point we'll probably have to tap into the dhf and explore further possibilities for monetization.

If you create a DHF proposal, maybe do a crowdfunding as well and offer badges / limited NFTs as rewards. We're in a bearish market, I'd rather spend fiat than crypto.

How could the data miner handle the trails? 540 votes but like 3 people reading... That would cause at least some issues with it. And would comments influence?

This has to be explored, tested and evaluated. These types of algorithms aren't static once implemented, they need constant fine tuning.

DHF is not such a good idea, to a point the fund is already overloaded, yeah technically it can hold more... but eventually it could not.

And a curation project would never have enough profits to keep it going, would it?

I don't know about other curation projects, but curangel doesn't make profits at all. For now I'll be able to cover the expenses and (slow) development through the witness. If I drop out, price drops a lot more, or we get to a point where we need a lot more manpower, we'll have to see.

The dhf has a lot of room for more projects, provided it's regarded as more important than the HBD stabilizer.

I see two issues with that, trails and down votes.

Curation trails could skew the data mining in weird ways. For example, a new user could skip a 10$ post because it looks less attractive than a 140$ one, and that does not mean the 10$ one is lousy in quality in any way. The algorithm should not show content by profits, votes, or dv if we were to add.

And how would it handle dv at all? Facebook does not have a dislike, and the YouTube algorithm went crazy at times with dislikes, to the point of becoming a problem for YouTube to work normally... So how could Hive beat those issues?

Besides that, I would only like to add that, one way or another, this is needed. Hive needs to retain users. If not, we will be a high 5 with crypto: a couple of good ideas that ended up being recycled by someone else later on...

The algorithm should not show content by profits, votes, or dv if we were to add.


For example, I have onboarded 7 people so far, and 3 are currently active users. That is a retention rate of less than 45% :S And of those, 2 are on Hive for the profits, they do not consume content at all...

They find it, to quote, too "weird" to search the content by hand and pick what they like. They are too used to Facebook spying on you and showing you exactly what you want.

That completely aligns with my experience. Although I'm seriously jealous about that retention rate. Mine is more like 10%, optimistically.

Venezuelans need money, and most who join see Hive as an extra income that can "happen" from time to time. For them, as for me, 20$ is a lot of money for medicines or food.

As long as it remains strictly opt-in, I think it's a good idea. However for the niche group you mentioned, this could potentially chase them away unless they know they won't be tracked.

A more tailor-made experience could help in retention, but it needs to be combined with both bot and human assistance from the minute of signup. Welcome and help those new bees :)

There will always be frontends that don't track, at least hive.blog

I agree with the second paragraph, a tutorial would help a lot already. Human assistance is difficult to organize well in a decentralized manner and without pay, but should be tried of course. Discord works well, but reaches few users, most give up before they go there.

I know what you mean, this does need doing - it would help solve the problem of how bad the search is at the same time, too.

What we need is some sort of way to decentralise the workload but in a co-ordinated manner.

Hi @pharesim

It's interesting that you posted this today as just yesterday I was reading posts from various members of a community I moderate and many of them said that they joined Hive and then couldn't do anything (lack of RCs), had no idea what was going on or how anything worked so they left and came back later. It made me wonder what percentage of signups end up not being utilized and lost as a result of the fact that without a kind of crash course people get too overwhelmed.

With that in mind, what you are proposing would actually work well I think because then people would at least have somewhere to start. Pick three topics that you like and then the platform aligns possible communities for you to check out. I think it would go a long way to have that and some sort of walk through when people are onboarded because people don't know how it works when they start or where to start. I would totally vote for your proposal once it's outlined but obviously people would want to know what data is being collected and that's something that shouldn't be compromised otherwise we're heading into Faceplant I mean Facebook territory.
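The "pick three topics and get matching communities" step could start out very simply, for example by ranking communities by how much their tags overlap with the chosen interests. The sketch below uses Jaccard similarity for that. The community names and tag sets are invented placeholders, and `suggest_communities` is a hypothetical helper, not an existing feature of any Hive frontend.

```python
def jaccard(a, b):
    """Overlap of two tag collections: size of intersection over size of union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_communities(interests, communities, limit=3):
    """Return up to `limit` community names, best tag overlap first."""
    scored = [(jaccard(interests, tags), name) for name, tags in communities.items()]
    # Drop communities that share no tag with the user's interests.
    scored = [(score, name) for score, name in scored if score > 0]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


# Invented example data: community name -> tags describing it.
communities = {
    "photo-community": {"photography", "travel", "nature"},
    "gaming-community": {"gaming", "esports"},
    "finance-community": {"crypto", "finance"},
    "food-community": {"food", "recipes"},
}

picked = suggest_communities({"travel", "photography", "crypto"}, communities)
print(picked)
```

Here a new user interested in travel, photography and crypto would be pointed to the photo community first and the finance community second. Anything smarter (text analysis of the posts themselves, behavior data) would build on top of a starting point like this.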

Have a great day.


We will never have these spying possibilities on Hive that would allow showing exactly what a user wants in a personal feed. I think it is hopeless to achieve this due to the lack of meaningful data (the no. of votes says nothing about quality, only about how well a user is connected).
But we have great communities on Hive. We should maybe incentivize mods and admins to optimize the focusedness of those communities - have less shitposts and off-topic posts in them.

Not on-chain, no.
A second layer solution is possible though.

I agree that communities could be used much better too, but first they have to be able to be set read only.

You can count on @ocd's help with this and hopefully also the communities in our incubation if they're willing!

I don't see automatic language detection playing a big role because the people in the same bubble often have a common language. Relevant content written in a language you don't know can still interest you and you may get clues from pictures or common words what the content is like and then if it interests you can use machine translation to translate it into your language.

That's an interesting thought. Not separating datasets by language could lead to more false positives for languages with few posts, but that's just an assumption of mine right now. Thanks for the input!

I still need multi language support for posts. That would be fantastic.

You only need to look at the political discussion about Ukraine to understand that filtering out all languages but one will change the view entirely.

I don't even think that people understand how much language is transformative to the human condition in all ways. We should have all been learning 3-4 languages from our early days on.

Mandatory English, Spanish and Russian/Mandarin in all EU schools could make so much difference. Parents and VOD services would optimize the rest for the ego boost gained from smart offspring.

Interesting, you skipped German and French for Spanish and Russian.

I only think about global trade and the biggest markets, sorry.

This is an interesting subject because I believe in open and permission ledgers but also value privacy and security. The question I am trying to posit myself is where the line is, where should privacy be?

I have thought a lot about how I might be able to do this, I am a stats nut so I love to see charts and have wondered how we will be able to use the value that has been added to Hive, and evolve it. Machine learning offers opportunities to do this and I have had some very interesting conversations with GPT-J recently.

I would like to see options in the future for privacy by default, allowing users to reveal just a snapshot of certain information that they would like to make public. This is a conversation for another day @pharesim but,

You have my curiosity about your project. I do like to be able to search and find my own posts, Facebook actually had a really handy search that they nuked because it was too powerful. You could type "Friends of my friends" + "who like Metallic" + "and live in Los Angeles" or many combinations.

I want to get my front end back up after the fork, and have a few ideas about what I would like to do. Not just a community but an interface that shows a lot of this type of data you are talking about. Ways to show charts, display Hive-Engine tokens and other real-time chain data. I want to incentivize curating the new content in a way that is gamified. Unique user experiences are important, what sets Hive apart?

There is also an argument to be made that we don't want to reinvent the nightmare that web2 social media has become, like a feedback loop that turns us into products. I am very interested to see how Machine Learning can be applied to enhance our experience and make info retrieval quicker and more accurate.

Lots to think about, I need to go but I will think about this more today.

I would like to see options in the future for privacy by default, allowing users to reveal just a snapshot of certain information that they would like to make public

I totally agree with that. It has to be the user's choice.
I like the way you think and hope we can cooperate in the near future!

Hi! I would like to tell you some of my today's thoughts..

Yes, you are right. Not everyone is a content creator. Or a trader to see the potential in the passive income opportunities here. Also, many of the users do not have the required patience/quality/time to see beyond the "post to get rewarded in crypto".
Because it is more than crypto that we are rewarded here, in my opinion :)

Apart from all the above, I was thinking that it would be amazing if there would be opportunities for bridging (both ways) some of the mainstream social media with hive.
For instance: Say someone posts photos and only that .. and uses Instagram. Imagine if with one post on hive, they could post on Instagram too! Or vice versa! But also to receive notifications in one platform for both mediums.

If someone replies to me, and I do not see it, then the conversation does not go forward. Same if someone follows me and I don't see it... That's why notifications are so important!

Do you think that something like that, would be doable?

I am not sure about analyzing behaviors and showing customized content to the users.
Also, the user should have the option to allow you to analyse their behavior or not.

The content I am interested in on Hive differs from the content I check out on Twitter or on Instagram. For instance, on Twitter, I usually care about crypto & finance. On Instagram, about travel photos & nature photos. On Hive, I care about so many different things.. the algorithm would break.

Another thing about onboarding users should be providing them with proper expectations :) Express themselves, meet people from all around the world, learn about blockchain technology by using it etc :)

Sorry if my thoughts were blurry here, I was thinking out loud!

Hope the above makes sense!

Thanks for your time

Yep, it misses the selection of a language when writing a post, don't forget that some authors write in more than one language (native and English) making it more difficult for the usage of automatic language detection.

Peakd uses Plausible and Ecency uses Matomo, which can give you some interesting metrics (clicks, views, duration, retention...) ;) I think a good starting point for your project is an open discussion with major HIVE frontend owners. The hivesearcher (synchronises hivemind's posts & comments to an elasticsearch index) may also be something interesting to help you in your project?

The red line exists only if the users don't know and don't agree about it but if you can separate those who don't want and use tools that protect the privacy of the others I think you will receive a good adoption.

Thanks for the links, that's very helpful!
Keeping users in the know about what they share and what not is one of my main priorities.

Sounds super good! Would be a great addition to Hive so I'll support it :)

Thank you!

Sounds awesome @pharesim. Tags are irrelevant, which makes topic finding and recommendations (for more time spent) really difficult.

I also have some ideas about it and have been working on something for a while. User tracking + time spend ( and heat mapping). Things like that are already standard on average websites.

Onboarding is IMO one of the biggest issues. One by one makes IMO no sense. Onboarding full communities is what we need to achieve.

I see huge potential there in integrating a Hive wallet into forums and other communities.
Especially because of the payment aspect (less about content).

IDK if you know the story behind PayPal, but the initial plan was to build a lot of financial instruments for online use.

The result was nobody cares about this.

But people care about "payment via e-mail". Simple and efficient.

I know it's off topic to your project of making content discovery better, but that's something I think could boost Hive.

User tracking + time spend ( and heat mapping)

Yes, I also think we'll need those
And no, I don't think it's off topic. In the end that's exactly what I'm trying to achieve, better and easier content discovery

I have some experience with it ( tracking/Conversion rate optimization (CRO) and SEO). Maybe I can give some input at some point if you want.

Like more eyes see more :) And the topic can become really deep.

Yap... the Tag based ScotBot for the Engine-Tokens really ruined Tags for us.

Strange strange how that happened.

And it was managed really badly, because those scotbot tags could also have been in the content.

like last line in post : Scotbot: #HE1 #HE2 #HE3

Content discovery is really bad, and content from the past is impossible to find on Hive. The easy way is to use Google and then go back to Hive.

I would like more " human curation" in that space.

Like authors can cluster content together.

Can add a part 2 in the same post weeks later and still the parts are connected AND nice to view.

That is a really interesting idea, clustered posts via tags to split/spread content and rewards over time. Therefore we could also monetize content over "only god knows how long" by adding updates or additions and a smart UI might just put those together.

I had many ideas and I would also invest in those if onboarding were not the bottleneck.

Paying twice is something I don't like (marketing and the onboarding fee).

Some open and cheap version of SMTs + affordable/free onboarding would solve a lot.

I don't know if you saw my post back in the day. But I had written something about prefix and suffix wallets, which would in theory allow a massive discount on wallet creation + prevent name squatting.

Wallets should not be free. What was the pitch that brought you to 'STEEM' back in the day?

SMTs and decentralized databases.

Wallets can be close to free. 3 Hive is way too expensive. The price only needs to be high enough to prevent abuse.

So start in a bear market with 1 Hive and change in a bull market to X HBD.

I think between 0.1$ and 0.3$ is fine. That should prevent large scale abuse and allow dapps to build.

3$ Hive means 10$ per converted user as cost. RCs are only for Hive holders, but what if a dapp wants to build with 0 investment? Only investing time, like most entrepreneurs start.

BRILLIANT!!! @pharesim, man, I have dropped a few links for people to sign up on @hive-00, @ecency, @splintrelands, and on @peakd. And not a freaking one of them can join. Don't have tickets or something, they all say. Shoot, just a few weeks back I got my GF @justjen71 to finally try it, and she isn't very tech savvy. It took her 2 effn days to figure it out. I tried everything I could and nothing worked. And it is very hard to find stuff to just scroll through and take up just a lil time. Short bursts of info, I think that's where we are different here. It's a full blast of info. Shit, my comments can become as long as the current tax code book. I get tired of trying to even find useless info to look at when I'm just not in the mood. But the thing is, I strive for information and will hunt the crap I wanna know down, because I have an effd up attn span. I know I'll be somewhere I never thought I would be and find something new and interesting. Now I'd like to say, if ya want someone that can get shit done, feel free to HMU. I know a bit of JS and HTML, I do a bit of graphic design, and I own a custom apparel shop. Ya tell me what ya want and I'll get after it like a pit bull and a rope!!! I would love to see some more friends here. Some of them are amazing writers. But not as good a storyteller as I am, LOL. Great post, looking forward to seeing what drops!!!

In the current state, hive is great for power users. If you invest time to explore you'll find what you want. But not many people are willing to do that, and it shouldn't be a requirement.

I'll set up a Discord server soon for anyone interested to join, frontend and design work are always needed :)

Hey, I'm a full stack dev. Sounds interesting. I did a bit of user profiling by questionnaire.

Awesome, we'll have a Discord set up soon and I'd be happy to meet you there and talk some more

It's actually a tricky situation. At one place, we need mass adoption and on other hand we want our privacy...

Privacy should always be the default option imo

You know what, I'd love to be able to see recommendations and feeds of other accounts. Just by entering a username and 'click' on switch to perspective. That would be an epic feature.

That's already possible now! peakd.com/@username/feed

Will keep it in mind though for our "advanced recommendations"

Ah, yes you're absolutely right.

So what is your end goal with DeepHive? Are you trying to change the way we engage with content?

More like, give regular social media users the possibility to engage with content the way they're used to

Have u seen the ceramic accounts indexer that is being built?

I didn't, no, thanks for pointing it out. Need to dig deeper to really understand it, for now it seems like it'll make this project even more complicated :D

It’s a web2 sign up with off chain indexer. So it’s super easy onboarding and can post to any dapp that has indexer installed. Then upgrade would be to a hive account as they see some accounts are monetised and immutable content etc

Really interesting project! Keep going ...
I think it's a really good idea.

I usually join the communities. That makes it easier to find the topics you like. But then so many posts come up that it gets harder to scroll to what you want to read. It would help if there were a read later button, so you can just click it and the post is stored in your notifications. And some of the content is interesting!

Good idea! Maybe someone from @peakd sees this and it'll be implemented there. Not really in the scope of the initial development of this project, but if we get to the point where we'll have our own front-end that's definitely something to add.

Well yes, a data-eating kraken that makes it possible to further monetize our platforms seems very legit.

Actually... I am more than stoked for this project!

I love the idea of having a more tailor made experience with Hive. Hell yeah, count me in.

Watch my blog for the discord announcement then!

Will do. On it!

Why not use the DHF for funding?

Most likely will write a proposal when there's a team and we have something to show

I always thought we should build a Hive Browser that would include all the websites and apps within the network. This would provide much more data on participant interactions. That way we could come up with ways to create incentives for content consumption rather than only content creation. You are right, the majority of internet users are content consumers and not content creators. For now Hive rewards content creation and not content consumption.

Yes, I admit, the vote came first and now I have read it. That is typical for most posts that I read. I am late to read posts, firstly because of my time zone (I am always behind) and secondly because of my stage of life (I am always behind!).

With the introduction out of the way, I think the general idea of finding out more about fellow Hivers from the metadata of their interests is great. Even today I had a strange conversation where I know two individuals well (maybe one more than the other), but as I tried to mediate they drifted further apart. I thought it started with a simple misunderstanding. Maybe your tool can help us with situations like this.

The method is unclear to me from this short post. Is it possible to elaborate in a future post?

I think you should be able to write a DHF proposal for this and it can get funded.


I would be happy to get analytics on my posts. I am by far not an expert content producer, and I honestly just do it for fun and not care much about the rest. However, getting some numbers would be lovely pieces of information (I like numbers), especially to get a better grasp on what my followers like and what they don't.

Now in terms of data storage, as long as I can choose what is stored and what is not and that everything is opt in, I am all in.