Improving the Curation-Rewards Process via a Significant (though Subtle) Change to Auto-Voting …

in #curation, 2 months ago (edited)



A Proposal to Improve Hive’s Curation-Rewards Protocol

A few days ago, I posted a comment here alluding to a new #hive #curation-rewards protocol I’ve been envisioning, wherein I was hoping to accomplish the following overall goals / objectives:

  • incentivize high-quality manual curation (both for large and small stakeholders),
  • not severely penalize large stakeholders who don’t have time to manually curate, and
  • incentivize auto-voting that accentuates high-quality manual curation (in lieu of auto-voting that overwhelms, stifles, buries, or otherwise manipulates manual curation, whether intentionally or unintentionally).

Learning the Ins & Outs (& Challenges of) Past and Current Protocols

I want to thank @themarkymark for his clear and concise explanation of #Hive's current curation-rewards protocol (available here), along with his taking time to dialogue with me directly (providing much-needed technical-feasibility assessments). As a result of his direct and straightforward feedback, I’ve made numerous revisions, mostly abandoning aspects of my original plan that were technically infeasible (which leads me to a heartfelt thanks to @theycallmedan for indirectly encouraging me to ‘press on’ despite the various feasibility setbacks).

Interestingly, the end result is a much simpler solution than I originally envisioned (and, more elegant, imho)!

Historical Changes to Rewards Protocols

From what I have read and researched about curation-rewards protocols for hive (and formerly for steem), historical changes have by-and-large sought to either [1] correct loopholes that previously allowed bad actors to ‘game’ the system or [2] correct misalignment of individual incentives relative to community goals & objectives.

A New (or Not-so-New) Underlying Philosophy

The underlying philosophy has been (from the very beginning, I believe) to reward manual curators who quickly find and elevate high-quality content (i.e. to incentivize and reward curators who proactively search for heretofore-hidden high-quality content). Instantiation of this underlying philosophy has consistently struggled (and oftentimes failed) to achieve the desired outcome, because clever people will always figure out a way to ‘game’ any system, especially a system that provides variable rewards.

I am adopting a very similar (but nuanced) underlying philosophy, which is:

    Early-voting periods should be almost exclusively the domain of manual curators.

With an important corollary being:

    Whenever auto-voting algorithms place votes early in the voting period, the strength and influence of any early votes cast by legitimate manual curators become diminished, thus weakening the ‘proof of brain’ aspect of the entire process.

Using the Principle of Inversion to Solve an ‘Intractable’ Problem

Most of my early iterations attempting to solve the problem of ‘inefficient’ auto-votes competing with manual early votes relied upon being able to distinguish (in Layer 1) between manual votes and auto-votes. After many, many failed attempts at trying to solve that ‘intractable’ problem, I decided to invert the problem.

Rather than trying to actively discourage auto-voting bots from casting early votes, what if we directly incentivize them to willingly participate later in the vote-casting process?

The Proposed Solution

This can be done very simply, by [1] establishing a relatively small window (e.g. 5 minutes in width) that [2] opens a set duration after each post (e.g. 1.5 days), and [3] entitles the vote-caster to a pre-determined stake-weighted percentage of the total curation rewards for that post (or just a set percentage of the rewards associated with their vote, e.g. 40%). Any votes cast outside this very small time window will be subject to the existing (non-linear) curation-rewards protocol. Any votes cast during this window will be specifically exempted from any early-incentive rewards, instead receiving a pre-determined stake-weighted percentage (similar to each-and-every vote under a linear rewards curve, cf. #LeoFinance and #STEMGeeks).
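To make the rule above concrete, here is a minimal sketch (not chain code). It assumes the example values from the paragraph (a 5-minute window opening 1.5 days after posting, and a flat 40% of the rewards associated with the vote). The names `Vote`, `in_late_window`, `curation_reward`, and `existing_protocol` are hypothetical; the existing non-linear protocol is passed in as a callable because its details are not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative values taken from the examples in the post; these are
# assumptions for the sketch, not actual Hive chain parameters.
WINDOW_OPENS_AT_S = int(1.5 * 24 * 3600)  # window opens 1.5 days after posting
WINDOW_WIDTH_S = 5 * 60                   # window stays open for 5 minutes
FLAT_SHARE = 0.40                         # fixed share of the vote's own rewards


@dataclass
class Vote:
    voter: str
    seconds_after_post: int
    vote_value: float  # rewards attributable to this vote, arbitrary units


def in_late_window(vote: Vote) -> bool:
    """True if the vote lands inside the delayed-voting window."""
    start = WINDOW_OPENS_AT_S
    return start <= vote.seconds_after_post < start + WINDOW_WIDTH_S


def curation_reward(vote: Vote, existing_protocol: Callable[[Vote], float]) -> float:
    """Flat share inside the window; otherwise defer to the existing protocol.

    `existing_protocol` is a hypothetical stand-in for the current
    non-linear curation-rewards calculation.
    """
    if in_late_window(vote):
        return FLAT_SHARE * vote.vote_value
    return existing_protocol(vote)
```

A vote cast one minute after the window opens would receive the flat share, while a vote cast ten minutes after posting would fall through to whatever `existing_protocol` returns.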

... and its Advantages

The primary advantages of the above ‘exemption’ for intentionally-delayed auto-votes will be twofold:

  1. Early votes will be cast mostly by manual curators (subject to a couple caveats):
    • there will no doubt be some auto-voting bots that are remarkably good at pre-selecting exceptional content, or will at least try to do so, and
    • the extent to which auto-voting bots attempt to early-vote will be directly related to [1] the early-voting rewards criteria and [2] the pre-determined percentage associated with the late-voting window.
  2. Late votes cast by auto-voting bots will boost content while it is still fresh (i.e. only a couple days old), thus ensuring that high-quality, manually-curated content gets seen by more people (and then further upvoted, if the content is genuinely high-quality):
    • in other words, auto-voting bots will begin to serve as boosters of manually-curated content rather than merely predictors of hopefully-good content,
    • thus taking what was formerly a negative (auto-voting bots that were injecting noise into the manual curation process) and turning it into a positive (auto-voting bots that give a delayed boost to content that has already proven itself worthy via manual curation).



I have some additional thoughts, but will hold off on those for now, in order to get this ‘out there’ so folks can evaluate it and provide (hopefully constructive) critiques.

Positive incentives will give helpful bots meaning & purpose!


Interesting solution. I have always envisioned a sort of lottery system for curation, where everybody just votes on great content and receives a predictable curation, but there is also a bonus pool that gets created that is awarded to some subset of voters, irrespective of when they actually voted.

Interesting concept about lotteries.

That’s similar to a thought I had recently about randomly varying the early vote window thresholds, different thresholds for each post, unknown to anyone a priori. Manual curators just curate and sometimes they get lucky with the windows and sometimes they don’t.

Randomness would discourage early voting by bots.

    Randomness would discourage early voting by bots.

That is naive. Bots love randomness because the parameters are public and they can compute where the sweet spot is. Especially if seeing early votes allows them to recalculate the EV of different timestamps. Long term, they would just kill anyone trying to play the curation game by guessing (or even worse, sticking to a predictable timing).

    That is naive. Bots love randomness

Labeling a post as "naive" diminishes your standing as a legitimate debater, imo. Bots do not love randomness. Yes, they will compute the 'ideal' spot based on the parameters, but that is just the point. The creators of the parameters can use randomness to shift the 'ideal' spot in such a way that potentially diminishes their overall negative impact.

I was not saying this was a 'good' solution, but merely one of many to openly discuss and debate. It's not a position I would favor (nor is a straight lottery), but worthy of being 'out there' for folks to consider and debate.

I never labeled your post as naive. I quoted a statement and claimed the statement is naive. In math, that is a word with an actual meaning. Or at least it was a decade or two ago. Maybe it has been flagged and had to be substituted since.

Going back to openly discussing your proposition:

    The creators of the parameters can use randomness to shift the 'ideal' spot in such a way that potentially diminishes their overall negative impact.

If we talk about the current system, where (pre-penalty) payouts depend only on the mass of votes before/after you (not the time distance of votes from each other), and allow the referee to pick any random distribution of the cutoff point (currently set as 100% at 5 mins, 0% elsewhere), meaning one distribution that is used for every post until the next fork, then your job is basically to find a spot on the timeline to plant your tree in a fertile area (high density) with as little competition around you as possible. There is no ideal spot you can precompute; it is a challenge calculated in real time (where you both watch how much space you can grab from the near past by voting now and guess the competitors' tendencies to see how much space you expect to grab in the near future).
It is a fast-paced game where humans can barely pick up the data to make a guesstimate, while bots can decently compute both EV(vote now) and the probability of getting a better spot later (there are enough posts available to be patient and pass if things go bad).

Your trust in Someone-to-Set-Parameters-Right allows you to present an expectation that I found inconsistent with the actual architecture of the game. I wish there was a single word I could use to indicate that I consider the statement unsupported, and therefore disagree with the conclusions derived from it, without being pc-bullied (did I infract again?).

    I quoted a statement and claimed the statement is naive. In math, that is a word with an actual meaning. Or at least it was a decade or two ago.

Fair enough.

    ... expectation that I found inconsistent with the actual architecture of the game.

I do not claim to fully understand the architecture of the game. By all means, call out anything that might be inconsistent with the status quo or otherwise infeasible.

@themarkymark did a great job of explaining the current 'rules of the game' in principle here, but I have yet to dig into the code; so my comments and perspectives very well might 'miss the mark' in that regard.

    That is naive.

A bold statement coming from an alt account.

    Bots love randomness because the parameters are public

Assuming your algorithm is shit.

    they can compute where the sweet spot is.

As opposed to hard coding their sweet spot as it has been?

If you think 5:00 is the sweet spot in the current system, you are ... (oh, you know what, I can't say it). The current sweet spot floats somewhere before 5:00 and bots game it better than humans. To clarify, the sweet spot is not the time that will be drawn out of a hat. The sweet spot is the timestamp that gives you the highest expectation based on the actual situation (so it moves as time flows).
The mainstream proposition is what actually "hardcodes" the sweet spot, extending it to the first few hours and taking someone's toy away.

I would appreciate if you could out my main account.

Any system that does not look at the whole issue is only a band-aid. Any curation system that does not include the down vote (anti-curation) is only another attempt to maximize the returns for large accounts.

I saw nowhere in your proposal any discussion of down votes, so from my point of view it is an incomplete proposal.

It will be complete when it is fixed so that large curation teams with a lot of Hive Power cannot down vote non-plagiarized posts, nor down vote posts for excessive rewards.

Due to the actions of a few large curation accounts, the Hive blockchain is going the way of Facebook, YouTube, and other social media platforms, where only content they find acceptable is allowed to be rewarded, and all other points of view are to be muted or down voted into oblivion.

No, I have not been muted/down voted to oblivion; I thought Hive had moved beyond down voting content not for cause but because of opinion/point of view. Hive is a worldwide platform, not a one-country platform, not a one-opinion platform, and evidently not a free speech zone either.

So before fixing and putting a band-aid on half the issue perhaps review and look at the whole issue.

This is an interesting idea, but what prevents auto-voters from still casting votes early to farm even more curation rewards? I am a proponent of keeping things (and reward curves) simple: remove the curation time window entirely and make rewards linear for both manual and auto-voters, like leofinance did.

I would just prefer Leo's approach; that way at least the bots are not getting an advantage over manual curators.


My witness node - Stream on Vimm.tv

My issue with Leo's linear-rewards approach (the way I understand it) is this:

    Why not just allow accounts to receive their 'curation' reward of 50% even when they are not actually curating (with the other 50% spread across all content creators proportionally)?

By doing that, those who are not manually curating are in effect just amplifying the voting power of those who are (but aren't losing out on anything by opting out of the curation process).

In other words, remove the incentive to use auto-voting for base rewards capture.

Granted, that removes all financial incentive to curate; but is that a bad thing? Maybe not. Curation then becomes a voluntary social act. The number of curators will drop (perhaps significantly), but the quality of the curation might go up.

In summary, with a linear protocol, why incentivize auto-voting bots that add noise to the manual curation pool, when you can just let those who opt out of curating earn their 50% reward? That would be a much cleaner implementation of linear rewards, imho.
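As a rough illustration of the linear case described above, here is a minimal sketch assuming each voter simply receives 50% of the value of their own vote and the author receives the rest. The function name `linear_payout` and the sample numbers are hypothetical, and this is not taken from the LeoFinance code.

```python
from typing import Dict, Tuple


def linear_payout(vote_values: Dict[str, float],
                  curator_share: float = 0.5) -> Tuple[float, Dict[str, float]]:
    """Split one post's rewards under a linear curve.

    vote_values maps each voter to the reward value their vote adds.
    Every voter gets the same fixed fraction of their own vote,
    regardless of when they voted; the author gets the remainder.
    """
    total = sum(vote_values.values())
    curator_rewards = {
        voter: curator_share * value for voter, value in vote_values.items()
    }
    author_reward = total - sum(curator_rewards.values())
    return author_reward, curator_rewards
```

Because a voter's return here depends only on the size of their own vote, timing and post selection make no difference to it, which is why an opt-out that pays the same 50% would leave non-curators no worse off.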

It is useful to think of the Leo approach as an extreme case of the system that is being proposed (the all-timestamps-equal first window extends to full time, the post-window time-gamified voting shrinks to zero).

I claim that adding the second window keeps all the upsides of linear (as long as the first window is long enough); anyone feel free to dispute this and/or point out downsides that the second window introduces.

    The number of curators will drop (perhaps significantly), but the quality of the curation might go up.

I do not think Leo suffered such a drop, but for the sake of the argument: if the drop happens, the danger is that noise (auto or manual, it does not matter) prevails and no one can tell which high-payout posts were voted up by quality curation and which are random events.
(This is an actual danger for the new system as well if too many votes are cast in the first window.)

    when you can just let those who opt out of curating earn their 50% reward?

The more people opt out, the more power over author reward allocation goes to those who vote. That is only good if everyone is an honest curator. Self-voting and vote-buying (open or concealed) become more efficient.

This is a very good article for everyone to see how things are being done in regards to the change in auto voting.

    Late votes cast by auto-voting bots will boost content while it is still fresh (i.e. only a couple days old),

Still fresh?
Most votes happen in the first 10 hours, is my guess.

When I got here there was a 24hr payout, and a 30 day payout window.
I thought that was a good idea, but that wasn't left in place.
Instead it was changed to one payout after 7 days.

By "still fresh" I meant still within the upvote curation window. It doesn't do any good if the post is excellent but doesn't get noticed until after the 7d window has passed.

If the auto-votes are incentivized to occur at 1.5d after the original post, that will boost the 'good posts' (i.e. those that got the most manually-curated votes) while there is still plenty of time for additional manual curation.

If you make an excellent post that takes weeks to discover, there is a good chance people will upvote your recent work or send you a tip. Your point is valid, but tailoring the system around these rare occurrences does not appeal to me (partly because, based on my limited experience, I do not think 1.5 days is long enough to discover the new stars).

Personally, I don't fully understand the rationale behind a 7d window on curation rewards. I guess some window needed to be chosen.

Neither do I. I have never seen anyone propose a longer window. The usual response to a shorter-window proposition is along the lines of "We do not want the people to withdraw the rewards too fast", which sounds ridiculous. On the other hand, it does not make much difference either way.


I'm willing to try it.
My main gripe is the exponential penalty on small votes.
That thing chaps my hide.

My understanding of your proposition is that you want to replace the current system of A% to author, B% to curators/upvoters with a three-way split of A% to author, C% to early-window curators/upvoters (let's call them curators) and D% to past-window curators/upvoters (let's call them upvoters).

So the only difference to the system discussed in blocktrades thread I linked earlier (1-24 hr window where all timestamps are treated equal) is the length of the window (there has been a consensus that 5 minutes is not enough for manual curators) and the predetermined rewards split (please elaborate on advantages).

My first thought is that it introduces a game where the bots calculate how much vote power should come at the beginning of the upvoter window. As for the consequences, it feels like bots will be more effective at taking value from regular upvotes, while curators are mildly pushed towards safe bets (discovering a gem gets a huge part of the curator share, but the total payout will not be much inflated by the early-upvote bots, so it is a bigger share of a smaller pie).

Thanks for the quick feedback.

I agree gaming can and will happen. Whether it’s worse than existing is unclear to me at this point.

Probably need to work through some hypotheticals on a spreadsheet.

An extra degree of freedom, all else being equal, probably does increase the potential gain from gaming the system.

However, providing an honest alternative to the current bot race for early votes might convince the vast majority to accept a guaranteed honest gain instead of wreaking havoc via early excessive bot voting.

Another advantage would be that the whale bots could be programmed to intentionally withhold delayed upvotes from posts that have been targeted by bots gaming the system.

A battle of the bots, so to speak.

The latter bots would have a distinct advantage, by going last.

Could also allow content creators to reject upvotes.

Community friendly whale bots could help content creators identify which early votes likely came from bots so they can be manually rejected.

There is no extra df, you introduced an extra pool (df:=df+1) but also a new rule that you can only vote in one of those (df:=df-1)

Asking people to be nice has been tried before. It does not work on chain-wide level. Obviously, if we talk about a small tribe built around an off-chain established community (such as a tribe around a specific university seminar), the off-chain reputation works wonders and you should just do linear curation (the purpose of curation gamification is to find a pearl in a sea; no need for that if you work with a glassful of objects within your reach).

The idea of incentivising late voting sounds great but in the end all it can do is to move the bot playground from the first five minutes to the last five minutes. As long as we do not flag bot votes, that is. This can work on Layer 2 if a tribe decides to put a captcha on their frontend and create a system where auto/manual is treated differently. I am curious what Archon guys think about it - @taskmanager @ecoinstant

Any button humans can click, a bot can be programmed to click it more regularly.

The idea of having two pools is interesting, because a bot could only compete for one of them, but it is unclear to me that all the bots would stay in the same, over-competed 5 minute pool if there is another pool to hunt in.

I don't even like the idea of 'flagging bots'. If we want different things we should propose different rules, not try to attach morality to certain actions (i.e. 'asking people to be nice has been tried before' 🤣). Flagging is permitted, but it hasn't been shown to be a very effective way of changing people's behavior.

The question for me is 'Why do we hate bots'? Since a bot is just a way for me to do what I want more regularly - a bot is programmed by a human, and we want human votes presumably. If it's because the bot is unfair and not everyone has a bot, then it seems that removing the competition aspect (as LEO did) would be the best way. Everybody gets 50% of their own vote, nobody competes, bots are no longer 'bad'.

When I say flagging, I never refer to downvoting. I do not know why Hiveans use those as synonyms. To me, flagging is marking stuff (e.g. for a higher authority to review); downvoting is dealing with stuff using my own resources.

So no morality reference. I just say the gamification can be more challenging if passing captcha gives players a bonus (in case I am sued: To give an incentive for bot makers to improve the AI, not to discriminate against bots).

Giving regular users access to a bot that helps with vote timing would be another improvement on its own - that's what I usually say when people complain about bots in gaming.

Removing the competition comes with a price. It might be a good deal under some conditions.

    The idea of incentivising late voting sounds great but in the end all it can do is to move the bot playground from the first five minutes to the last five minutes.

Yes, that is exactly what we need to do (imho). The current 'problem' with bots is that they accumulate dozens of votes at the 5-minute mark for a post that might be complete garbage, but happened to come from someone who had some good previous posts. In other words, they add noise to manual curation, thus decreasing the signal-to-noise ratio. Delayed auto-votes can amplify the signal (because they have time to evaluate and filter noise from the genuine signal). In that sense, the bots become a resource rather than a liability.

If the bot playground is in the last 5 minutes (or the midpoint of the voting period, as I've suggested), then the bots actually have some really valuable information available to them, because the bots can evaluate the voting patterns of everyone who voted before them (and even cross-evaluate how they voted for others).

If you move the sweet spot elsewhere, everyone playing optimally is going to withhold their vote till the sweet spot, so any information available to bots is coming from poisoned wells.

OK, in the real world not everyone can adjust, so short term you can mine some useful info from good curators. Long term, they are going to leave when their rewards go down.

Trying to create multiple sweet spots is just an effort to make the whole process of wearing the manual curators down last a little longer.

My reference point is not the current system, btw (community agreed it is flawed months ago - not necessarily saying there is a consensus on improvements). My reference point is the system where curators and bots share the first place between everyone voting in the first N hours. In lab conditions (everyone optimal) it boils down to linear (LEO) with shorter voting (no one bothers to vote outside window). In the real world it allows both camps to cash in any advantage the casual players leave on the table by voting late (to be nice and send the rewards to authors that deserve them).

    There is no extra df, you introduced an extra pool (df:=df+1) but also a new rule that you can only vote in one of those (df:=df-1)

I would say there is, because someone can split their vote via separate accounts.

Also, I am not explicitly saying the delayed auto-vote plays no role in the early-vote pool. Although I didn't explicitly lay out fine details (because I wanted to focus discussion on the big-picture concept), my current thinking would be that the delayed auto-vote percentage might be around 40% (slightly less than a linear reward of 50%) with 10% going to the early-vote curation pool. That 10% would then be subject to manipulation by delayed-auto-voting bots (by splitting votes across two accounts, one voting early one voting later).
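The split described above can be worked through with a minimal sketch. It assumes the 40% / 10% figures from the paragraph and assumes the remaining 50% goes to the author, as under linear rewards. The function names are hypothetical, and how the early-vote pool is then divided among early voters is deliberately left out because it is not specified here.

```python
from typing import Dict

VOTER_SHARE = 0.40       # assumed flat share for the delayed auto-vote
EARLY_POOL_SHARE = 0.10  # assumed share routed to the early-vote curation pool


def split_delayed_vote(vote_value: float) -> Dict[str, float]:
    """Split the reward value added by one delayed auto-vote.

    Assumption: whatever is not paid to the voter or routed to the
    early-vote pool goes to the author.
    """
    voter = VOTER_SHARE * vote_value
    early_pool = EARLY_POOL_SHARE * vote_value
    author = vote_value - voter - early_pool
    return {"voter": voter, "early_pool": early_pool, "author": author}


def max_self_play_share() -> float:
    """Upper bound on one owner's take when splitting votes across two accounts.

    Assumes the owner's early account is the only early voter and so
    captures the entire early-vote pool created by the owner's late vote.
    """
    return VOTER_SHARE + EARLY_POOL_SHARE
```

Under these assumptions, the most a split-vote strategy can recover is the same 50% that a linear curve would pay.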

Fair point, I missed that.
So I gave it a lot of thought.

If you decide to split your power you need a small vote early (curation) and big vote late (upvote) - you can't manipulate stuff the other way around.

Dream scenario is you curate a post, no one else does, you will have 10%. What is your play? Do you wait for others to vote "knowing" you drop a fat upvote later? That means getting your big money in when way behind (too late and into a weakened pool). In practice, you just drop the upvote fast and hope to go big from there (worst case: no one else votes and you get an average score) or refrain from upvoting at all; no late voting either way. (Well, maybe you find it profitable just before payout if no one upvotes, but then the late-vote hammer didn't do any good for the post.)

Usually a few others curate with you. Now it comes down to judging whether the total payout makes it high enough to give good return on upvote. Your share of the curation pool is only going to matter if you estimate the decision to be close.

So you pass up on upvoting some of your curations. What do you do with these votes? Find something you considered curating and passed up? That only makes sense if it is dead zero in curation (won't happen). Unfortunately, your best play is to find stuff that was overcurated (and upvote it asap: you either get in early on a viral post or you get a huge share of the 40% pool when few others upvote, at curators' expense). Maybe you will be able to find some overcurated-underupvoted stuff near payouts too; again, not the kind of late upvotes that are useful.

To me, insta-upvoting highly curated stuff seems to be the most profitable option, so the bots are going that way (curation not being necessary). Fast forward a few weeks, and no one is splitting their votes. There are upvoting bots crushing the regular users in the upvote pool because they vote faster (so the current fast-voting penalty has to be reintroduced for the upvote pool). Meanwhile, in the curation pool, the best results are produced by curation bots throwing around lots of small votes on questionable posts, fishing to scoop the full (or maybe half/third) curation pool if the post happens to get some upvotes from the FFF crowd (Friends/Fools/Family).

Even if I am wrong about the black-hat curation strategy, curation bots have a great advantage over manual curators because they can always make their decision at the very end of the curation window, acting on much better info about the amount of votes during the curation period (a way more important piece of info compared to the system I referenced).

So in my thought experiment, the artificial split introduces a df almost no one uses, and it still breaks the system by incentivising destructive behaviour on both sides of the pool.

    Dream scenario is you curate a post, no one else does, you will have 10%

Thanks for the thoughtful analysis.

If I understand your analysis correctly, the biggest manipulation via self-playing with split votes is 50% curation reward instead of 40%, which is same as linear reward. If they want to go that route they can just early and late vote random comments and snag the 50%. Maybe that's not a bad thing? That being the case, police bots could easily search out those and dilute them. Or maybe let whales forfeit voting rights in exchange for 50% auto reward and 50% spread across all author rewards.

Some will still try to game other people's votes, but it seems to me that the incentive to do that goes down (relative to current protocol) because the late voting bots are contributing only 10% to the nonlinear pool instead of 50%.

What keeps a large stakeholder (today) from authoring a random comment via a separate account then posting an early vote from yet a different account, then posting a large last minute upvote?

Wouldn’t that gain the person 100% of his voted rewards? And couldn’t that all be done by a bot?

    the biggest manipulation via self-playing with split votes is 50% curation reward instead of 40%, which is same as linear reward.

I do not understand this.

My conclusion was there is no effective split-vote abuse mechanics but the system incentivises unwanted behaviour (looking for spots where 5:01 vote gives more rewards than 4:59 - this cannot happen without a priori split of rewards between the two pools).

    What keeps a large stakeholder (today) from authoring a random comment via a separate account then posting an early vote from yet a different account, then posting a large last minute upvote?

Lack of incentive - they are better off dropping the big upvote after 5 minutes.

Unless you assume they are afraid of being downvoted. That is why the new system is expected to have a window before payouts (like 1 day) when upvotes are no longer possible and only downvotes can happen.

    Asking people to be nice has been tried before. It does not work on chain-wide level.

I am not suggesting that we 'ask people to be nice'. Rather, I am suggesting that we change the rules so that the differential in payout from 'gaming the system' is not much greater than 'playing nice'.

The other advantage of incentivizing delayed-voting bots is that those bots can play a significant role in flagging and 'calling out' those that aren't 'playing nice', ultimately leading to the shunning and thus exclusion of such 'bad' behavior. This power is accentuated if content creators are allowed to reject upvotes they suspect are being made in bad faith, thus forcing the 'offending bots' to merely play by themselves, in their own sandboxes (amplifying their own rewards slightly, but not adding noise to otherwise valuable curation efforts).

Allowing content creators to reject "bad" upvotes is a passive-aggressive way of telling them to be nice.