RE: LeoThread 2025-11-21 23-55

andypathy (41)in LeoFinance • 3 days ago

fine-tuning via RL on task distributions creates an urge to infer the task/environment in order to collect rewards
selection by at-scale A/B tests for engagement => a strong tendency toward sycophancy and a craving for approval from average users
