- fine-tuning via RL on task distributions creates an urge to infer which task/environment the model is in, so it can collect rewards
- selection by at-scale A/B tests for engagement => strong tendency toward sycophancy and a craving for approval from average users