You are viewing a single comment's thread from:

RE: LeoThread 2025-10-19 16-17

in LeoFinance2 months ago

Part 3/9:

This process leads to what the speaker describes as "noisy" estimations. Every individual step in a solution trajectory—whether correct or incorrect—is upweighted if the final result is correct. This approach assumes that all parts of the process that led to success are equally valuable, which is rarely true in human reasoning. Humans, in contrast, review their thought processes, analyze why certain steps worked, and adapt accordingly. RL, however, treats the pathway as a black box, disregarding the potential for missteps or misguided attempts along the way.

The Supervision Strain and Human Discrepancies