Part 3/9:
This process leads to what the speaker describes as "noisy" estimates. Every individual step in a solution trajectory, whether correct or incorrect, is upweighted if the final result is correct. This approach assumes that all steps leading to a success are equally valuable, which is rarely the case. Humans, in contrast, review their thought processes, analyze why certain steps worked, and adapt accordingly. RL, however, treats the pathway as a black box and ignores the missteps or misguided attempts that may have occurred along the way.
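
To make the point concrete, here is a minimal sketch of outcome-only credit assignment. It assumes the simplest possible scheme (one binary reward for the final answer, copied to every step, with no baseline, discounting, or learned value function), which may be simpler than what the speaker has in mind. The function name `outcome_only_credit` and the example trajectory are hypothetical and serve only as illustration.

```python
from typing import List, Tuple


def outcome_only_credit(num_steps: int, final_correct: bool) -> List[float]:
    """Give every step the same credit, based only on the final outcome.

    Assumption: a binary reward (1.0 for a correct final answer, 0.0
    otherwise) is broadcast unchanged to all steps in the trajectory.
    """
    reward = 1.0 if final_correct else 0.0
    return [reward] * num_steps


# Hypothetical trajectory: (step description, was the step actually sound?)
trajectory: List[Tuple[str, bool]] = [
    ("set up the equation", True),
    ("pursue a dead-end substitution", False),
    ("backtrack and simplify", True),
    ("state the final answer", True),
]

credits = outcome_only_credit(len(trajectory), final_correct=True)

# The dead-end step receives exactly the same credit as the sound steps.
for (description, sound), credit in zip(trajectory, credits):
    print(f"{description:32s} sound={sound!s:5s} credit={credit}")
```

In this sketch the dead-end substitution is upweighted just as much as the steps that actually produced the answer, which is the source of the noise described above.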