RE: LeoThread 2025-05-10 11:48

in LeoFinance · 5 months ago

Part 2/10:

Training large language models typically begins with pre-training on vast amounts of data, followed by alignment or fine-tuning. Fine-tuning can be done through Supervised Fine-Tuning (SFT), where the model learns from human-curated examples, or through Reinforcement Learning (RL), where a feedback signal (approval or disapproval) guides the model toward preferred behaviors.
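A minimal sketch of the difference between the two signals, using a toy "model" that is just a vector of logits over three candidate responses (all function names and numbers here are illustrative, not from the paper): SFT pushes probability toward a human-chosen target, while an RL-style (REINFORCE-like) step reinforces the model's own sample in proportion to a scalar reward.

```python
import math

def softmax(xs):
    """Convert raw logits into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def sft_step(logits, target_idx, lr=0.1):
    """SFT: one gradient step on cross-entropy toward a human-curated target.
    dL/dlogit_i = p_i - 1[i == target], so we subtract lr * that gradient."""
    probs = softmax(logits)
    return [l - lr * (p - (1.0 if i == target_idx else 0.0))
            for i, (l, p) in enumerate(zip(logits, probs))]

def rl_step(logits, sampled_idx, reward, lr=0.1):
    """RL: REINFORCE-style step, the log-prob gradient of the model's own
    sampled response scaled by the reward (praise > 0, disapproval < 0)."""
    probs = softmax(logits)
    return [l - lr * reward * (p - (1.0 if i == sampled_idx else 0.0))
            for i, (l, p) in enumerate(zip(logits, probs))]

logits = [0.0, 0.0, 0.0]                                # uniform start
after_sft = sft_step(logits, target_idx=1)              # human picked response 1
after_rl = rl_step(logits, sampled_idx=2, reward=1.0)   # model's sample 2 praised
```

In both cases the favored response becomes more probable; the key practical difference is where the signal comes from, a curated label versus a reward on the model's own output.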

Human data curation is slow and costly, which creates a bottleneck for training these models quickly. The challenge lies in developing approaches that rely less on human data, which gives rise to the concepts presented in the paper.

Absolute Zero Concept: Self-Improving AI