RE: LeoThread 2025-11-09 20-32

Part 4/11:

Group Confidence: Aggregates token confidences over larger segments (such as sliding windows of 48 tokens) to smooth out fluctuations and give a steadier picture of how trustworthy a stretch of reasoning is.
Tail Confidence: Focuses on the most recent portion of reasoning, where the final answer emerges and errors are common.
Weakest and Percentile Confidence: Identifies the least confident parts of a reasoning path and highlights the most problematic sections, effectively providing a "health report" for each solution trace.
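
To make the three measures above concrete, here is a rough Python sketch. It assumes each trace already comes with a non-empty list of per-token confidence scores where higher means more confident; how those scores are produced is not covered in this part. The function names, the tail length, and the 10% cutoff are placeholders picked for illustration, not values taken from the source.

```python
from statistics import mean

def group_confidences(token_conf, window=48):
    """Mean confidence over every sliding window of `window` tokens."""
    if len(token_conf) <= window:
        # Trace shorter than one window: treat the whole trace as one group.
        return [mean(token_conf)]
    return [
        mean(token_conf[i:i + window])
        for i in range(len(token_conf) - window + 1)
    ]

def tail_confidence(token_conf, tail=48):
    """Mean confidence over the last `tail` tokens (assumed tail length)."""
    return mean(token_conf[-tail:])

def weakest_group_confidence(token_conf, window=48):
    """Confidence of the single least confident window in the trace."""
    return min(group_confidences(token_conf, window))

def bottom_percentile_confidence(token_conf, window=48, pct=10):
    """Mean over the lowest `pct` percent of windows (assumed cutoff)."""
    groups = sorted(group_confidences(token_conf, window))
    k = max(1, len(groups) * pct // 100)
    return mean(groups[:k])
```

For example, a trace that is confident everywhere except a short shaky stretch in the middle still has a high tail confidence, but its weakest-group score drops to that stretch, which is the "health report" idea.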

Practical Application of Confidence

Offline Mode: After generating multiple solution traces, Deep Comp filters out low-confidence paths and relies on the strongest ones to decide the answer.
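
A minimal sketch of what that offline filtering could look like, assuming each finished trace has been reduced to an (answer, confidence) pair. The post only says the low-confidence paths are dropped and the strongest decide; the keep fraction and the confidence-weighted vote below are assumptions for illustration, and `offline_answer` is a hypothetical name.

```python
from collections import defaultdict

def offline_answer(traces, keep_fraction=0.5):
    """Pick an answer from (answer, confidence) pairs.

    Keeps only the most confident share of traces (assumed: top half),
    then lets the survivors vote, each weighted by its confidence.
    """
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    k = max(1, int(len(ranked) * keep_fraction))
    kept = ranked[:k]

    votes = defaultdict(float)
    for answer, conf in kept:
        votes[answer] += conf  # assumed weighting scheme
    return max(votes, key=votes.get)

traces = [
    ("42", 0.9), ("42", 0.8), ("17", 0.95),
    ("17", 0.2), ("17", 0.1), ("13", 0.3),
]
print(offline_answer(traces))  # prints 42
```

Note that a plain majority vote over all six traces would pick "17" (three votes), but two of those votes come from very low-confidence traces that get filtered out here.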