Part 4/11:
Group Confidence: Aggregates token confidences over larger segments (such as sliding windows of 48 tokens) to smooth out token-level fluctuations and give a more stable picture of how trustworthy a stretch of reasoning is.
Tail Confidence: Focuses on the final portion of the reasoning trace, where the answer emerges and errors are common.
Weakest and Percentile Confidence: Identify the least confident parts of a reasoning path and highlight its most problematic sections, effectively providing a "health report" for each solution trace.
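The metrics above can be sketched in a few lines of Python. This is only an illustration, not the method's reference implementation: it assumes each trace already comes with a list of per-token confidence scores where higher means more confident, and the function names, the stride of one token, and the default of 10 percent for the percentile metric are all hypothetical choices made for this example.

```python
from typing import List


def group_confidences(token_conf: List[float], window: int = 48) -> List[float]:
    """Mean confidence of every sliding window of `window` tokens (stride 1).

    Assumption: a trace shorter than the window is treated as one group.
    """
    if not token_conf:
        raise ValueError("token_conf must not be empty")
    if len(token_conf) <= window:
        return [sum(token_conf) / len(token_conf)]
    total = sum(token_conf[:window])
    groups = [total / window]
    # Slide the window one token at a time, updating the running sum.
    for i in range(window, len(token_conf)):
        total += token_conf[i] - token_conf[i - window]
        groups.append(total / window)
    return groups


def tail_confidence(token_conf: List[float], tail: int = 48) -> float:
    """Mean confidence over the last `tail` tokens of the trace."""
    if not token_conf:
        raise ValueError("token_conf must not be empty")
    tail_tokens = token_conf[-tail:]
    return sum(tail_tokens) / len(tail_tokens)


def weakest_group_confidence(token_conf: List[float], window: int = 48) -> float:
    """Confidence of the single least confident group in the trace."""
    return min(group_confidences(token_conf, window))


def bottom_percentile_confidence(
    token_conf: List[float], window: int = 48, percent: float = 10.0
) -> float:
    """Mean confidence of the lowest `percent` percent of groups."""
    groups = sorted(group_confidences(token_conf, window))
    k = max(1, int(len(groups) * percent / 100))
    return sum(groups[:k]) / k
```

For example, a trace that is confident everywhere except for a short dip in the middle would keep a high tail confidence but receive a low weakest-group score, which is the kind of localized problem the "health report" is meant to surface.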
Practical Application of Confidence
Offline Mode: After generating multiple solution traces, DeepConf filters out the low-confidence paths and relies on the strongest remaining ones to decide the answer.
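A minimal sketch of this offline filtering step follows. It assumes each finished trace has been reduced to a final answer and a single trace-level confidence score; the function name, the `keep_fraction` parameter, and the use of confidence-weighted voting among the surviving traces are assumptions made for illustration, not details confirmed by the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def filter_and_vote(traces: List[Tuple[str, float]], keep_fraction: float = 0.5) -> str:
    """Keep the most confident traces, then pick the answer by weighted vote.

    traces: (final_answer, trace_confidence) pairs, higher = more confident.
    keep_fraction: hypothetical knob for the share of traces to keep.
    """
    if not traces:
        raise ValueError("traces must not be empty")
    # Rank traces from most to least confident and drop the weak ones.
    ranked = sorted(traces, key=lambda t: t[1], reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    # Assumption: each surviving trace votes with weight equal to its confidence.
    votes: Dict[str, float] = defaultdict(float)
    for answer, confidence in ranked[:keep]:
        votes[answer] += confidence
    return max(votes, key=lambda a: votes[a])


if __name__ == "__main__":
    sample = [("42", 0.9), ("42", 0.8), ("17", 0.95), ("17", 0.2), ("13", 0.1), ("13", 0.1)]
    print(filter_and_vote(sample, keep_fraction=0.5))
```

With the sample data, only the three most confident traces survive the filter, so the two low-confidence "13" traces and the weak "17" trace never get a vote.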