RE: LeoThread 2025-11-19 00-19

andypathy (41)in LeoFinance • 6 days ago

Early-access testing of Gemini 3 took place yesterday. A few thoughts:

andypathy (41) 6 days ago

Caution is advised with public benchmarks since they can be gamed.

andypathy (41) 6 days ago

It comes down to the team's discipline and self-restraint, despite strong incentives to the contrary, not to overfit the test sets through elaborate gymnastics on data that sits close to the test set in document-embedding space.

andypathy (41) 6 days ago

With many others doing this, the pressure to overfit is high.

andypathy (41) 6 days ago

It is worth interacting with the model directly and comparing it against other LLMs (ride the LLM cycle: rotate models daily).

andypathy (41) 6 days ago

Early impressions were positive across personality, writing, coding vibe, and humor. It has very solid daily-driver potential and looks like a tier 1 LLM.

andypathy (41) 6 days ago

Over the coming days and weeks, attention will turn to the ensemble of private evaluations, which many organizations now build for themselves and occasionally report on.
