It comes down to discipline and self-restraint from the team (who face strong incentives otherwise) to avoid overfitting test sets via elaborate gymnastics around test-set–adjacent data in the document-embedding space.
You are viewing a single comment's thread from: