Part 5/10:
OpenAI conducted rigorous red-team testing and safety evaluations to assess vulnerabilities. On one notable metric, o1-preview scored 84 out of 100 on jailbreaking resistance, a marked improvement over GPT-4o's score of 22, indicating a substantially stronger ability to resist attempts to elicit harmful or disallowed content.
Safety Performance Metrics
Refusal of unsafe prompts: 93.4%, compared to GPT-4o's 71.3%
Hallucination rate (generation of incorrect facts): 44%, down substantially from GPT-4o's 61%, though anecdotal reports suggest the model can sometimes produce more convincing yet false information