Part 4/8:
Performance Metrics and Speed Enhancements
The new model, Mercury, is reported to generate over 1,000 tokens per second on standard hardware such as the NVIDIA H100 GPU. This contrasts sharply with traditional models, which typically run at around 40 to 60 tokens per second. In practice, users receive answers much faster: a long response that previously took tens of seconds, or even minutes, can arrive within a few seconds.
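To make the difference concrete, the short sketch below converts a decoding rate into the wait time for a single response. The 2,000-token response length and the 1,000 tokens-per-second "fast" rate are round numbers assumed for illustration, not measurements, and the function name is hypothetical. The calculation also ignores network and prompt-processing latency.

```python
def generation_time_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to emit num_tokens at a steady decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Assumed response length for illustration (roughly a long code file).
RESPONSE_TOKENS = 2_000

# 40 and 60 tokens/s are the range quoted for traditional models;
# 1,000 tokens/s is an assumed round figure for a fast model.
rates = {
    "traditional (low)": 40,
    "traditional (high)": 60,
    "fast (assumed)": 1_000,
}

for label, rate in rates.items():
    seconds = generation_time_seconds(RESPONSE_TOKENS, rate)
    print(f"{label}: {seconds:.1f} s")
```

Under these assumptions the same response takes roughly 33 to 50 seconds at traditional rates and about 2 seconds at the faster rate.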
Higher speed also makes more efficient use of test-time compute: within the same time budget, a faster model can generate more tokens, which lets it handle complex tasks with greater agility. This opens up exciting possibilities for applications, particularly in areas like code generation.