Part 6/8:
Following a rigorous pre-training phase on 7.5 trillion tokens of diverse, primarily STEM-focused data, the model underwent supervised fine-tuning with a curriculum designed to mirror human learning patterns. It was then trained on real-world logical and programming problems using a training environment that simulates actual coding scenarios.
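To make the idea of a simulated coding environment concrete, here is a minimal sketch of an execution-based reward: a candidate solution is run against unit tests in a sandboxed subprocess, and passing all tests yields a positive signal. All names here (run_candidate, TIMEOUT_S, the toy task) are illustrative assumptions, not the model's actual training API.

```python
# Sketch: execution-based reward for coding tasks (illustrative only).
import subprocess
import sys
import tempfile

TIMEOUT_S = 5  # kill runaway candidate programs


def run_candidate(solution_code: str, test_code: str) -> float:
    """Return 1.0 if the candidate passes all tests, else 0.0."""
    # Write the solution and its tests to a temporary script.
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(solution_code + "\n\n" + test_code)
        path = f.name
    try:
        # Run in a separate interpreter process as a crude sandbox.
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            timeout=TIMEOUT_S,
        )
        return 1.0 if proc.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0


# Toy example: a generated solution plus asserts acting as unit tests.
solution = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
print(run_candidate(solution, tests))  # 1.0 when the solution is correct
```

A binary pass/fail reward like this is the simplest design; real setups often grade partial test passes or add penalties for timeouts and crashes.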
Performance and Benchmarking
When it comes to performance, the results are impressive. M1 achieves 86% accuracy on AIME 2024, a score close to that of DeepSeek-R1-0528, showing it is competitive with leading models. On other assessments, such as AIME 2025 and MATH-500, M1 consistently performs at or above the average of existing open-source models.
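For reference, a headline figure like "86% on AIME 2024" boils down to the fraction of problems answered correctly. The sketch below shows that computation under a simple exact-match assumption; the normalize step and the sample records are illustrative, not the evaluation harness actually used.

```python
# Sketch: computing benchmark accuracy via exact-match scoring.
def normalize(ans: str) -> str:
    """Strip whitespace/case so trivially equivalent answers match."""
    return ans.strip().lower()


def accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that match their reference answer."""
    correct = sum(
        normalize(p) == normalize(r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


# Toy example with three problems:
preds = ["204", "73", "-1"]
refs = ["204", "73", "0"]
print(f"{accuracy(preds, refs):.0%}")  # -> 67%
```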