Part 4/16:
Driving this functionality is DataForge, a purpose-built synthetic data pipeline. Rather than scraping vast amounts of internet text, an often noisy and unstructured process, DataForge constructs its training data: it models a graph in which each node applies a transformation, using rules drawn from planning domain languages, to produce diverse and logically consistent examples. For instance, a Wikipedia article can be reimagined as a rap song, then broken down into instructions, answers, and reasoning traces. In total, Hermes 4 was trained on 5 million such samples, amounting to a staggering 19 billion tokens, with reasoning sequences averaging five times longer than those in typical datasets (up to 16,000 tokens). This focus on detailed, long-form reasoning lets Hermes 4 sustain complex thought processes without losing track.
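The graph-of-transformations idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual DataForge implementation: the class name, the example transforms, and the deterministic walk are all assumptions made for clarity. The point is just that each node rewrites a document, and chaining nodes along graph edges turns one seed text into a sequence of derived training samples.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class TransformGraph:
    """Toy sketch of a DataForge-style transformation graph (hypothetical API)."""
    nodes: dict[str, Callable[[str], str]] = field(default_factory=dict)
    edges: dict[str, list[str]] = field(default_factory=dict)

    def add(self, name: str, fn: Callable[[str], str], successors=()) -> None:
        # Register a transformation node and the nodes that may follow it.
        self.nodes[name] = fn
        self.edges[name] = list(successors)

    def run(self, start: str, seed: str) -> list[tuple[str, str]]:
        # Walk the graph from `start`, applying each node's transform in turn
        # and recording every intermediate form as a (node, text) pair.
        trace, current, node = [], seed, start
        while node is not None:
            current = self.nodes[node](current)
            trace.append((node, current))
            successors = self.edges[node]
            # A real pipeline would sample a successor; this sketch just
            # follows the first edge for determinism.
            node = successors[0] if successors else None
        return trace

# Mirror the article's example: article -> song -> instruction -> answer.
g = TransformGraph()
g.add("to_song", lambda t: f"[verse] {t}", successors=["to_instruction"])
g.add("to_instruction", lambda t: f"Explain the following: {t}", successors=["to_answer"])
g.add("to_answer", lambda t: f"Answer with step-by-step reasoning: {t}")

trace = g.run("to_song", "A Wikipedia article about graph theory.")
for name, text in trace:
    print(name, "->", text)
```

Each intermediate output is itself a usable sample, which is how one seed document fans out into many logically linked training examples.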