During skim-reading, the eyes move in saccades and the brain tokenizes whole chunks of words at once; speed-reading lowers recall accuracy while preserving general comprehension.

This shifts processing from "word tokens" to "idea tokens", which is the source of the roughly 10x speed-up.
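
As a toy illustration of where a factor like that could come from (the numbers below are assumptions for the sake of arithmetic, not measurements):

```python
# Toy arithmetic (assumed numbers): reading in multi-word "idea" chunks
# cuts the number of processing steps roughly in proportion to chunk size.
words_per_page = 500            # assumption: typical page length
words_per_idea_chunk = 10       # assumption: roughly a clause per chunk

word_steps = words_per_page                           # one step per word token
idea_steps = words_per_page // words_per_idea_chunk   # one step per idea token

print(f"word-token steps: {word_steps}")              # 500
print(f"idea-token steps: {idea_steps}")              # 50
print(f"speed-up: {word_steps / idea_steps:.0f}x")    # 10x
```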

DeepSeek-OCR is presented as an engineering achievement.
It’s been suspected that VLM/OCR models can be much smaller; pre-VLM state-of-the-art OCR (e.g., Google Cloud OCR) was likely under ~100M parameters.

Recently, small open-weight models have matched or surpassed closed-source options on many benchmarks.
Notably, dots.ocr’s 1.7B model often outperforms OpenAI and Anthropic models, and sometimes Gemini, at a fraction of the cost.

OCR is largely a pattern-recognition task that requires little internal memory, which explains why a lightweight 12-layer architecture can work well.
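
A quick back-of-the-envelope check on that size intuition (assumed dimensions; this is not the spec of any particular OCR system, just a sketch of how small a 12-layer transformer stack is):

```python
# Rough parameter count for a 12-layer transformer encoder
# (assumed dimensions; not any specific model's configuration).
layers = 12
d_model = 512
d_ff = 4 * d_model
vocab = 8000                                  # assumption: small character-level vocabulary

attn_params = 4 * d_model * d_model           # Q, K, V and output projections
ffn_params = 2 * d_model * d_ff               # two feed-forward matrices
per_layer = attn_params + ffn_params
total = layers * per_layer + vocab * d_model  # plus an embedding table

print(f"~{total / 1e6:.0f}M parameters")      # ~42M, comfortably under 100M
```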

DeepSeek-OCR advances two areas: a "small" mixture-of-experts approach with only ~500M active parameters, which makes large-batch processing cheap, and aggressive encoding with semantic pooling.
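
A minimal sketch of the active-parameter idea, with made-up sizes rather than DeepSeek-OCR's actual configuration: a router picks the top-k experts per token, so only a small slice of the total weights does any work on a given token.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoEFFN(nn.Module):
    """Top-k routed mixture-of-experts feed-forward block (illustrative sizes)."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=32, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (n_tokens, d_model)
        scores = self.router(x)                    # (n_tokens, n_experts)
        top = scores.topk(self.k, dim=-1)
        weights = F.softmax(top.values, dim=-1)    # mix the k chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):                 # only k experts run per token
            expert_ids = top.indices[:, slot]
            for e in expert_ids.unique().tolist():
                mask = expert_ids == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = TopKMoEFFN()
total = sum(p.numel() for p in layer.parameters())
active = layer.k * sum(p.numel() for p in layer.experts[0].parameters())
print(f"total params ~{total / 1e6:.0f}M, active per token ~{active / 1e6:.0f}M")
```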

This resembles tokenizer-free models, where the encoder composes low-level signals into higher-level units, speeding up inference alongside other optimizations.
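
And a minimal sketch of the compression side, again with assumed shapes and ratios rather than the paper's actual encoder: merging neighborhoods of low-level patch tokens into far fewer higher-level tokens before the decoder ever sees them.

```python
import torch
import torch.nn as nn

# A 32x32 patch grid (1024 low-level tokens), each a d_model-dim embedding.
d_model, grid = 512, 32
patch_tokens = torch.randn(1, d_model, grid, grid)

# Merge each 4x4 neighborhood into one higher-level token (assumed factor;
# the real encoder's compression scheme and ratio may differ).
merge = nn.Conv2d(d_model, d_model, kernel_size=4, stride=4)
merged = merge(patch_tokens)                       # (1, 512, 8, 8)

n_in = grid * grid
n_out = merged.shape[-2] * merged.shape[-1]
print(f"{n_in} patch tokens -> {n_out} merged tokens ({n_in // n_out}x fewer)")
```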

However, the model is unlikely to radically shift OCR research yet: the training pipeline relied on synthetic/emulated data with limited diversity, especially for graph rendering.

More technical development will be needed, but DeepSeek-OCR could serve as a foundational OCR model that nails the inference/performance trade-off while still requiring custom annotation and processes for large-scale use.