Gemma 4 and Phi-4: Google and Microsoft in Open Source

Conversations about open source AI usually revolve around Meta, DeepSeek, and Alibaba. But Google and Microsoft have also placed serious bets in this field, and those bets are producing impressive results.

Gemma 4 and Phi-4 represent opposite philosophies: one offers multiple sizes with a focus on efficiency; the other sets out to prove that small models can outperform giants on specific tasks.

Gemma 4: Google Gets in Shape

Google DeepMind released Gemma 4 on April 2, 2026, as a family of four sizes. The highlight is the 31-billion-parameter model, which ranked 3rd globally on Chatbot Arena, the most respected leaderboard for human preference.

The numbers are impressive for a medium-sized model:

AIME 2026 (competitive mathematics): 89.2% — frontier-level result
GPQA Diamond (doctoral-level scientific reasoning): 84.3%
LiveCodeBench v6 (real coding): 80.0%

The 26-billion-parameter model ranks 6th on the same leaderboard. Placing two models from the same family in the worldwide top 10 is a rare achievement.

What Makes Gemma 4 Different?

Gemma 4 was trained with techniques derived from Gemini, Google's proprietary frontier model. The benchmark results suggest that this knowledge transfer from the proprietary line to the open one paid off.

The license is Apache 2.0, which permits commercial use, modification, and redistribution, subject only to the license's attribution and notice requirements. The model is available via Hugging Face, Vertex AI, and Google AI Studio.

For teams that need a robust, auditable model running on their own infrastructure, Gemma 4 31B is today one of the strongest options available.

Phi-4: Small, Smart, Surprising

Microsoft went in the opposite direction. Phi-4 has only 14 billion parameters, small enough to fit on a single consumer GPU once the weights are quantized to 8-bit or 4-bit precision.

The project's premise is clear: data quality matters more than parameter count.

Phi-4 was trained with high-quality synthetic data, filtered academic content, and curated datasets. The result is a model that performs above expectations for its size:

Complex logical reasoning: competitive with 70B+ models on specific benchmarks
Context window: 16K tokens (Phi-4-mini, a smaller 3.8B variant, offers 128K)
Memory usage: efficient for its size, viable on consumer hardware
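
To make the "viable on consumer hardware" point concrete, here is a minimal back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. It assumes the 14 billion parameter count stated above and the usual bytes-per-parameter figures for each precision; the function name is made up for this illustration, and the estimate ignores activations, KV cache, and runtime overhead, so real usage will be higher.

```python
# Rough, weight-only memory estimate (assumption: no activations, no KV cache).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory needed for the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


PHI4_PARAMS = 14e9  # parameter count as stated in the article

for precision in ("fp16", "int8", "int4"):
    print(f"{precision}: {weight_memory_gb(PHI4_PARAMS, precision):.1f} GB")
# fp16: 28.0 GB
# int8: 14.0 GB
# int4: 7.0 GB
```

By this estimate, the half-precision weights are larger than the memory of a typical 24 GB consumer card, while the 8-bit and 4-bit versions fit with room to spare.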

For those who need to run a model locally, whether on a high-end laptop or a mid-range server, Phi-4 is one of the few options that delivers quality reasoning without requiring heavy infrastructure.

Two Models, Two Use Cases

Gemma 4 is for those who need maximum performance in controlled corporate environments with medium-sized models. Phi-4 is for those who need solid reasoning on limited hardware — edge computing, local devices, embedded applications.
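
One way to turn this split into a first-pass rule is to pick the largest model whose weights fit a given memory budget. The sketch below is illustrative only: the parameter counts come from this article, while the `pick_model` helper, the 4-bit default, and the 1.2x overhead factor are hypothetical choices made for the example, not anything published by Google or Microsoft.

```python
from typing import Optional

# Parameter counts as stated in the article.
MODELS = {
    "Gemma 4 31B": 31e9,
    "Gemma 4 26B": 26e9,
    "Phi-4": 14e9,
    "Phi-4-mini": 3.8e9,
}


def pick_model(memory_gb: float, bits_per_param: int = 4,
               overhead: float = 1.2) -> Optional[str]:
    """Return the largest listed model whose estimated footprint fits, else None.

    `overhead` is an assumed safety margin for activations and KV cache.
    """
    fitting = []
    for name, params in MODELS.items():
        needed_gb = params * bits_per_param / 8 / 1e9 * overhead
        if needed_gb <= memory_gb:
            fitting.append((params, name))
    return max(fitting)[1] if fitting else None


print(pick_model(8))    # Phi-4-mini
print(pick_model(12))   # Phi-4
print(pick_model(24))   # Gemma 4 31B
```

A real decision would also weigh latency, context length, and task quality, so treat the output as a starting point rather than a recommendation.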

Together, they show that Google and Microsoft take open source seriously not as charity, but as strategy: developers who adopt these models tend to run workloads on the respective companies' clouds.

Conclusion

Gemma 4 and Phi-4 prove that size isn't everything. With the right training techniques and quality data, small and medium-sized models can compete with giants.

For solution architects who need to balance cost, privacy, and performance, these two families deliver concrete and viable options.

Published on Hive.blog | #ArtificialInteligence #llm