Grok 3 vs ChatGPT, DeepSeek, and other AI competitors


Elon Musk’s xAI has officially launched Grok 3, its latest large language model (LLM). The launch, announced via livestream on X (formerly Twitter), also introduced beta versions of “Grok 3 Reasoning” and “Grok 3 mini Reasoning.” These “Reasoning” models are a step beyond standard generative AI like GPT-4, as they aim to tackle problems more analytically, potentially reducing the common issue of AI “hallucinations” (fabricating information).


xAI is making bold claims, positioning Grok 3 as the top performer, exceeding the capabilities of models from OpenAI, Google, Anthropic, and DeepSeek in benchmark tests.

Grok 3, competing under the codename “chocolate,” showed promising results in Chatbot Arena, a platform where chatbots are compared anonymously.

While Grok 3 has made significant progress, especially considering its relatively recent entry into the field, it still shares some of the limitations observed in other leading-edge LLMs. Experts in the AI field are now weighing in on this newcomer.

Grok 3’s “late start” and subsequent rapid progress raise questions about xAI’s development strategy. What specific innovations or approaches has xAI employed to catch up so quickly? Has access to different data or infrastructure played a role?


Grok 3 is competitive but not compelling enough to replace ChatGPT

Andrej Karpathy, a prominent figure in the AI world with experience at both OpenAI and Tesla, has offered his initial assessment of Grok 3. He conducted some standard tests and concluded that Grok 3, particularly with its new “Deep Search” reasoning capability, performs at a level comparable to OpenAI’s most powerful models (he specifically cited o1-pro, which costs $200/month), and even slightly surpasses DeepSeek-R1 and Gemini 2.0 Flash Thinking.

This news has understandably excited supporters of Elon Musk and xAI. However, the question remains whether Grok 3’s performance is compelling enough to sway those who aren’t necessarily invested in the Musk ecosystem.

Ethan Mollick, an AI professor at Wharton, suggests that Grok 3’s performance aligns with expectations and doesn’t drastically alter the overall landscape of AI development. He emphasizes the importance of rapid development, computational resources, and talent, while suggesting that there’s no single magic formula for creating a leading-edge AI model.


xAI conspicuously avoided a key comparison for Grok 3

Screenshots showcasing Grok 3 Reasoning models surpassing OpenAI’s o3 mini and o1, DeepSeek’s R1, and Google’s Gemini 2.0 Flash Thinking in benchmark tests have quickly spread online, fueling claims that Grok 3 represents the pinnacle of reasoning capabilities in AI.

However, OpenAI has countered these claims. Shortly after the benchmarks were shared during a livestream, OpenAI product engineer Rex Asabor published a revised chart showing o3 outperforming Grok 3 Reasoning in math and science. Asabor acknowledged that o3 is not yet publicly available, so xAI may not have had access to its scores; even so, the rebuttal tempers the enthusiasm of Grok’s most ardent supporters who believe OpenAI is no longer a leading force in AI.

This is a crucial point. If o3 is not publicly available, how can xAI be fairly compared to it? It seems like comparing apples and oranges. Does OpenAI’s response suggest a competitive strategy of preemptively countering claims before a product is even released? This also raises the issue of access. Does OpenAI have access to data or benchmarks that xAI does not? Fair comparisons require equal access to information.


All things considered, Grok 3’s quick rise is quite impressive

Ethan Mollick, in a series of posts on X (formerly Twitter), emphasized the speed of Grok 3’s development, calling it “a very good model that is now at the frontier.” He highlighted that xAI reached this level of performance far faster than established players: Google and OpenAI have been working on such models for roughly 13 and 8 years, respectively, while xAI was only founded in 2023.

Elon Musk revealed that Grok 3 was trained using significantly more computational resources than its predecessor, Grok 2, specifically mentioning a tenfold increase and the use of 200,000 GPUs. This, according to Mollick, supports the idea of scaling laws in AI: more compute power generally translates to better model performance, at least in the near term.
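To get an intuition for what scaling laws actually predict, here is a minimal sketch. The power-law form and the constants below are illustrative assumptions (not xAI's or anyone's published numbers): they just show why a tenfold jump in compute buys a real but modest improvement rather than a proportional one.

```python
# Illustrative only: a hypothetical power-law scaling curve, L(C) = a * C**-b.
# The constants a and b are made up for demonstration; real values depend
# on the model family, data, and training setup.

def loss(compute: float, a: float = 10.0, b: float = 0.05) -> float:
    """Hypothetical training loss as a function of total compute (normalized)."""
    return a * compute ** -b

base = loss(1.0)     # predecessor-scale compute, normalized to 1
scaled = loss(10.0)  # a tenfold compute increase, like Grok 2 -> Grok 3

# Under this assumed exponent, 10x compute cuts loss by only ~11%.
improvement = 1 - scaled / base
print(f"relative loss reduction from 10x compute: {improvement:.1%}")
```

The sub-linear payoff is exactly what fuels the debate Mollick and Marcus are having: scaling keeps helping, but each order of magnitude of compute buys less than the last.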

Will xAI take a more open or closed approach to sharing Grok 3? Open access can accelerate progress but also raises safety concerns. What is xAI’s philosophy on this front?

However, AI researcher Gary Marcus expresses skepticism about the long-term applicability of scaling laws, questioning whether simply increasing compute will indefinitely lead to higher levels of intelligence.

Just imagine when models like Grok 3 can produce media like Trump’s Gaza video for cents compared to what its current competitors charge. That’s exciting and scary at the same time!


Grok 3 has the same limitations as other models

Like other large language models, Grok 3’s attempts at humor fall flat, mostly limited to predictable dad jokes. This, according to Andrej Karpathy, is a common problem with LLMs, indicative of “mode collapse” in humor generation. Grok 3 also struggles with generating SVG images, a task that often challenges LLMs due to their inability to “see” and arrange elements in two-dimensional space like humans.

While Grok 3 performed reasonably well compared to some other models (Karpathy specifically cited Gemini 1.5 Flash), it didn’t perfectly execute Karpathy’s prompt to create an SVG of a pelican riding a bicycle. Finally, Karpathy tested Grok 3’s stance on politically sensitive issues. He found that Grok 3 hesitated to answer a hypothetical ethical dilemma involving misgendering someone to save lives, suggesting it might be overly cautious in such situations, potentially contrary to Elon Musk’s stated goal of making Grok less “woke.”
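The pelican test is hard precisely because an SVG is just text describing 2D geometry. This toy sketch (my own invented scene, nothing to do with Grok's actual output) shows the kind of blind coordinate arithmetic involved: every number must be chosen so the shapes line up, yet the model writing it never "sees" the rendered result.

```python
# A hand-written toy SVG: one bicycle wheel with a seat post rising from
# its hub. All coordinates are assumptions picked so the parts connect;
# an LLM emitting SVG must do this spatial math without visual feedback.

def make_scene(width: int = 200, height: int = 120) -> str:
    wheel_r = 30
    wheel_cx = 60
    wheel_cy = height - wheel_r - 10      # rest the wheel on the "ground"
    seat_x, seat_y = wheel_cx, wheel_cy - 50  # post rises straight from the hub
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">',
        f'  <circle cx="{wheel_cx}" cy="{wheel_cy}" r="{wheel_r}" '
        'fill="none" stroke="black"/>',
        f'  <line x1="{wheel_cx}" y1="{wheel_cy}" x2="{seat_x}" y2="{seat_y}" '
        'stroke="black"/>',
        '</svg>',
    ]
    return "\n".join(parts)

svg = make_scene()
print(svg)
```

Even in this two-element scene, moving one value (say, the wheel radius) silently breaks the alignment of everything attached to it, which is why a full pelican-on-a-bicycle is a genuinely demanding prompt.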

Past Grok models have tended to lean left politically, which Musk attributes to the bias in the training data, and he has pledged to make Grok more politically neutral.


If you liked this article I'd appreciate an upvote or a comment. That helps me improve the quality of my posts and get to know more about you, my dear reader.
Thank you very much!
Follow me for more content like this.
X | PeakD | Rumble | YouTube | LinkedIn | GitHub | PayPal.me

Down below you can find other ways to tip my work.

BankTransfer (CLABE): 710969000019398639
BAT: 0x33CD7770d3235F97e5A8a96D5F21766DbB08c875
ETH: 0x33CD7770d3235F97e5A8a96D5F21766DbB08c875
BTC: 33xxUWU5kjcPk1Kr9ucn9tQXd2DbQ1b9tE
ADA: addr1q9l3y73e82hhwfr49eu0fkjw34w9s406wnln7rk9m4ky5fag8akgnwf3y4r2uzqf00rw0pvsucql0pqkzag5n450facq8vwr5e
DOT: 1rRDzfMLPi88RixTeVc2beA5h2Q3z1K1Uk3kqqyej7nWPNf
DOGE: DRph8GEwGccvBWCe4wEQsWsTvQvsEH4QKH
DAI: 0x33CD7770d3235F97e5A8a96D5F21766DbB08c875