OpenAI Employees Accuse xAI of Misleading Grok 3 Benchmark Results


February 23, 2025 11:41 AM

In Brief:
OpenAI employees claim xAI’s Grok 3 benchmark comparisons are misleading.
xAI co-founder Igor Babuschkin defends the company’s methodology.

A dispute has emerged between OpenAI employees and Elon Musk’s xAI over benchmark results for Grok 3. The employees accused xAI of publishing misleading charts that compare Grok 3’s performance against OpenAI’s o3-mini-high model on the AIME 2025 benchmark.

The controversy stems from xAI’s decision to omit o3-mini-high’s score under the "cons@64" (consensus@64) setting, in which a model gets 64 attempts at each problem and its most frequent answer counts as the final one. The OpenAI employees say that leaving this score out skews the comparison. In response, xAI co-founder Igor Babuschkin defended the company’s methodology, stating that OpenAI has previously published similarly selective benchmark comparisons.
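
To illustrate why the choice of setting matters, here is a minimal Python sketch of how a consensus-style score can differ from a single-attempt score. It assumes "cons@64" means a majority vote over 64 sampled answers per problem. The function names and the toy data are hypothetical, and this is not the evaluation code used by xAI or OpenAI.

```python
from collections import Counter

def consensus_answer(samples):
    """Return the most frequent answer among the sampled attempts."""
    return Counter(samples).most_common(1)[0][0]

def consensus_accuracy(samples_per_problem, references):
    """Fraction of problems where the majority-vote answer is correct."""
    correct = sum(
        consensus_answer(samples) == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)

def single_attempt_accuracy(samples_per_problem, references):
    """Fraction of problems where the first sampled answer is correct.

    Assumption: this stands in for a one-attempt score; real evaluations
    may average over many single attempts instead.
    """
    correct = sum(
        samples[0] == ref
        for samples, ref in zip(samples_per_problem, references)
    )
    return correct / len(references)

# Hypothetical toy data: two problems, 64 sampled answers each.
references = [42, 7]
samples_per_problem = [
    [17] + [42] * 40 + [17] * 23,  # first attempt wrong, majority right
    [7] * 64,                      # every attempt right
]

single_score = single_attempt_accuracy(samples_per_problem, references)
consensus_score = consensus_accuracy(samples_per_problem, references)
print(f"single attempt: {single_score:.2f}, consensus of 64: {consensus_score:.2f}")
```

In this toy case the consensus score is higher than the single-attempt score for the same model, which is why a chart that mixes the two settings across models can change how the ranking looks.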

As competition intensifies in the AI space, transparency in model evaluation remains a key issue, with both companies vying for dominance in AI benchmarks.
