In this video, I dive into the controversy surrounding the Leaderboard Illusion paper and what it reveals about systematic flaws in LLM benchmarks, especially Chatbot Arena. Having followed the evolution of these leaderboards closely, I was shocked by the extent of the data access disparities and selective reporting. This is a wake-up call for the entire AI community.
LLM Benchmarks Are Broken—The Leaderboard Illusion
Posted by adijaya on May 01, 2025