LM Arena (Chatbot Arena)

Open-source, community-driven platform for live benchmarking and evaluation of large language models (LLMs) using crowdsourced pairwise comparisons and Elo ratings.

Product Overview

What is LM Arena (Chatbot Arena)?

LM Arena, also known as Chatbot Arena, is an open-source platform developed by LMSYS and UC Berkeley SkyLab to advance the development and understanding of large language models through live, transparent, and community-driven evaluations. It lets users interact with and compare multiple LLMs side by side in anonymous battles, collecting votes to rank models with the Elo rating system. The platform supports a wide range of publicly released models, including both open-weight models and commercial models served via API, and continuously updates its leaderboard based on real-world user feedback. LM Arena emphasizes transparency, open science, and collaboration by sharing its datasets, evaluation tools, and infrastructure openly on GitHub.


Key Features

  • Crowdsourced Pairwise Model Comparison

    Users engage in anonymous, randomized battles between two LLMs, voting on the better response to generate reliable comparative data.

  • Elo Rating System for Model Ranking

    Adopts the widely recognized Elo rating system to provide dynamic, statistically sound rankings of LLM performance (a minimal sketch of the update rule follows this feature list).

  • Open-Source Infrastructure

    All platform components, including the frontend, backend, evaluation pipelines, and ranking algorithms, are open source and publicly available.

  • Live and Continuous Evaluation

    Real-time collection of user prompts and votes ensures up-to-date benchmarking reflecting current model capabilities and real-world use cases.

  • Support for Publicly Released Models

    Includes models that are open-weight, publicly accessible via APIs, or available as services, ensuring transparency and reproducibility.

  • Community Engagement and Transparency

    Encourages broad participation and openly shares user preference data and prompts to foster collaborative AI research.
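
To make the ranking mechanism concrete, the following is a minimal sketch of a sequential Elo update over crowdsourced pairwise votes. The initial rating of 1000, the K-factor of 32, and the vote tuple format are illustrative assumptions for this sketch, not LM Arena's exact production pipeline.

```python
from collections import defaultdict

# Illustrative constants: the initial rating and K-factor are assumptions
# for this sketch, not LM Arena's published parameters.
INITIAL_RATING = 1000.0
K = 32.0

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_ratings(votes: list[tuple[str, str, float]]) -> dict[str, float]:
    """One sequential Elo pass over (model_a, model_b, score) votes,
    where score is 1.0 if A won, 0.0 if B won, and 0.5 for a tie."""
    ratings: dict[str, float] = defaultdict(lambda: INITIAL_RATING)
    for model_a, model_b, score in votes:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        ratings[model_a] += K * (score - e_a)
        ratings[model_b] += K * ((1.0 - score) - (1.0 - e_a))
    return dict(ratings)

# Hypothetical vote log: model names and outcomes are made up.
votes = [("model-x", "model-y", 1.0), ("model-y", "model-x", 0.5)]
print(update_ratings(votes))
```

Because sequential Elo is order-sensitive, leaderboards built from large vote logs typically average over many random orderings or fit a statistical model to the full set of comparisons; the sketch above shows only the core update rule.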


Use Cases

  • LLM Performance Benchmarking: Researchers and developers can evaluate and compare the effectiveness of various large language models under real-world conditions.
  • Model Selection for Deployment: Organizations can identify the best-performing LLMs for their specific applications by reviewing live community-driven rankings.
  • Open Science and Research: Academics and AI practitioners can access shared datasets and tools to conduct reproducible research and improve model development (see the data-loading sketch after this list).
  • Community Feedback for Model Improvement: Model providers can gather anonymized user feedback and voting data to refine and enhance their AI systems before official releases.
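
As a sketch of the open-science use case, the snippet below loads LM Arena's publicly shared preference data and computes raw win rates. The dataset id "lmsys/chatbot_arena_conversations" and the column names (model_a, model_b, winner) are assumptions based on LMSYS's public Hugging Face releases; access may require accepting the dataset's terms.

```python
from collections import Counter
from datasets import load_dataset  # pip install datasets

# Assumed dataset id and schema; adjust to the actual public release.
ds = load_dataset("lmsys/chatbot_arena_conversations", split="train")

wins: Counter = Counter()
games: Counter = Counter()
for row in ds:
    a, b, winner = row["model_a"], row["model_b"], row["winner"]
    games[a] += 1
    games[b] += 1
    if winner == "model_a":
        wins[a] += 1
    elif winner == "model_b":
        wins[b] += 1
    # ties count as games for both models but wins for neither

# Print the five models with the highest raw win rate.
for m in sorted(games, key=lambda m: wins[m] / games[m], reverse=True)[:5]:
    print(f"{m}: {wins[m] / games[m]:.1%} win rate over {games[m]} battles")
```

Raw win rates ignore opponent strength, which is one reason the leaderboard relies on Elo-style ratings rather than simple averages.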

Website Analytics for LM Arena (Chatbot Arena)

LM Arena (Chatbot Arena) Traffic & Rankings
  • Monthly Visits: 4.7M
  • Avg. Visit Duration: 00:07:22
  • Category Rank: -
  • User Bounce Rate: 0.31%

Traffic Trends: Apr 2025 - Jun 2025
Top Regions of LM Arena (Chatbot Arena)
  1. 🇨🇳 CN: 14.18%
  2. 🇷🇺 RU: 13.86%
  3. 🇺🇸 US: 11.56%
  4. 🇮🇳 IN: 10.61%
  5. 🇵🇱 PL: 5.12%
  6. Others: 44.67%