
LM Arena (Chatbot Arena)

Open-source, community-driven platform for live benchmarking and evaluation of large language models (LLMs) using crowdsourced pairwise comparisons and Elo ratings.


Product Overview

What is LM Arena (Chatbot Arena)?

LM Arena, also known as Chatbot Arena, is an open-source platform developed by LMSYS and UC Berkeley SkyLab to advance the development and understanding of large language models through live, transparent, and community-driven evaluations. It enables users to interact with and compare multiple LLMs side-by-side in anonymous battles, collecting votes to rank models using the Elo rating system. The platform supports a wide range of publicly released models, including both open-weight and commercial APIs, and continuously updates its leaderboard based on real-world user feedback. LM Arena emphasizes transparency, open science, and collaboration by sharing datasets, evaluation tools, and infrastructure openly on GitHub.


Key Features

  • Crowdsourced Pairwise Model Comparison

    Users engage in anonymous, randomized battles between two LLMs, voting on the better response to generate reliable comparative data.

  • Elo Rating System for Model Ranking

    Adopts the widely recognized Elo rating system to provide dynamic, statistically sound rankings of LLM performance.

  • Open-Source Infrastructure

    All platform components including frontend, backend, evaluation pipelines, and ranking algorithms are open source and publicly available.

  • Live and Continuous Evaluation

    Real-time collection of user prompts and votes ensures up-to-date benchmarking reflecting current model capabilities and real-world use cases.

  • Support for Publicly Released Models

    Includes models that are open-weight, publicly accessible via APIs, or available as services, ensuring transparency and reproducibility.

  • Community Engagement and Transparency

    Encourages broad participation and openly shares user preference data and prompts to foster collaborative AI research.
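The Elo mechanics behind the leaderboard can be sketched in a few lines. The following is a minimal, hypothetical illustration of how a single crowdsourced pairwise vote would shift two models' ratings; the K-factor and starting ratings are illustrative assumptions, not LM Arena's actual parameters.

```python
# Minimal sketch of an Elo update for one pairwise model battle.
# K-factor and ratings are hypothetical, not LM Arena's real values.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """score_a: 1.0 if A wins, 0.0 if B wins, 0.5 for a tie."""
    e_a = expected_score(r_a, r_b)
    r_a_new = r_a + k * (score_a - e_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return r_a_new, r_b_new

# An upset win by the lower-rated model moves both ratings sharply.
new_a, new_b = elo_update(1000.0, 1200.0, score_a=1.0)
```

In practice the platform aggregates many thousands of such votes across randomized, anonymous battles, so individual upsets average out into statistically stable rankings.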


Use Cases

  • LLM Performance Benchmarking: Researchers and developers can evaluate and compare the effectiveness of various large language models under real-world conditions.
  • Model Selection for Deployment: Organizations can identify the best-performing LLMs for their specific applications by reviewing live community-driven rankings.
  • Open Science and Research: Academics and AI practitioners can access shared datasets and tools to conduct reproducible research and improve model development.
  • Community Feedback for Model Improvement: Model providers can gather anonymized user feedback and voting data to refine and enhance their AI systems before official releases.

LM Arena (Chatbot Arena) Alternatives


RunPod

A cloud computing platform optimized for AI workloads, offering scalable GPU resources for training, fine-tuning, and deploying AI models.

โ™จ๏ธ 1.84M๐Ÿ‡บ๐Ÿ‡ธ 21.17%
Paid

Geekbench

A cross-platform benchmarking tool measuring CPU and GPU performance across various devices and operating systems.

โ™จ๏ธ 1.01M๐Ÿ‡บ๐Ÿ‡ธ 18.86%
Paid

Ballpark

A user research platform that simplifies capturing high-quality feedback on product ideas, marketing copy, designs, and prototypes with versatile testing methods and rich media insights.

โ™จ๏ธ 260.12K๐Ÿ‡บ๐Ÿ‡ธ 52.4%
Freemium

Opal by Google

A toolkit for developers to test, evaluate, and implement safety measures for large language model applications.

โ™จ๏ธ 249.9K๐Ÿ‡ธ๐Ÿ‡ณ 7.99%
Free

Sakana AI

Tokyo-based AI research company pioneering nature-inspired foundation models and fully automated AI-driven scientific discovery.

โ™จ๏ธ 165.6K๐Ÿ‡ฏ๐Ÿ‡ต 34.8%
Paid

Userbrain

Unmoderated remote user testing platform streamlining UX research through a global tester pool and automated analysis tools.

โ™จ๏ธ 152.7K๐Ÿ‡บ๐Ÿ‡ธ 43.22%
Free Trial

MindSpore

An all-scenario, open-source deep learning framework designed for easy development, efficient execution, and unified deployment across cloud, edge, and device environments.

โ™จ๏ธ 105.1K๐Ÿ‡จ๐Ÿ‡ณ 54.35%
Free

无问芯穹 (Infinigence AI)

Enterprise-grade heterogeneous computing platform enabling efficient deployment of large models across diverse chip architectures.

โ™จ๏ธ 65.81K๐Ÿ‡จ๐Ÿ‡ณ 62.45%
Paid

Analytics of LM Arena (Chatbot Arena) Website

LM Arena (Chatbot Arena) Traffic & Rankings
Monthly Visits: 19.66M
Avg. Visit Duration: 00:09:35
Category Rank: 50
User Bounce Rate: 0.31%
Traffic Trends: Sep 2025 - Nov 2025
Top Regions of LM Arena (Chatbot Arena)
  1. 🇮🇳 IN: 11.46%
  2. 🇺🇸 US: 9.9%
  3. 🇷🇺 RU: 8.99%
  4. 🇨🇳 CN: 5.15%
  5. 🇰🇷 KR: 4.12%
  6. Others: 60.38%