LM Arena (Chatbot Arena)
Open-source, community-driven platform for live benchmarking and evaluation of large language models (LLMs) using crowdsourced pairwise comparisons and Elo ratings.
Product Overview
What is LM Arena (Chatbot Arena)?
LM Arena, also known as Chatbot Arena, is an open-source platform developed by LMSYS and UC Berkeley SkyLab to advance the development and understanding of large language models through live, transparent, and community-driven evaluations. It enables users to interact with and compare multiple LLMs side-by-side in anonymous battles, collecting votes to rank models using the Elo rating system. The platform supports a wide range of publicly released models, including both open-weight models and models served through commercial APIs, and continuously updates its leaderboard based on real-world user feedback. LM Arena emphasizes transparency, open science, and collaboration by sharing datasets, evaluation tools, and infrastructure openly on GitHub.
Key Features
Crowdsourced Pairwise Model Comparison
Users engage in anonymous, randomized battles between two LLMs, voting on the better response to generate reliable comparative data.
Elo Rating System for Model Ranking
Adopts the widely recognized Elo rating system to provide dynamic, statistically sound rankings of LLM performance (a minimal update sketch appears after this feature list).
Open-Source Infrastructure
All platform components including frontend, backend, evaluation pipelines, and ranking algorithms are open source and publicly available.
Live and Continuous Evaluation
Real-time collection of user prompts and votes ensures up-to-date benchmarking reflecting current model capabilities and real-world use cases.
Support for Publicly Released Models
Includes models that are open-weight, publicly accessible via APIs, or available as services, ensuring transparency and reproducibility.
Community Engagement and Transparency
Encourages broad participation and openly shares user preference data and prompts to foster collaborative AI research.
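
To make the ranking mechanism above concrete, the following is a minimal sketch of how crowdsourced pairwise votes can be folded into Elo-style ratings. The model names, starting ratings, K-factor, and vote log are illustrative assumptions and do not represent LM Arena's production pipeline.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one battle.

    score_a is 1.0 if model A's response won the vote, 0.0 if model B's
    response won, and 0.5 for a tie.
    """
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b


# Hypothetical vote log: (model_a, model_b, score for model_a)
votes = [
    ("model-x", "model-y", 1.0),  # model-x won this battle
    ("model-x", "model-y", 0.5),  # tie
    ("model-y", "model-x", 1.0),  # model-y won
]

ratings = {"model-x": 1000.0, "model-y": 1000.0}
for a, b, score in votes:
    ratings[a], ratings[b] = elo_update(ratings[a], ratings[b], score)

print(ratings)
```

Each vote shifts the two competing ratings in opposite directions, with larger adjustments when an upset occurs, which is why the leaderboard can update continuously as new battles come in.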
Use Cases
- LLM Performance Benchmarking: Researchers and developers can evaluate and compare the effectiveness of various large language models under real-world conditions.
- Model Selection for Deployment: Organizations can identify the best-performing LLMs for their specific applications by reviewing live community-driven rankings.
- Open Science and Research: Academics and AI practitioners can access shared datasets and tools to conduct reproducible research and improve model development.
- Community Feedback for Model Improvement: Model providers can gather anonymized user feedback and voting data to refine and enhance their AI systems before official releases.
LM Arena (Chatbot Arena) Alternatives

Nous Research
A pioneering AI research collective focused on open-source, human-centric language models and decentralized AI infrastructure.

AnythingLLM
All-in-one AI desktop application offering local and cloud LLM usage, document chat, AI agents, and full privacy with zero setup.

Allen Institute for AI (AI2)
A nonprofit research institute advancing AI through open-source models, tools, and scientific literature search solutions.

Pathway
A modern UX research platform enabling product teams to rapidly validate designs with real users worldwide through smart, unmoderated tests and AI-driven insights.

Pulse Labs
AI-driven platform providing high-quality user feedback, data collection, and model testing to optimize product and AI development.

Prompt Cowboy
Prompt generation tool that transforms rough ideas into structured, high-performing prompts for ChatGPT, Claude, and other language models.
Analytics of LM Arena (Chatbot Arena) Website
🇨🇳 CN: 14.18%
🇷🇺 RU: 13.86%
🇺🇸 US: 11.56%
🇮🇳 IN: 10.61%
🇵🇱 PL: 5.12%
Others: 44.67%