LM Arena (Chatbot Arena)

Open-source, community-driven platform for live benchmarking and evaluation of large language models (LLMs) using crowdsourced pairwise comparisons and Elo ratings.

Product Overview

What is LM Arena (Chatbot Arena)?

LM Arena, also known as Chatbot Arena, is an open-source platform developed by LMSYS and UC Berkeley SkyLab to advance the development and understanding of large language models through live, transparent, and community-driven evaluations. It lets users interact with and compare multiple LLMs side by side in anonymous battles, collecting votes to rank models with the Elo rating system. The platform supports a wide range of publicly released models, including both open-weight models and commercial models served via API, and continuously updates its leaderboard based on real-world user feedback. LM Arena emphasizes transparency, open science, and collaboration by sharing its datasets, evaluation tools, and infrastructure openly on GitHub.


Key Features

  • Crowdsourced Pairwise Model Comparison

    Users engage in anonymous, randomized battles between two LLMs, voting on the better response to generate reliable comparative data.

  • Elo Rating System for Model Ranking

    Adopts the widely recognized Elo rating system to provide dynamic, statistically sound rankings of LLM performance (a minimal sketch of the update rule follows this feature list).

  • Open-Source Infrastructure

    All platform components, including the frontend, backend, evaluation pipelines, and ranking algorithms, are open source and publicly available.

  • Live and Continuous Evaluation

    Real-time collection of user prompts and votes ensures up-to-date benchmarking reflecting current model capabilities and real-world use cases.

  • Support for Publicly Released Models

    Includes models that are open-weight, publicly accessible via APIs, or available as services, ensuring transparency and reproducibility.

  • Community Engagement and Transparency

    Encourages broad participation and openly shares user preference data and prompts to foster collaborative AI research.
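
To make the ranking mechanism concrete, the following is a minimal sketch of a sequential Elo update over crowdsourced pairwise votes. The initial rating of 1000, the K-factor of 32, and the vote tuple format are illustrative assumptions for this sketch, not LM Arena's exact production pipeline.

```python
from collections import defaultdict

# Illustrative constants: the initial rating and K-factor are assumptions
# for this sketch, not LM Arena's published parameters.
INITIAL_RATING = 1000.0
K = 32.0

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_ratings(votes: list[tuple[str, str, float]]) -> dict[str, float]:
    """One sequential Elo pass over (model_a, model_b, score) votes,
    where score is 1.0 if A won, 0.0 if B won, and 0.5 for a tie."""
    ratings: dict[str, float] = defaultdict(lambda: INITIAL_RATING)
    for model_a, model_b, score in votes:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        ratings[model_a] += K * (score - e_a)
        ratings[model_b] += K * ((1.0 - score) - (1.0 - e_a))
    return dict(ratings)

# Hypothetical vote log: model names and outcomes are made up.
votes = [("model-x", "model-y", 1.0), ("model-y", "model-x", 0.5)]
print(update_ratings(votes))
```

Because sequential Elo is order-sensitive, leaderboards built from large vote logs typically average over many random orderings or fit a statistical model to the full set of comparisons; the sketch above shows only the core update rule.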


Use Cases

  • LLM Performance Benchmarking: Researchers and developers can evaluate and compare the effectiveness of various large language models under real-world conditions.
  • Model Selection for Deployment: Organizations can identify the best-performing LLMs for their specific applications by reviewing live community-driven rankings.
  • Open Science and Research: Academics and AI practitioners can access shared datasets and tools to conduct reproducible research and improve model development (see the data-loading sketch after this list).
  • Community Feedback for Model Improvement: Model providers can gather anonymized user feedback and voting data to refine and enhance their AI systems before official releases.
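
As a sketch of the open-science use case, the snippet below loads LM Arena's publicly shared preference data and computes raw win rates. The dataset id "lmsys/chatbot_arena_conversations" and the column names (model_a, model_b, winner) are assumptions based on LMSYS's public Hugging Face releases; access may require accepting the dataset's terms.

```python
from collections import Counter
from datasets import load_dataset  # pip install datasets

# Assumed dataset id and schema; adjust to the actual public release.
ds = load_dataset("lmsys/chatbot_arena_conversations", split="train")

wins: Counter = Counter()
games: Counter = Counter()
for row in ds:
    a, b, winner = row["model_a"], row["model_b"], row["winner"]
    games[a] += 1
    games[b] += 1
    if winner == "model_a":
        wins[a] += 1
    elif winner == "model_b":
        wins[b] += 1
    # ties count as games for both models but wins for neither

# Print the five models with the highest raw win rate.
for m in sorted(games, key=lambda m: wins[m] / games[m], reverse=True)[:5]:
    print(f"{m}: {wins[m] / games[m]:.1%} win rate over {games[m]} battles")
```

Raw win rates ignore opponent strength, which is one reason the leaderboard relies on Elo-style ratings rather than simple averages.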

Website Analytics for LM Arena (Chatbot Arena)

LM Arena (Chatbot Arena) Traffic & Rankings
  • Monthly Visits: 4.7M
  • Avg. Visit Duration: 00:07:22
  • Category Rank: -
  • User Bounce Rate: 0.31%

Traffic Trends: Apr 2025 - Jun 2025
Top Regions of LM Arena (Chatbot Arena)
  1. 🇨🇳 CN: 14.18%
  2. 🇷🇺 RU: 13.86%
  3. 🇺🇸 US: 11.56%
  4. 🇮🇳 IN: 10.61%
  5. 🇵🇱 PL: 5.12%
  6. Others: 44.67%