How well can LLMs solve chess puzzles?

GitHub - kagisearch/llm-chess-puzzles: Benchmark LLM reasoning capability by solving chess puzzles.

https://github.com/kagisearch/llm-chess-puzzles

Each LLM is given the same 1000 chess puzzles to solve. See puzzles.csv. Benchmarked on Mar 25, 2024.

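| Model | Solved | Solved % | Illegal Moves | Illegal Moves % | Adjusted Elo |
|---|---|---|---|---|---|
| gpt-4-turbo-preview | 229 | 22.9% | 163 | 16.3% | 1144 |
| gpt-4 | 195 | 19.5% | 183 | 18.3% | 1047 |
| claude-3-opus-20240229 | 72 | 7.2% | 464 | 46.4% | 521 |
| claude-3-haiku-20240307 | 38 | 3.8% | 590 | 59.0% | 363 |
| claude-3-sonnet-20240229 | 23 | 2.3% | 663 | 66.3% | 286 |
| gpt-3.5-turbo | 23 | 2.3% | 683 | 68.3% | 269 |
| claude-instant-1.2 | 10 | 1.0% | 707 | 70.7% | 245 |
| mistral-large-latest | 4 | 0.4% | 813 | 81.3% | 149 |
| mixtral-8x7b | 9 | 0.9% | 832 | 83.2% | 136 |
| gemini-1.5-pro-latest* | FAIL | - | - | - | - |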
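
As a rough sanity check on the table, the percentage columns look like the raw counts divided by the 1000 puzzles each model was given. The sketch below assumes exactly that; the helper name `rate` is made up for illustration and is not taken from the repository, and it says nothing about how the Adjusted Elo column is derived.

```python
TOTAL_PUZZLES = 1000  # every model gets the same 1000 puzzles


def rate(count: int, total: int = TOTAL_PUZZLES) -> str:
    """Format count/total as a percentage with one decimal place.

    Hypothetical helper for illustration; not part of the benchmark code.
    """
    return f"{100 * count / total:.1f}%"


# (solved, illegal moves) copied from two rows of the results table
results = {
    "gpt-4-turbo-preview": (229, 163),
    "claude-3-opus-20240229": (72, 464),
}

for model, (solved, illegal) in results.items():
    print(f"{model}: solved {rate(solved)}, illegal moves {rate(illegal)}")
```

Read this way, a model's illegal-move rate can be compared directly with its solve rate: claude-3-opus-20240229, for instance, answered with an illegal move on 464 puzzles and solved 72.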

Published by the CEO of Kagi!