Chess AI Through Language Models: Strategic Reasoning Without Search

AI Lab Lead, Drees & Sommer

Research Overview

This work explores transformer-based strategic reasoning through chess as a testbed, demonstrating that language models can develop sophisticated game-playing capabilities without traditional search algorithms. In collaboration with LAION, we've developed a progression of models that challenge fundamental assumptions about how AI systems learn strategic thinking.

Core hypothesis: Complex strategic reasoning can emerge from next-token prediction when models are trained on appropriately structured strategic data.

The ROOK Project Evolution

RookWorld-RLVR (2025) - RL Fine-Tuning with Verifiable Rewards (current)

Active development integrating GRPO (Group Relative Policy Optimization) for reinforcement learning with verifiable rewards (RLVR), targeting enhanced reasoning capabilities.

RookWorld-LM (2024) - Unified Agent+Environment

124M params: Unified chess policy and world model in a single transformer architecture.
Post: ROOK: REASONING OVER ORGANIZED KNOWLEDGE

Collaboration: Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)

Multi-task Performance:

  • 🏆 32.1% Checkmate-in-One accuracy - outperforms ChessGPT-Base (26.5%; Feng et al., NeurIPS 2023) with 24x fewer parameters (124M vs. 3B)
  • 99.9% environment simulation accuracy
  • 26.2% overall action accuracy

Significance: Enables closed-loop self-play without external engines
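
The closed-loop idea can be sketched as follows: a single model answers both policy prompts ("what move?") and environment prompts ("what state results?"), so games can be rolled out without an external engine. The model below is a stub, and the `P:` / `A:` prompt prefixes are illustrative assumptions, not the documented RookWorld-LM format.

```python
# Sketch of closed-loop self-play with one unified agent+environment model.
# stub_model stands in for RookWorld-LM; the "P:"/"A:" prompt schemas are
# assumed for illustration only.

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

def stub_model(prompt: str) -> str:
    """Stand-in for the unified transformer: returns a move for policy
    prompts and a successor FEN for environment prompts (hard-coded here)."""
    if prompt.startswith("P: "):
        return "e2e4"  # policy role: propose a move for the given position
    # environment role: simulate the state after applying the move
    return "rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3 0 1"

def self_play_step(fen: str):
    """One loop iteration: policy proposes a move, then the same model,
    prompted as environment, produces the next position."""
    move = stub_model(f"P: {fen}")
    next_fen = stub_model(f"A: {fen}+{move}")
    return move, next_fen

move, next_fen = self_play_step(START)
print(move)                  # e2e4
print(next_fen.split()[1])   # b  (side to move has flipped)
```

Because both roles live in one set of weights, rollouts need no calls to Stockfish or another simulator; the loop above just iterates `self_play_step` on its own output.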

ROOK-LM (2024) - Chain-of-Thought Reasoning

124M params: Implements chain-of-thought reasoning traces for chess, following the sequence position analysis → candidate evaluation → move selection.

  • Dataset: rook_40m (6B tokens, generated on Tsubame 4.0)
  • Architecture: GPT-2 with custom chess tokenization
  • Performance: 22.2% action accuracy with comprehensive reasoning traces
  • Technical Details: LAION Research Note
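
The trace structure above can be made concrete with a small parser. The flat `P:`/`M:`/`E:`/`B:` field markers (position, candidate moves, evaluations, best move) are an assumed serialization used only to illustrate how a single generated string decomposes into the three reasoning stages.

```python
# Sketch: splitting a ROOK-style reasoning trace into its stages
# (position analysis -> candidate evaluation -> move selection).
# The field markers P:/M:/E:/B: are illustrative assumptions.
import re

trace = ("P: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 "
         "M: e2e4 d2d4 g1f3 "
         "E: 0.3 0.3 0.2 "
         "B: e2e4")

def parse_trace(text: str) -> dict:
    """Split a flat trace string at each field marker and return a
    {field: value} mapping."""
    fields = re.split(r"\s(?=[MEB]: )", text)
    out = {}
    for field in fields:
        key, _, value = field.partition(": ")
        out[key] = value.strip()
    return out

parsed = parse_trace(trace)
print(parsed["B"])          # e2e4
print(parsed["M"].split())  # ['e2e4', 'd2d4', 'g1f3']
```

Structured traces like this are what make the reasoning inspectable: each stage can be checked independently of the final move prediction.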

ROOK-CLF (2024) - Decoder-based Behavioral Cloning

9M params: Reproduction of Google DeepMind's "Grandmaster-Level Chess Without Search" methodology using a LLaMA-based decoder.

  • Performance: 49% action accuracy, 57% on Checkmate-in-One
  • Achievement: Demonstrated searchless chess AI feasibility with minimal parameters
  • Model: Available on HuggingFace
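
Behavioral cloning here means framing chess as single-label classification: FEN in, move class out. A minimal sketch of that setup, with a toy two-position dataset and a character-level encoding standing in for the real tokenizer and the full legal-move label space:

```python
# Sketch of the behavioral-cloning formulation: position -> move class.
# The tiny dataset, encoding, and vocabulary are illustrative; the real
# classifier predicts over the full space of UCI moves.

# Toy supervised pairs (FEN, engine-labeled best move in UCI notation).
dataset = [
    ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", "e2e4"),
    ("rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3 0 1", "e7e5"),
]

# Class labels: one index per distinct target move in the dataset.
move_to_id = {m: i for i, m in enumerate(sorted({m for _, m in dataset}))}

def encode_fen(fen: str, max_len: int = 80) -> list[int]:
    """Character-level encoding padded to a fixed length, standing in
    for the model's input tokenization."""
    ids = [ord(c) for c in fen[:max_len]]
    return ids + [0] * (max_len - len(ids))

X = [encode_fen(f) for f, _ in dataset]
y = [move_to_id[m] for _, m in dataset]
print(len(X[0]), y)  # 80 [0, 1]
```

Supervised pairs like `(X, y)` are exactly what a cross-entropy classifier head consumes; no search tree appears anywhere in the formulation.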

LAION Strategic Game Dataset (2023) - Dataset Engineering

Contributed to the LAION Strategic Game Dataset project, which called for community participation to enhance AI models' strategic planning capabilities through game-based synthetic datasets. Developed chess-to-text transformation tools for dataset generation as part of this effort.

  • Contribution: Chess dataset generation and transformation pipeline
  • Code: chess-to-text repository
  • Project Scale: 3.2 billion chess games, 608 billion moves via Stockfish self-play
  • Impact: Foundation work that evolved into the ROOK project research
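
One way to picture the chess-to-text transformation: each prefix of a game's move sequence becomes a text context, with the following move as the prediction target. The sample game and the `MOVES:` template below are illustrative assumptions, not the pipeline's actual serialization format.

```python
# Sketch of a chess-to-text transformation: a game's move list becomes
# (context, target) text pairs for next-move prediction. The template
# string is an assumed format for illustration.

def game_to_samples(moves: list[str]) -> list[tuple[str, str]]:
    """Each prefix of the game becomes a training context; the move
    that follows it becomes the prediction target."""
    samples = []
    for i in range(1, len(moves)):
        context = "MOVES: " + " ".join(moves[:i])
        samples.append((context, moves[i]))
    return samples

game = ["e2e4", "e7e5", "g1f3", "b8c6"]
for ctx, target in game_to_samples(game):
    print(ctx, "->", target)
# MOVES: e2e4 -> e7e5
# MOVES: e2e4 e7e5 -> g1f3
# MOVES: e2e4 e7e5 g1f3 -> b8c6
```

Applied to billions of engine self-play games, a transformation of this shape is what turns raw game records into language-model training corpora.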

YoloChess (2022) - Encoder-based Behavioral Cloning

87M params (Custom DeBERTaV2-base, Vocab Size 500): Initial exploration using BERT-based position evaluation with custom FEN encoders. Established baseline performance and identified key challenges in chess representation for transformer architectures.
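
An encoder model wants a fixed-length input, which motivates square-level FEN encodings of the kind explored here. A minimal sketch, not reproducing the original 500-token vocabulary: expand the board field into 64 per-square tokens plus a side-to-move token.

```python
# Sketch of a fixed-length, square-level FEN encoding for an encoder
# model: 64 board tokens ('.' for empty) plus side to move. The actual
# YoloChess vocabulary is not reproduced here.

def fen_to_squares(fen: str) -> list[str]:
    """Expand the FEN board field into 64 per-square tokens,
    rank 8 first, and append the side-to-move token."""
    board, side = fen.split()[:2]
    squares = []
    for rank in board.split("/"):
        for ch in rank:
            if ch.isdigit():
                squares.extend(["."] * int(ch))  # run-length empty squares
            else:
                squares.append(ch)               # piece letter
    return squares + [side]

tokens = fen_to_squares(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
print(len(tokens))   # 65
print(tokens[:8])    # ['r', 'n', 'b', 'q', 'k', 'b', 'n', 'r']
```

A fixed 65-slot layout like this gives every position the same shape, which suits bidirectional encoders; the later ROOK models instead moved to autoregressive decoding over raw text.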

Technical Contributions

Novel Architectures

  • Unified world modeling: Simultaneous policy and environment simulation in transformers
  • Strategic tokenization: Custom representations for structured game states
  • Multi-task scaling: Consistent performance improvements with unified training objectives

Dataset Engineering

  • Large-scale annotation: 40M+ positions annotated with Stockfish 16.1 on supercomputing infrastructure
  • Multi-format datasets: Support for classification, autoregressive, and multi-task learning
  • Reproducible pipelines: Full data generation code and methodology documentation

Open Science Impact

All models, datasets, and code are publicly available, contributing to the democratization of strategic AI research.

Research Context

Background spans neuro-informatics (University of Lübeck), games industry applications, business economics & management (Witten/Herdecke University, IPADE Mexico DF), and AI/ML consulting. Active contributor to HuggingFace ecosystem (transformers, datasets, evaluate) and open source frameworks including keras-rl and custom implementations like keras-wide-n-deep. Current work at Drees & Sommer, building the AI Lab & exploring applications in construction and real estate optimization.

Research Implications

The RookWorld results suggest that:

  1. Search-free strategic AI is viable with appropriate training data
  2. Unified architectures can efficiently handle multiple strategic reasoning tasks
  3. Chain-of-thought training improves both performance and interpretability
  4. Language model paradigms apply effectively to structured strategic domains

These findings have implications beyond chess for any domain requiring sequential decision-making under complex constraints.