Chess AI Through Language Models: Strategic Reasoning Without Search

AI Lab Lead, Drees & Sommer

Research Overview

This work explores transformer-based strategic reasoning through chess as a testbed, demonstrating that language models can develop sophisticated game-playing capabilities without traditional search algorithms. In collaboration with LAION, we've developed a progression of models that challenge fundamental assumptions about how AI systems learn strategic thinking.

Core hypothesis: Complex strategic reasoning can emerge from next-token prediction when models are trained on appropriately structured strategic data.

The ROOK Project Evolution

RookWorld-RLVR (2025) - RL Fine-Tuning with Verifiable Rewards (current)

Active development integrating GRPO (Group Relative Policy Optimization) for reinforcement learning with verifiable rewards (RLVR), targeting enhanced reasoning capabilities.

RookWorld-LM (2024) - Unified Agent+Environment

124M params: Unified chess policy and world model in a single transformer architecture.
Post: ROOK: REASONING OVER ORGANIZED KNOWLEDGE

Collaboration: Jenia Jitsev (LAION/JSC), Qi Sun (Tokyo Tech/Sakana AI)

Multi-task Performance:

  • 🏆 32.1% Checkmate-in-One accuracy - outperforms ChessGPT-Base (26.5%; Feng et al., NeurIPS 2023) with 24x fewer parameters (124M vs. 3B)
  • 99.9% environment simulation accuracy
  • 26.2% overall action accuracy

Significance: Enables closed-loop self-play without external engines
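
The closed-loop idea can be sketched as follows: a single model answers both policy prompts ("what move?") and environment prompts ("what state results?"), so games can be rolled out without an external engine. The model below is a stub, and the `P:` / `A:` prompt prefixes are illustrative assumptions, not the documented RookWorld-LM format.

```python
# Sketch of closed-loop self-play with one unified agent+environment model.
# stub_model stands in for RookWorld-LM; the "P:"/"A:" prompt schemas are
# assumed for illustration only.

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"

def stub_model(prompt: str) -> str:
    """Stand-in for the unified transformer: returns a move for policy
    prompts and a successor FEN for environment prompts (hard-coded here)."""
    if prompt.startswith("P: "):
        return "e2e4"  # policy role: propose a move for the given position
    # environment role: simulate the state after applying the move
    return "rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3 0 1"

def self_play_step(fen: str):
    """One loop iteration: policy proposes a move, then the same model,
    prompted as environment, produces the next position."""
    move = stub_model(f"P: {fen}")
    next_fen = stub_model(f"A: {fen}+{move}")
    return move, next_fen

move, next_fen = self_play_step(START)
print(move)                  # e2e4
print(next_fen.split()[1])   # b  (side to move has flipped)
```

Because both roles live in one set of weights, rollouts need no calls to Stockfish or another simulator; the loop above just iterates `self_play_step` on its own output.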

ROOK-LM (2024) - Chain-of-Thought Reasoning

124M params: Implements chain-of-thought reasoning traces for chess, following the sequence position analysis → candidate evaluation → move selection.

  • Dataset: rook_40m (6B tokens, generated on Tsubame 4.0)
  • Architecture: GPT-2 with custom chess tokenization
  • Performance: 22.2% action accuracy with comprehensive reasoning traces
  • Technical Details: LAION Research Note
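
The trace structure above can be made concrete with a small parser. The flat `P:`/`M:`/`E:`/`B:` field markers (position, candidate moves, evaluations, best move) are an assumed serialization used only to illustrate how a single generated string decomposes into the three reasoning stages.

```python
# Sketch: splitting a ROOK-style reasoning trace into its stages
# (position analysis -> candidate evaluation -> move selection).
# The field markers P:/M:/E:/B: are illustrative assumptions.
import re

trace = ("P: rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1 "
         "M: e2e4 d2d4 g1f3 "
         "E: 0.3 0.3 0.2 "
         "B: e2e4")

def parse_trace(text: str) -> dict:
    """Split a flat trace string at each field marker and return a
    {field: value} mapping."""
    fields = re.split(r"\s(?=[MEB]: )", text)
    out = {}
    for field in fields:
        key, _, value = field.partition(": ")
        out[key] = value.strip()
    return out

parsed = parse_trace(trace)
print(parsed["B"])          # e2e4
print(parsed["M"].split())  # ['e2e4', 'd2d4', 'g1f3']
```

Structured traces like this are what make the reasoning inspectable: each stage can be checked independently of the final move prediction.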

ROOK-CLF (2024) - Decoder-based Behavioral Cloning

9M params: Reproduction of Google DeepMind's "Grandmaster-Level Chess Without Search" methodology using a LLaMA-based decoder.

  • Performance: 49% action accuracy, 57% on Checkmate-in-One
  • Achievement: Demonstrated searchless chess AI feasibility with minimal parameters
  • Model: Available on HuggingFace
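
Behavioral cloning here means framing chess as single-label classification: FEN in, move class out. A minimal sketch of that setup, with a toy two-position dataset and a character-level encoding standing in for the real tokenizer and the full legal-move label space:

```python
# Sketch of the behavioral-cloning formulation: position -> move class.
# The tiny dataset, encoding, and vocabulary are illustrative; the real
# classifier predicts over the full space of UCI moves.

# Toy supervised pairs (FEN, engine-labeled best move in UCI notation).
dataset = [
    ("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1", "e2e4"),
    ("rnbqkbnr/pppppppp/8/8/4P3/8/PPPPPPPP/RNBQKBNR b KQkq e3 0 1", "e7e5"),
]

# Class labels: one index per distinct target move in the dataset.
move_to_id = {m: i for i, m in enumerate(sorted({m for _, m in dataset}))}

def encode_fen(fen: str, max_len: int = 80) -> list[int]:
    """Character-level encoding padded to a fixed length, standing in
    for the model's input tokenization."""
    ids = [ord(c) for c in fen[:max_len]]
    return ids + [0] * (max_len - len(ids))

X = [encode_fen(f) for f, _ in dataset]
y = [move_to_id[m] for _, m in dataset]
print(len(X[0]), y)  # 80 [0, 1]
```

Supervised pairs like `(X, y)` are exactly what a cross-entropy classifier head consumes; no search tree appears anywhere in the formulation.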

LAION Strategic Game Dataset (2023) - Dataset Engineering

Contributed to the LAION Strategic Game Dataset project, which called for community participation to enhance AI models' strategic planning capabilities through game-based synthetic datasets. Developed chess-to-text transformation tools for dataset generation as part of this effort.

  • Contribution: Chess dataset generation and transformation pipeline
  • Code: chess-to-text repository
  • Project Scale: 3.2 billion chess games, 608 billion moves via Stockfish self-play
  • Impact: Foundation work that evolved into the ROOK project research
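
One way to picture the chess-to-text transformation: each prefix of a game's move sequence becomes a text context, with the following move as the prediction target. The sample game and the `MOVES:` template below are illustrative assumptions, not the pipeline's actual serialization format.

```python
# Sketch of a chess-to-text transformation: a game's move list becomes
# (context, target) text pairs for next-move prediction. The template
# string is an assumed format for illustration.

def game_to_samples(moves: list[str]) -> list[tuple[str, str]]:
    """Each prefix of the game becomes a training context; the move
    that follows it becomes the prediction target."""
    samples = []
    for i in range(1, len(moves)):
        context = "MOVES: " + " ".join(moves[:i])
        samples.append((context, moves[i]))
    return samples

game = ["e2e4", "e7e5", "g1f3", "b8c6"]
for ctx, target in game_to_samples(game):
    print(ctx, "->", target)
# MOVES: e2e4 -> e7e5
# MOVES: e2e4 e7e5 -> g1f3
# MOVES: e2e4 e7e5 g1f3 -> b8c6
```

Applied to billions of engine self-play games, a transformation of this shape is what turns raw game records into language-model training corpora.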

YoloChess (2022) - Encoder-based Behavioral Cloning

87M params (Custom DeBERTaV2-base, Vocab Size 500): Initial exploration using BERT-based position evaluation with custom FEN encoders. Established baseline performance and identified key challenges in chess representation for transformer architectures.
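
An encoder model wants a fixed-length input, which motivates square-level FEN encodings of the kind explored here. A minimal sketch, not reproducing the original 500-token vocabulary: expand the board field into 64 per-square tokens plus a side-to-move token.

```python
# Sketch of a fixed-length, square-level FEN encoding for an encoder
# model: 64 board tokens ('.' for empty) plus side to move. The actual
# YoloChess vocabulary is not reproduced here.

def fen_to_squares(fen: str) -> list[str]:
    """Expand the FEN board field into 64 per-square tokens,
    rank 8 first, and append the side-to-move token."""
    board, side = fen.split()[:2]
    squares = []
    for rank in board.split("/"):
        for ch in rank:
            if ch.isdigit():
                squares.extend(["."] * int(ch))  # run-length empty squares
            else:
                squares.append(ch)               # piece letter
    return squares + [side]

tokens = fen_to_squares(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
print(len(tokens))   # 65
print(tokens[:8])    # ['r', 'n', 'b', 'q', 'k', 'b', 'n', 'r']
```

A fixed 65-slot layout like this gives every position the same shape, which suits bidirectional encoders; the later ROOK models instead moved to autoregressive decoding over raw text.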

Technical Contributions

Novel Architectures

  • Unified world modeling: Simultaneous policy and environment simulation in transformers
  • Strategic tokenization: Custom representations for structured game states
  • Multi-task scaling: Consistent performance improvements with unified training objectives

Dataset Engineering

  • Large-scale annotation: 40M+ positions annotated with Stockfish 16.1 on supercomputing infrastructure
  • Multi-format datasets: Support for classification, autoregressive, and multi-task learning
  • Reproducible pipelines: Full data generation code and methodology documentation

Open Science Impact

All models, datasets, and code are publicly available, contributing to the democratization of strategic AI research.

Research Context

Background spans neuro-informatics (University of Lübeck), games industry applications, business economics & management (Witten/Herdecke University, IPADE Mexico DF), and AI/ML consulting. Active contributor to HuggingFace ecosystem (transformers, datasets, evaluate) and open source frameworks including keras-rl and custom implementations like keras-wide-n-deep. Current work at Drees & Sommer, building the AI Lab & exploring applications in construction and real estate optimization.

Research Implications

The RookWorld results suggest that:

  1. Search-free strategic AI is viable with appropriate training data
  2. Unified architectures can efficiently handle multiple strategic reasoning tasks
  3. Chain-of-thought training improves both performance and interpretability
  4. Language model paradigms apply effectively to structured strategic domains

These findings have implications beyond chess for any domain requiring sequential decision-making under complex constraints.