Number of Unique Legal Positions (NULP)
in chess endgames
Updated on 2025-03-05
About
In this document we describe NULP (Number of Unique Legal Positions) as the measure of state-space complexity of chess endgames. We discuss the definitions of uniqueness and legality, and share the NULP values of various endgames.
This paper gets updated when we get new ideas or results. This work is in public domain using the CC0 license. Please email any questions, comments, and ideas to kkryukov@gmail.com. Especially welcome are confirmations or corrections of any results.
Introduction
In 1950 Claude Shannon famously estimated the number of possible chess positions to be "of the general order of 64!/32!(8!)2(2!)6, or roughly 1043" (Shannon (1950) "Programming a Computer for Playing Chess"). This includes some illegal positions (e.g., pawns on the first rank, both kings in check) and excludes legal positions following captures and promotions. Later estimates include 1050 by Victor Allis (with upper bound of 5x1052). More recently John Tromp has shown the upper bound of 2155 (about 1046.7). Significantly improving the estimate for the whole game is difficult with today's methods and computers, however much better estimates are possible for subsets of the game, such as opening of endgame.
On the opening side, one well-known statistic is Perft, which is the number of move paths of N plies from the starting position. Perft is known for up to 15 plies and is widely used for testing and benchmarking move generators. There are 20 move choices in the starting position, so 20 different legal positions can occur after white's first move, and 400 after black's first reply. These numbers agree with Perft(1) and Perft(2), respectively. Perft(3), however, is larger than the number of different legal positions, because it counts transpositions independently from each other. The number of transpositions grows rapidly with each extra ply. This means that while Perft can be used as the upper bound for the number of positions, it does not provide a very good estimate. I'm not aware of any efforts to count different legal positions beyond the first 2 plies.
On the endgame side most of the estimates of endgame sizes come from solving projects. Strongly solving an endgame typically involves storing a value for each indexed position, thus endgame size can be measured as the number of indexed positions. The choice of the indexing method is crucial here and has to balance database compactness with indexing speed: storing fewer positions makes the database more compact, but complicates the indexing, degrading performance. As the result common indexing method use relaxed concepts of position legality and uniqueness.
In summary, currently available estimates of endgame sizes suffer from following limitations:
- Many illegal positions are included.
- Some redundant positions are included.
- Some legal positions are omitted (those with castling rights).
- The results are only available for solved endgames.
- The resutls are usually limited to 8x8 board.
In this experiment I aim to advance on all of these points. This means providing more accurate estimates for endgame sizes in chess and chess variants, while trying to avoid illegal and redundant positions when possible. The motivations for computing such estimates include:
- Knowing the endgame size in advance can be helpful for solving as it allows estimating the required time and hardware.
- NULP-based statistics (e.g., number of wins / NULP) can be used for comparing results of independent solving efforts.
- A well defined NULP allows discussing index efficiency (NULP / number of indexed positions) and compactness (NULP / byte) of different databases.
- Comparing NULP values with other estimates may allow improving estimates for larger endgames or even for the whole game.
At first this data was a by-product of solving experiments, but then I decided to share it separately from solving results.
Definitions
Position
For the purpose of this experiment a position is defined as a combination of piece placement, side to move, castling rights and en-passant rights. Storing the evaluations for all such positions from a particlar endgame allows perfect play from the point of entering that endgame, so the number of such positions is directly related to the amount of necessary storage space.
A complete game state should include the move counter for detecting a draw by 50-move rule. Move counter can be very important when distance to mate (DTM) is stored in the database and used for choosing the moves. However using distance to zeroing (DTZ) allows to avoid storing the move counter while still respecting the 50-move rule. (Most of endgame databases in use today simply ignore the 50-move rule, in which case the move counter is also unnecessary).
A complete game state should also include the relevant part of game history for detecting 3-fold repetitions. Repetitions can't occur in a winning line if the winning side plays perfectly, and they won't matter in a drawing line, therefore the history can be safely omitted as well.
Uniqueness
Counting, solving or storing the same position twice is obviously a waste of time and space. Similarly, if two positions can be proven to produce identical outcomes with best play of both sides (let's call such positions equivalent), it's a waste of space to store both of them independently. Whether it's also a waste of time depends on how fast the equivalence can be tested and on how it impacts indexing design and performance.
In this experiment I require equivalence to perfectly take into account board mirroring, board rotation and color swapping. Such equivalence is easy to test, so counting unique positions (or equivalence classes) is possible. (It's less trivial to design a gapless indexing scheme omitting all redundant positions, but this is outside of the scope of current experiment).
Two positions are called equivalent if one can be obtained from the other through a sequence of transformations, which may include:
| 1. Swapping colors of all pieces, mirroring locations of all pieces and en-passant target squares around the horizontal middle line, swapping castling rights and side to move: | k3/4/4/KR2 w white to move |
→ | kr2/4/4/K3 w black to move |
| 2. (Only for positions with no castling rights, except on a board of odd width) Mirroring locations of all pieces and en-passant rights around the vertical middle line. | k3/4/4/KR2 w white to move |
→ | 3k/4/4/2RK w white to move |
| 3. (Only for pawnless positions with no castling rights) Mirroring locations of all pieces around the horizontal middle line. | k3/4/4/KR2 w white to move |
→ | KR2/4/4/k3 w white to move |
| 4. (Only for pawnless positions with no castling rights, only for square board) Mirroring locations of all pieces around the a8-h1 diagonal (or around the line connecting top-left and bottom-right corners for other board sizes). | k3/4/4/KR2 w white to move |
→ | k2K/3R/4/4 w white to move |
Rotations and mirroring around the other diagonal can be produced by combining several of the listed transformations. Also you can notice that this equivalence is reflexive, symmetric and transitive.
The 4-th transformation can produce identical position if all pieces are located on the a8-h1 diagonal. The 2-nd and 3-rd transformations can produce identical positions on an odd-sized board. The 1-st transformation however can be applied to any position, and will always produce a non-identical position. Therefore every position has at least one equivalent non-identical variant. Each transformation applied twice returns back to the original position, so the maximum possible number of equivalent non-identical variants for a position is 16.
Here is one example of equivalence class — all these positions are considered to be equivalent to each other (4x4 board is used for compactness):