NULP

(Number of Unique Legal Positions in chess endgames)

Intorduction

In 1950 Claude Shannon famously estimated the number of possible chess positions to be "of the general order of 64!/32!(8!)²(2!)⁶, or roughly 10⁴³". This includes some illegal positions (e.g., pawns on the first rank, both kings in check) and excludes legal positions following captures and promotions. Later estimates include 10⁵⁰ by Victor Allis (with upper bound of 5x10⁵²). More recently John Tromp has shown the upper bound of 2¹⁵⁵ (about 10^46.7). Significantly improving the estimate for the whole game is difficult with today's methods and computers, however much better estimates are possible for subsets of the game, such as opening of endgame.

On the opening side, one well-known statistic is Perft, which is the number of move paths of N plies from the starting position. Perft is known for up to 13 plies and is widely used for testing and benchmarking move generators. There are 20 move choices in the starting position, so 20 different legal positions can occur after white's first move, and 400 after black's first reply. These numbers agree with Perft(1) and Perft(2), respectively. Perft(3), however, is larger than the number of different legal positions, because it counts transpositions independently from each other. The number of transpositions grows rapidly with each extra ply. This means that while Perft can be used as the upper bound for the number of positions, it does not provide a very good estimate. I'm not aware of any efforts to count different legal positions beyond the first 2 plies.

On the endgame side most of the estimates of endgame sizes come from solving projects. Strongly solving an endgame typically involves storing a value for each indexed position, thus endgame size can be measured as the number of indexed positions. The choice of the indexing method is crucial here and has to balance database compactness with indexing speed: storing fewer positions makes the database more compact, but complicates the indexing, degrading performance. As the result common indexing method use relaxed concepts of position legality and uniqueness.

In summary, currently available estimates of endgame sizes suffer from following limitations:

Many illegal positions are included.
Some redundant positions are included.
Some legal positions are omitted (those with castling rights).
The results are only available for solved endgames.
The resutls are usually limited to 8x8 board.

In this experiment I aim to advance on all of these points. This means providing more accurate estimates for endgame sizes in chess and chess variants, while trying to avoid illegal and redundant positions when possible. The motivations for computing such estimates include:

Knowing the endgame size in advance can be helpful for solving as it allows estimating the required time and hardware.
NULP-based statistics (e.g., number of wins / NULP) can be used for comparing results of independent solving efforts.
A well defined NULP allows discussing index efficiency (NULP / number of indexed positions) and compactness (NULP / byte) of different databases.
Comparing NULP values with other estimates may allow improving estimates for larger endgames or even for the whole game.

At first this data was a by-product of solving experiments, but then I decided to share it separately from solving results.


© 2013–2014 Kirill Kryukov Available under the CC BY 3.0 License