7-man EGTB Bounty Reborn - Summary

7-man EGTB Bounty Reborn - Summary

Post by Kirill Kryukov »

Note: This thread is locked so that important information does not get mixed with discussion. Comments and discussion are very much welcome in the relevant threads; see the "Links" section at the end of this post.

Note 2: This project is in the design stage. If you see that something is wrong or missing, don't be surprised, but please post any improvement ideas in the linked threads.

Project Overview

Abstract

It is the year 2011, and we still don't have any usable 7-piece chess endgame tables, despite the obvious, huge interest in the community. I propose to establish a bounty to stimulate and reward the developers working in this field. Part of the bounty will be offered for the generator and probing code, part for the actual generated tables, and part for support and infrastructure work. This document will contain an up-to-date summary of the project.

Introduction

Creating a practical generator for 7-piece endgame tables is an immensely challenging task. Creating even a remotely useful 6-piece generator takes months of work for a highly skilled and motivated developer. 7-piece endgames are a lot harder than that. A number of people have worked in that direction, but their generators are mostly kept private. The available ones are not open source and have very limited functionality. [TODO: Expand with examples].

Up to now, progress in this field has been driven by enthusiastic hobbyists. Every now and then someone tries to write a generator (which may take years), mostly out of their own curiosity. If successful, some tables are generated and distributed, often with help from the community. This model worked well for 5-piece endgames, with over a dozen competing formats. It also produced some useful 6-piece tables, most notably in the Nalimov format. Somehow this model has not yet produced a useful 7-piece EGTB generator, probably because it is so hard to do. So perhaps it makes sense to change the model and try encouraging the developers with some real funding.

It is apparent that the community has a lot of interest in the creation of practical 7-piece endgame tables. How large that interest is remains to be seen from the progress of this bounty project. Until now there was no way for the community to directly motivate the developers in an organized fashion. This bounty project now provides such a way.

As a small historical reference, the problem of the availability of the 6-piece endgame tables in Nalimov format (about 1.2 TB) was solved by the community with amazing efficiency in 2005-2006 (see the project page). Now the whole set is available from multiple sources online. This time, with 7-piece tables, the problem is not only sharing existing tables, but also generating them, and creating a useful generator first. That is a considerably bigger task. Is the community strong enough to solve it? With your help, I'm sure it is!

Goals

What is this project going to achieve? By the time we call the project complete, 7-piece-EGTB-assisted endgame analysis should be an easy and enjoyable experience, available to everyone free of charge. Roughly speaking, this means the following:
  • An efficient and feature-rich generator for 3-to-7-piece endgame tables is available, free and open source.
  • The access code is available and integrated into various chess software.
  • All popular tables are generated and available to download.
  • Obscure / less popular tables can be generated by those who need them (for example 6-vs-1 tables).
  • A unified web-based query interface provides access to all generated tables.
These points will be described in greater detail below.

Also, some talented developers will make some real money. And, of course, the computer chess community will have accomplished a mega-project.

Project plan

The rough idea is:

Phase 0. The community drafts the plan and requirements. (This is where we are now).

Phase 1. Advertisement, accumulation of funds, improving the requirements. Ends when the generator is submitted. The part of the bounty allocated for the generator is paid to the successful developer (or team).

Phase 2. Generation and verification of the tables. Creation of the infrastructure for efficient distribution of the tables. Creation of the unified query interface. Key tasks are funded from the remaining part of the bounty.

Phase 3. Project complete, everyone is happy.

Possible outcomes

Phase 1 will most likely result in one of these scenarios:

Scenario 1. Someone submits a solution satisfying all requirements. The bounty is paid to the developers. We proceed to Phase 2: generating and distributing the tables.

Scenario 2. Someone releases a more or less useful 7-piece EGTB generator which does not satisfy the bounty requirements. This may result in the community losing interest in this bounty and pulling back the funds (yes, all donations can be taken back at any time, up until the moment a solution is submitted). Alternatively, we may decide to lower the requirements in order to still reward the developers.

Scenario 3. Nothing happens.

The first scenario is more likely to occur if the community succeeds in assembling an attractive prize for the developers to win. So the outcome of this project will mostly be decided by how much interest the community has in 7-piece EGTBs.

Links

This bounty project:
  • Previous bounty discussion
  • Discussions in other forums
(Please let me know if I am missing any relevant links.)

Re: 7-man EGTB Bounty Reborn - Summary

Post by Kirill Kryukov »

Phase 1 requirements

Color legend: Requirement, Rationale, Verification, Commentary.

Requirements for the Phase 1 solution: the tablebase format, the generator, and the probing code. The general idea is that the file format and generator code should be reasonably feature-rich, so that they do not become obsolete in a year or two. We should avoid including every feature we can think of, and include only those with significant value to the community. The performance of the generator should be as high as practically possible.

Note: Eventually this document will become very detailed and technical. This is a necessity for a successful specification.

Note: These specifications are just the bare minimum, a successful solution may have any number of additional features or qualities not listed here.

Q: Who will perform the verification of the submitted solution?
A: A verification team, formed by volunteer experts in the field. Members of the verification team should not be affiliated with any team working on a solution.


Q: How fast will a submitted solution be verified?
A: The verification team will try to complete the verification as soon as possible, but some steps may take time. A week or two seems like a reasonable target.


1. Requirements for the tablebase format

Although the tablebase format is specified on its own, its implementation is in the generator and probing code, so it will be verified together with those parts.

1.1. Completeness
  • 1.1.1. Promotion to all pieces should be supported (not only to Q or to Q and N).
  • 1.1.2. En-passant must be fully supported.
  • 1.1.3. Castling must be fully supported.
Rationale: Chessic correctness. Promotions and en-passant have been supported by Nalimov and other formats for a long time now. Castling should be easy to do with an additional mini-table keeping just those positions with castling rights.
A number of test positions will be used to verify these items. TODO: Assemble the test suite.
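Purely as an illustration of the mini-table idea from the rationale above (and not as part of these requirements), a probe could dispatch on the castling-rights mask roughly as follows. All identifiers in this sketch are hypothetical placeholders.

```c
/* Hedged sketch: castling handled by a separate mini-table.
 * Positions with at least one castling right are rare (the king and the
 * relevant rook must still be on their original squares), so they can be
 * stored in a small auxiliary table selected at probe time.
 * All identifiers are illustrative placeholders, not a specification. */
typedef struct {
    int      squares[7];        /* piece locations, up to 7 men          */
    int      pieces[7];         /* piece types matching the squares      */
    int      side_to_move;      /* 0 = white, 1 = black                  */
    unsigned castling_rights;   /* bitmask: 1=K, 2=Q, 4=k, 8=q           */
    int      ep_square;         /* en-passant target square, or -1       */
} Position;

int probe_main_table(const Position *pos);      /* no castling rights    */
int probe_castling_table(const Position *pos);  /* small auxiliary table */

int probe_value(const Position *pos)
{
    if (pos->castling_rights != 0)
        return probe_castling_table(pos);
    return probe_main_table(pos);
}
```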

1.2 Metric. The following metrics must be fully supported:
  • 1.2.1. WDL.
  • 1.2.2. WDL50.
  • 1.2.3. DTZ.
  • 1.2.4. DTZ50.
  • 1.2.5. DTC.
  • 1.2.6. DTM.
Rationale: WDL and WDL50 are fast to compute and distribute; one takes the 50-move rule into account and the other does not. Apparently both proponents and opponents of the 50-move rule are strongly represented in the community, so we need to have both metrics as options. DTZ and DTZ50 are the most practical of the distance-based metrics. They can also be built using WDL and WDL50 sub-endgame tables, which makes it possible to build DTZ and DTZ50 tables only for the interesting endgames. DTC and DTM are the traditional, well-established metrics. Having DTM and DTC implemented will allow us to compare the new format and generator with already existing tables and generators (for 3 to 6 pieces), such as Nalimov's (statistics, table sizes, and generation speed can be compared).
The generator and probing code will be tested on a number of test positions with expected known values, highlighting the difference between the metrics. TODO: Assemble the metric verification test suite.
Please comment in the Metric Discussion Thread.
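Purely for illustration (not part of the format specification), the six required metrics might be enumerated in the generator or probing code along these lines; the identifier names and values are assumptions.

```c
/* Illustrative only: one possible encoding of the six required metrics.
 * The actual identifiers and on-disk values are up to the format author. */
typedef enum {
    METRIC_WDL   = 0,  /* win/draw/loss, ignoring the 50-move rule        */
    METRIC_WDL50 = 1,  /* win/draw/loss, respecting the 50-move rule      */
    METRIC_DTZ   = 2,  /* distance to zeroing move (capture or pawn move) */
    METRIC_DTZ50 = 3,  /* DTZ, respecting the 50-move rule                */
    METRIC_DTC   = 4,  /* distance to conversion (change of material)     */
    METRIC_DTM   = 5   /* distance to mate                                */
} Metric;
```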

General file format: Each tablebase file must contain at least the following information about itself.
  • Game and pieces used. (Chess, 8x8 board, standard pieces).
  • Set of positions contained. (Which endgame is contained in this table, even if it's clear from the file name).
  • Metric used.
  • Indexing scheme. Allows experimenting with alternative indexing schemes.
  • Compression format. Allows the compression method to be replaced.
  • Partitioning method, volume index, total number of volumes. Allows experimenting with different partitioning methods. (For example, storing a 100 GB table as a single file, or splitting it into 50 x 2 GB chunks).
In short, the file format has to be future-proof. Adding a new metric, indexing scheme, compression method, or even a new chess variant should not break existing code. If you have comments about the file format, please share them in the File Format Discussion thread.
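To make the self-description requirement more concrete, here is a hedged sketch of what such a header might look like as a C struct. The field names, types, and sizes are assumptions for illustration only; the actual layout is entirely up to the submitting developer.

```c
/* Hypothetical self-describing header, showing the required fields only.
 * Nothing here is prescriptive: names, sizes and encodings are placeholders. */
#include <stdint.h>

typedef struct {
    char     magic[8];          /* file signature and format version         */
    char     game[16];          /* game and pieces, e.g. "chess-8x8-std"     */
    char     material[16];      /* which endgame is contained, e.g. "KQRvKRB"*/
    uint8_t  metric;            /* WDL, WDL50, DTZ, DTZ50, DTC or DTM        */
    uint8_t  indexing_scheme;   /* identifies the position-to-index mapping  */
    uint8_t  compression;       /* identifies the block compression method   */
    uint8_t  partitioning;      /* how the table is split into volumes       */
    uint32_t volume_index;      /* which volume this file is                 */
    uint32_t volume_count;      /* total number of volumes in this table     */
} TablebaseHeader;
```

Storing the indexing, compression, and partitioning identifiers explicitly in every file is what allows any of them to be swapped later without breaking existing readers.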

2. Requirements for the generator code

Functionality:
  • Supports the tablebase formats described above.
  • Can generate tables for all 3-to-7-piece chess endgames, including exotic 6-vs-1, in any of the supported metrics.
  • Can be operated from the command line.
Performance: This is a tricky part. Ideally we want to reward the team producing the best, which means the fastest, generator. However, we won't know in advance how fast it can be. So, the way I see it, we need to ask prospective developers to come up with estimates of their generator's performance. Then the highest speed estimates will be put into the requirements, allowing the best generator to win.

Source: Has to be open source. The source should be clean, readable, maintainable C/C++, with lots of comments. The code should not be cryptic (see Nalimov's code for an example of cryptic). All functions, globals, classes, and interfaces should be documented in good detail. Key structures and algorithms should be explained in the documentation.

Portability: Code has to compile cleanly on popular platforms. No dependencies on platform-specific calls or non-free libraries.

License: Must be one of the OSI-approved licenses. Modification and redistribution should be allowed.

3. Requirements for the probing code

Functionality: Full support of the tablebase formats above. Can work with partitioned tables, etc.

Performance: At the very least, instant or close-to-instant initialization. For access speed requirements, suggestions are welcome.

API: To do.
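Since the API is still to be defined, here is only a rough, hedged sketch of the shape it might take, in the plain C required below. All names and signatures are hypothetical and offered as a starting point for discussion, not as a specification.

```c
/* Hypothetical probing API sketch -- every name and signature is a placeholder. */

typedef struct EgtbSet EgtbSet;   /* opaque handle to a set of loaded tables */

/* Initialize from a directory of tablebase files; should be near-instant. */
EgtbSet *egtb_open(const char *path);

/* Probe a position (FEN is used here only to keep the sketch self-contained).
 * Returns 0 on success and stores the value for the requested metric in
 * *value; returns nonzero if the table is missing or the position is not
 * covered. */
int egtb_probe(EgtbSet *set, const char *fen, int metric, int *value);

/* Release all resources. */
void egtb_close(EgtbSet *set);
```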

Source: Clean, beautiful C code of textbook-grade structure and clarity. Comments everywhere.

Portability: Standard C, not C++. Cleanly compiles anywhere, or almost anywhere. At the very least, compiles with GCC and Visual C with no warnings.

License: Any OSI-approved license that allows free re-use by closed-source and commercial code. Rationale: We want the new EGTB format supported by everyone, not only by free engines. Commercial interfaces must be able to incorporate the probing code. Also, the probing code must not depend on any libraries that cannot be reused by closed-source commercial projects (relevant for compression libraries).

4. Requirements for the documentation

The tablebase format has to be described in detail in a text file. This documentation should be detailed enough to allow an independent implementation of the access code. All headers, metadata, indexes, etc., should be explained in good detail.