7-men EGTB Bounty - competition format

Post by **Kirill Kryukov** » Wed Aug 27, 2008 8:34 am

Originally I liked bounty format, but now I don't think it is the best choice because it may unjustly reward not the best solution. Therefore I am looking for ideas or suggestions about the format.

Some of the possibilities:

1. Bounty. The first solution that fully complies to the requirement wins. Pros: Simplicity. Cons: The chance that the best solution is rewarded is not high.

2. Fixed term prize. A deadline is announced. After the deadline all submitted solutions are evaluated and the winner is rewarded. Pros: Fair to the developers. Cons: Unflexible. We don't know what time is really needed. If the we set too long time, there will be unnecessary delay. If we set too short time, no solution or insufficiently advanced solutions may be submitted.

3. Repeating fixed term prize. For example, each term is 3 months. Or 6 months. If no compliant solution is submitted, we go to the next term. Essentially same with #2, with the same cons. The set period of time may not be enough to fully test the generator, to experiment with some indexing schemes or with compression methods, etc.. The first generator written in a hurry will win (or one of those written in a hurry).

Generation of Nalimov tablebases took several years. I don't expect generation of 7-men tables to be much faster. Storage requirement will be measured in terabytes. So it is probably a good idea to at least try to make sure that we get the best possible EGTB format and generator.

Assuming that we want to reward the technically best tablebase format and generator, what selection process would you suggest?

koistinen · Post by **koistinen** » Wed Aug 27, 2008 10:46 pm

How about this format:

Decide on an evaluation function for entries and a competition period. Let people send in entries when they like and publish it and an evaluation as soon as possible. When competition period is ended, divide prize in proportion to how far each entry pushed the state of the art.
Say entries A, B and C are entered in that order and have score 5, 3 and 7 respectively.
Assuming baseline score is 0, A would get 5/7 of prize, B gets 0 and C gets 2/7.
Advantage is that when someone has finished writing a generator there is no big need to delay entry and so competition period might be longer than would be good if entries had to be kept secret. Also, it would allow any number of prizes without them getting in the way of each other.

A possible evaluation function:
Specify a reference system with at least 2 cores, 4GB of ram and 1TB of disk. (one disk drive)
Anyone entry should be accompanied with an estimate of how long each endgame would take to generate (as a proportion of total time).
Score would 1001 days divided by time to generate DTZ50 of all 7-man endgames using the evaluated generator.
Result would only have to be stored at some time for each endgame. (so it might be copied then)

A score of 1 then means the 7-man endgames can be generated in on average 24h/endgame on the reference system.

kronsteen · Post by **kronsteen** » Thu Aug 28, 2008 10:03 am

I suggest to use the following scheme for performance testing (this is also a proposed answer to some currently open questions in requirement sheet), upon which prize distribution rules may be elaborated, provided the quality of a solution, and not only its ability to make 7-men available in finite time, is also a wanted feature :

Testing might be performed on the two following endings : krbnpkq and krbpkqn.

Why these endings ?

1) they are the most complex type, and demonstrating capability to build them is demonstrating capability to build a whole set. Hanging on less complex TBs for testing would be running the risk of memory overflow / bugs / unbound generation time when moving to most complex TBs
2) they contain pawns, and they need previous computation of some pawnless endings which will demonstrate capability to generate pawnless TBs as well
3) they are not clearcut, and therefore will probably be among most challenging TBs both in generation time and final compressed size
4) they contain only 1 pawn, and therefore easily accessible as they need the minimum possible number (5) of daughter 7-men ending TBs
5) they are the type of ending which contain the most possible positions (2.1 Terapositions)
6) they are interesting on their own, and can be used as publicity
7) they contain every chess piece type and therefore avoid possible bugs when introducing a new piece type.

Three features are to be considered :

- Generation time (on a “high end machine”, which is to be defined and may well be subject to changes in the future. Common practise is for builders to give generation time they observe on their own hardware + description of this hardware, but they must be trustworthy and because of this it might be not suitable for a competition. Better idea might be to have a testing machine at disposal, whose hardware characteristics are publicly released, might be subject to upgrades, and might also be subject to demands from competitors (buy more RAM, etc…))
- Final size of compressed TBs
- Probing code access time on random positions

A “satisfaction” function is to be built upon these features (and prize distribution rules might be built upon this satisfaction function). This is to be discussed, but about the two first features two separate problems might be considered : Generation time and size of single TBs, and generation time and size of the whole set. About the whole set, it is possible to calculate likely upper bounds as follows :

krbnpkq and krbpkqn are 2.1 Terapos each.
Full set of 43/43p/52/52p is 580 Terapos.
If a “partial set of endings of interest” is to be defined for DTZ/DTZ50, one could propose the 244 “balanced” endings out of 875 for which material count difference between sides is 1 or 3. These endings are 171 Terapos.

Then to evaluate generation time / final size of whole sets, one might simply multiply generation time / final size of the {krbnpkq krbpkqn} subset by 138 for WDL/WDL50 (full set), and by 41 for DTZ/DTZ50 (balanced endings only). This is an upper bound ; for DTZ/DTZ50, it will probably be close to real time/size. For WDL/WDL50, real time/size will be significantly less because unbalanced endings, especially heavily ones, will be significantly lighter in time/pos and size/pos (my guess is an overall factor of 2, so estimate might be given by making 69x {krbnpkq krbpkqn} subset instead of 138x).

Note that all this scheme is compatible with handling separately WDL/WDL50/DTZ/DTZ50 (and therefore with prize splitting between different metrics, should this idea come into consideration).

Here are proposals for “satisfaction functions” upon some different features (naturally to be discussed, especially full set time / size for which satisfaction is inherently dependant of computing/storing resources which will be used).

Time generation
Single TB (of most complex type) : mark is 0/10 for 200 days or more, 10/10 for 1 day or less, and has logarithmic scale in between (formula : m=10(1-ln(x)/ln200)). Time to be considered is slowest among krbnpkq and krbpkqn.
Full set for every single metric (875 TBs for WDL or WDL50, 244 TBs for DTZ or DTZ50), based on formulas above (69x or 41x time for {krbnpkq krbpkqn} subset) : mark is 0/10 for 200 CPU years or more, 10/10 for 1 CPU year or less, logarithmic in between
Full project : 0/10 for 800 CPU years or more, 10/10 for 4 CPU years or less, logarithmic in between

Size
My current estimates, based on observed behaviours on 5-6 men and combinatorial calculations, are something like 10 TB for WDL(50) whole set and 30 TB for DTZ(50) on 244 balanced endings subset. Doing significantly better seems difficult from “information theory” point of view, and doing worse would be degradation from currently standards.
Single TB, WDL(50) format : mark is 0/10 for 200 GB or more, 10/10 for 20 GB or less, logarithmic in between. Size to be considered is biggest among krbnpkq and krbpkqn
Single TB, DTZ(50) format : 0/10 for 1 TB or more, 10/10 for 100 GB or less, logarithmic in between
Full WDL(50) set : 10/10 for 5 TB or less, 0/10 for 20 TB or more, logarithmic in between. Size to be considered is 69x {krbnpkq krbpkqn} subset
Full DTZ(50) set : 10/10 for 15 TB or less, 0/10 for 60 TB or more, logarithmic in between.
Size to be considered is 41x {krbnpkq krbpkqn} subset

Post by **Kirill Kryukov** » Tue Sep 02, 2008 7:03 am

kronsteen, I am not sure why we would want to reward multiple solutions? Do we need more than one 7-men generator and file format? I don't think so. The idea is to stimulate development of a really good generator, to reward the author, and then to use this generator for some years. Computation time will be long, storage requirement will be huge. Add to that the possibility that some people will generate WDL while others will prefer WDL50.

Multiple authors creating different generators and competing for the bounty is fine. But I don't think everyone should be rewarded. It's counter-intuitive to me. Also the bounty may not become big enough to make it attractive when divided.

By the way, if several developers like so, they can of course join their forces, make a single generator, and share the bounty. I would even prefer it this way because by this way the developers can share the work, combine their skills, and avoid re-doing the same basic parts. So the resulting generator will be better. However if someone feels confident to do it single-handedly, it is fine too.

kronsteen · Post by **kronsteen** » Tue Sep 02, 2008 3:27 pm

Kirill Kryukov wrote:Also the bounty may not become big enough to make it attractive when divided.

Prize splitting was not a suggestion, it was just a possible future idea, should the overall project prove too hard or too long for a full solution to emerge. But for now I’m against this idea too, as I agree with your quoted argument above.

My (long) previous post is aimed at building a comprehensive and publicly released testing and scoring scheme, whose goals are 1) to give insurance that submitted solutions will grant 7-men generation feasibility for the community in materially acceptable conditions and 2) to select the most performing method, by quantifying “performance” as a function of desired features (requested or optional), generation speed and TBs final size being the main ones (to me).

CCRL Discussion Board

7-men EGTB Bounty - competition format

7-men EGTB Bounty - competition format

Re: 7-men EGTB Bounty - competition format

Re: 7-men EGTB Bounty - competition format

Re: 7-men EGTB Bounty - competition format

Re: 7-men EGTB Bounty - competition format