7-men EGTB Bounty

Post by **Kirill Kryukov** » Wed Aug 06, 2008 7:12 am

Something is becoming more clear gradually, thanks to all for the discussion. I feel that DTR or DTZR are too complex for this project. If we are hoping to raise any funds we have to keep our goals simple and practical. It's complicated to generate and use DTR and DTZR and they are not directly related to the game of chess. My proposal is therefore to concentrate on two metrics (instead of just one): WDL and DTZ50.

As it was said in this thread, chess without 50-move rule is a chess variant. Chess cariants are fun and they may have theoretical value, but IMHO the game of chess should have prioprity.

It's likely that every one of us has more interest in some particular aspects of the chess game. So we prefer the tablebases that would suit our needs, like "exploring" some properties of the game with varying N-move rule, etc... This is a nice academic exercise, but it is not suitable for the community-based 7-men project. 7-men tables are huge, it will need a great deal of CPU time and communication to generate even half. We have to put aside our personal academic interests and try to focus on something really important and simple first. Note that DTR / RTZR are not studied well even with 5-men endings, so it's a kind of new concept. There is no engine that uses such tables. A totally untested concept is not best way to proceed at this stage, even if it is fun.

Look at Nalimov's DTM. DTM is simple, so it's used in every strong engine. It looks trivial, but so many engine have tablebase-related bugs. Did you notice that? Now take a guess at how many efficient bug-free engines using DTR / DTZR you will see?

If you take random chess player and try to explain him why he needs a DTR / DTZR data, you will see the problem. If you try to generate it in the scale of tens of terabytes (or to find volunteers to do so), you will again see the problem.

Vincent brought some good points here about the WDL. What I did not realize at first is that WDL and DTZ50 are totally interchangeable during the generation. If you have the comlpete set of WDL tables, you can generate ANY DTZ50 table you wish. It makes perfect sense to use WDL tables necessary for generation, even when generating a DTZ50 table. This way they will fit in RAM which will help a lot.

DTZ versus DTZ50 is one of the questions. I would suggest to go for DTZ50 for the space advantage, chessic correctness and importance in practical play. DTZ is more theoretical, same like DTM, or DTC.

I will summarize the metric requirement part, but please post if you have anything to add.

Post by **Kirill Kryukov** » Wed Aug 06, 2008 7:28 am

A dual-metric approach will allow us to genearate and keep unimportant pawnless tables in WDL, and only use DTZ50 for interesting endgames, with many pawns.

Generating a WDL from a DTZ50 will be a matter of trivial conversion. Generating DTZ50 from a WDL will require additional WDL tables (where positions in the current ending can convert to) and longer time. Important point is that to generate something like KPPPKPP in DTZ50 you don't need all lesser tables in DTZ50. WDL is fully sufficient.

One good point of this approach is that these two metrics will satisfy two important uses. WDL is best suited for search, where DTZ50 will be too slow to access. However DTZ50 can be used at limited depth to find the win. Also DTZ50 can be used via web-interface.

guyhaw · Post by **guyhaw** » Wed Aug 06, 2008 9:08 am

I agree with KK that WDL EGTs should be produced, but _not_ WDL50 EGTs. Note that "Generating a WDL from a DTZ50 EGT" produces a WDL50 EGT.
I see the attraction for most of the DTZ50 EGTs, but note that there is a danger of confusing DTZ and DZ50 EGTs: John Tamplin did it once, an easy file-naming fumble, but fortunately our checks picked it up.
Note also that one cannot proceed easily from a set of DTZ EGTs to a set of DTZ50 EGTs or vice-versa. Only if you know that maxDTZ =< 50 for all possible succeeding endgames can you just 'chop the head off' a DTZ EGT. [JT and I did that on occasion but it all requires careful (and fallible) thought]. You do have to 'top' the DTZ EGT but there is much more to it than that.
For example, maxDTZ for KBBKNN = 38, but maxDTZ50 = 29 and a high % of the DTZ-wins become draws under the 50-move rule.
I like the idea of keeping 'unimportant' endgames' EGTs in WDL format, except you would need to generate the DTZ50 EGT for all successor endgames in order to generate the DTZ50 EGT for a 'target' endgame.
In summary, it is easy to turn WDL into a DTZ EGT and vice-versa but this does not apply re DTZ50 EGTs.
Also, I think everyone would really want to know how deep an endgame-phase is. So my personal vote would be for DTZ not DTZ50.

I agree that a set of (DTR, DTZR) EGT - and I'm not suggesting this for 7-man yet - is a 'stretch target' and not ideal for a first EGT Community project. I agree with KK that the risks of error are too high as there hasn't been a proven pilot project yet.

Just to clarify some points raised by sz...:

A 'phase of play' is a sequence of moves between moments when the move-count is zero (so it includes the P-push or capture at the end)
The 'move-count' is the penultimate integer in the FEN, the count of moves in the current phase.
The 'number of moves to the end of the phase' rather depends on what strategy the players are using: take 6KN/8/8/8/B7/8/Q7/1k6 b - - 0 194
- note that the 'move count' (for the phase) is 0 whereas the number of the move that Black is about to make is 194
- DTC-, DTZ- and DTZ50-optimal is 194...Kc1 However (DTR, DTZR) = (31, 0) in winner's moves (61, 1) in plies, as 194...Kxb2 leaves a 31-move mate.
-the difference is that in DTC/Z(50) metrics, Black does not make optional captures, whereas - 'taking the strategic view' in the DTR-metric, it does.

I think sz's idea about k1, k2 etc is maybe ok, but all this information is telescoped into the (DTR, DTZR) EGT in one go. As others have said, the DTR EGT tells you how much beyond a theoretical k-move rule your win lies - how much you have to finesse the line to make a win achievable.

No time to write more, for which many of you will be grateful

.

g

guyhaw · Post by **guyhaw** » Wed Aug 06, 2008 9:11 am

In the past, FIDE extended the 50-move rule to bigger numbers when the first information about deep endgames turned up.
Then they realised this was a never-ending process (rightly) and brought the 50-move rule in across the board, so to speak.
It does not apply in the Chess Composition domain - see the FIDE PCCC Codex of Composition.
The 50-move rule originated because 'professionals' made money from chess games played in coffee houses. They wanted to limit 'cost'.
Given FIDE's tendency to shorten timescales to make chess more attractive, there is a non-zero chance that 50 will become (e.g.) 40.
Then we will wonder why we generated DTZ50 EGTs rather than the DTZ EGTs.
g

syzygy · Post by **syzygy** » Wed Aug 06, 2008 10:13 am

kronsteen wrote:What kind of info is most relevant for these positions ? DTR info (i.e smallest k which would be required for position to be a win on a k-move rule, with k>50 necessarily), or DTZ info (i.e min number of moves, necessarily >50, required from current position in order to reach a zeroing move which leads to a won position under 50-move rule) ? This is not exactly the same. I tend to prefer last one, as it is probably easier to compute, and is also usable in practical cases when some non-zeroing moves have been previously played before reaching position (which is not the case of DTR).

I fully agree with your analysis. And I realize this is your earlier suggestion about DTZ50 with the algorithm running beyond z=50.

When you suggested that table, I had some trouble seeing how it would work. It is (theoretically) easy to do 50-iterations on a 7-piece table based on DTZ50 6-piece tables to generate a 7-piece DTZ50 table and then go on to find the "won" positions that are drawn under the 50-move rule. However, you would still miss the 7-piece positions that can be forced (in 10, 40, 60 moves) to a "winning" 6-piece position that is drawn under the 50-move rule. You could try to pick up those moves too, but how to distinguish between a 7-piece position that converts in 10 moves into a real winning position and a 7-piece position that converts in 10 moves into a "cursed" 6-piece position?

The solution is not that hard: label all positions with a "winnable in 50"-bit. Or equivalently, use "my" DTR( 50 < inf ) metric. The "value space" will not be linear anymore as in normal DTZ/DTC/DTM, but it will consist of two levels, i.e. {0,1} x N (whereas the value space of DTR is NxN).

Btw, my notation DTR(k) (for those that are still with me

) should be changed into DTZ(k). Actually, it also possible to talk about DTC(k) and DTM(k).

DTZ(k): pairs (min{i : p is won under k_i-move rule}, distance to zeroing move)
DTC(k): pairs (min{i : p is won under k_i-move rule}, distance to conversion)
DTM(k): pairs (min{i : p is won under k_i-move rule}, distance to mate)
(Here, "distance to mate" means distance to mate when minimaxing first on i, then on distance to mate.)

Now DTR = DTZ(1<2<3<...).
Kronsteen's table = DTZ(50<inf).
DTM(50) will give all shortest legal mates.

Variations are still possible. Instead of "distance to Z/C/M", one could take "distance to either Z/C/M or a smaller i". This will give smaller tables, as the values will be smaller. However, that metric for DTZ(50 < inf) may not be the most ideal, as once you are in a cursed position it seems better to optimize for a zeroing move that resets the 50-move counter. This still needs some thought though.

DTM(50 < inf) would give the shortest legal mates plus all shortest mates that can be won when ignoring the 50-move rule, and are played out under the condition that the winning side tries to reach a "safe win" as fast as possible, and the losing side tries to keep a position that draws under the 50-move rule for as long as possible. Potentially interesting, but DTZ(50 < inf) will be much smaller.

Another idea: WDL(k). WDL(50 < inf) gives information for each position whether it is a win, loss, drawn but won without 50-move rule, drawn but lost, or really really drawn. Size should be very close to the size of WDL (with proper compression). WDL(k) can be used to generate DTZ(k).

guyhaw wrote:Also, I think everyone would really want to know how deep an endgame-phase is. So my personal vote would be for DTZ not DTZ50.

What about DTZ(50 < inf) combining all the joys of DTZ and DTZ50?

Of course, a precondition is that the yet-to-be-developed generator will be able to produce these tables. And for 7-piece it might be a stretch, but at the moment 7-piece is a stretch anyway. I think it would first be very interesting to have 5- and 6-piece tables combining DTZ with DTZ50, have them much smaller than Nalimov's and more easily accessible. As a bonus, they allow things that have so far been unthinkable, at least for the public. (And tbh, I think a full 7-piece set can only be done by Google at the moment, whatever the format.)

kronsteen · Post by **kronsteen** » Wed Aug 06, 2008 12:36 pm

DTR/DTZR :

To me, DTR/DTZR metrics are best aimed at two goals :

- building playing programs which perform well against humans in endgames
- providing new knowledge for theorists which want to process it in order to extract useful practical information for players, and write books for example.

For second goal, I see a very interesting target : kqpkq. This is the last 5-men endgame not to be fully explained by books such as John Nunn's ones. John Nunn himself found it too hard to understand with access to DTM TB only. DTR/DTZR will certainly throw new light on it. Should it prove to be the way to finally make this ending understandable to humans, this would be very probant to show this metric's utility.

Of course, this has nothing to do with 7-men now.

7-men :

Kirill Kryukov wrote:A dual-metric approach will allow us to genearate and keep unimportant pawnless tables in WDL, and only use DTZ50 for interesting endgames, with many pawns.

I fully agree. This may well be what we just needed to build a « full and relevant » 7-men set of affordable size and containing most valuable info, i.e winning lines in interesting practical endings.

WDL tables give three possible position status. This is to be stored on 2 bits. I'm not sure of this, but a fourth position status can be used for free as it doesn't change the number of bits and therefore won't change final size of compressed data. As an identified advocate for cursed positions

(but in fact, I've no better idea for now) I suggest the « WDLC » format which use fourth space to identify cursed positions (i.e theoretically winnable positions which are drawn under 50-move rule). Cursed position status flag is incomplete as it doesn't identify which is the « attacking side », but in most cases humans will be able to figure it out themselves.

50-move rule

Before moving on to DTZ50 (if this is the option finally selected) we should have some clear view of what may this rule become in long term future. I think that k-move rules with k>50 aren't to be expected back, as we can remember protests of competitors who had to suffer a 100-move battle on krbkr with inferior side just because some alien intelligence told them there were some winnable positions beyond 50 moves. However, I don't know if k-move rules with k<50 are realistic future options, especially for quick games. We might check this first.

Post by **Kirill Kryukov** » Wed Aug 06, 2008 1:38 pm

kronsteen wrote:WDL tables give three possible position status. This is to be stored on 2 bits. I'm not sure of this, but a fourth position status can be used for free as it doesn't change the number of bits and therefore won't change final size of compressed data.

But why, if you will use 4 possible values instead of 3, it should certainly increase the final size of compressed data?

Regarding the 50-move rule. The unique position of this rule in chess history is that the emergence of computer chess happened when the 50-move rule was in use. Now we have hundreds of chess programs (engines, interfaces, database software, etc) that all have 50-move rule hard-coded. There is no option for disabling 50-move rule in most of the programs. These programs are not going to be rewritten overnight if someone decides to cancel the 50-move rule. And I don't want to consider throwing away all this legacy under theoretical possibility of FIDE adopting a 40-move rule or anything else. So I think that the 50-move rule is by now written in stone for computer chess.

I can understand that there is much interest in theoretical aspects of EGTB, especially among the theorists gathering here.

But please consider that we need something very practical and simple for wide adoption. And we need wide adoption to compute the important 7-men tables sooner.

I am not against any research using any metric or combination of metrics. But do you so badly need to do it with 7-men tables? After the 5-men endings are studied in the many ways proposed in this thread, there may be further discussion, but a complete 7-men EGTB is so massive task that we need any help we can get to finish it at all. Any help includes smallest possible files, most compact metric, and widest mind share. Which is all achieved by WDL50 and DTZ50 combination.

This is just my opinion obviously. If many of you are think that WDL+DTZ is better than WDL50+DTZ50 then we will have to go with that or require support of WDL(N)+DTZ(N). But my warning is that if we start working on mutiple formats simultaneously, it will slow down the project considerably.

syzygy · Post by **syzygy** » Wed Aug 06, 2008 3:03 pm

kronsteen wrote:WDL tables give three possible position status. This is to be stored on 2 bits. I'm not sure of this, but a fourth position status can be used for free as it doesn't change the number of bits and therefore won't change final size of compressed data.

A fourth position state will affect final size, unless one uses the very naive compression of packing 4 positions in a byte. Such compression would give a compression ratio of 25% for a WDL-table like KQQKR (starting from 1 byte/pos), which is extremely bad. (Slightly better is packing 5 positions in 1 byte, leading to 20%, but that is still extremely bad for a low entropy tablebase like a WDL table for KQQKR.) For the same reason, adding a fifth state will affect final size as well, but with proper compression it will far from add 50% (which would be the case when going from 2 to 3 bits per position).

I could say something more about my own compression scheme, but it doesn't fit this thread. But my point is: don't think in terms of bits or number of states, but in terms of "information".

In any case, I'm not at all against adding this type of information. As far as I'm aware, the positions affected by the 50-move rule form only a very small portion of the total number of positions. I would be surprised if it adds more than 1% to total table size. I don't think that is a bad trade-off. Experiments are needed though.

As an identified advocate for cursed positions (but in fact, I've no better idea for now) I suggest the « WDLC » format which use fourth space to identify cursed positions (i.e theoretically winnable positions which are drawn under 50-move rule). Cursed position status flag is incomplete as it doesn't identify which is the « attacking side », but in most cases humans will be able to figure it out themselves.

I much prefer what in my notation would be WDL(50 < inf) which is almost what you propose, but distinguishes cursed wins from blessed losses. Five states, will give a sligthly larger table, but is more complete and more useful.

In a tablebase like KNNvKP where all cursed positions are won for white, the 5th state won't be used at all so you effectively get the same as what you propose (with no loss of compression efficiency). In 6- and 7-piece tablebases there will surely be cursed positions where you won't easily be able to say which side has the advantage. (Just as you often won't be able to say which side is winning if all you know is that it is not a draw. This is why we have WDL and not D/ND, even though D/ND only needs 1 bit in a naive compression scheme.)

syzygy · Post by **syzygy** » Wed Aug 06, 2008 4:13 pm

Kirill Kryukov wrote:I can understand that there is much interest in theoretical aspects of EGTB, especially among the theorists gathering here. But please consider that we need something very practical and simple for wide adoption. And we need wide adoption to compute the important 7-men tables sooner.

If it turns out that DTZ50 extended with DTZ (in a single table) takes hardly more time (for generation) and space (for storage), I don't see any reason not to go for that. Engines that don't know how to use it, simply don't use it and can stick to DTZ50. People who are not interested in the subtleties don't have to understand them (nobody will ask them to understand the generation and compression algorithms either!). Engines that know how to use it will normally be able to win "drawn" positions against any human and against any engine that doesn't use the tables. People that are interested in chess theory will surely welcome 7-piece tables. Finally, it will avoid two camps, one generating DTZ50 and one generating DTZ.

DTZ50 extended with DTR, or full DTR, seem to be less useful for practical gameplay. However, it would be very nice if the generator can handle those as well, so that interested individuals can generate the 5- and 6-piece DTR for themselves.

What will be possible depends foremost on the design of the generator. Koistinen and h.g.muller seem to have very interesting ideas on that. I haven't dug deep enough into those so far. My own generator just does everything in RAM. It's pretty fast, but it can't be extended for 6-piece except by waiting for PCs with more RAM to become available.

Arpad Rusz · Post by **Arpad Rusz** » Wed Aug 06, 2008 4:42 pm

In chess studies, the 50 move rule is not used. This is why the proposed DTZ50 is not useful for me, or to the other chess composers.
In my view, the first priority is the generation of the WDL tablebases for the obvious practical reasons. This is doable in relatively short time - if somebody writes the generator, of course.

They need "only" a few TB of storage space. Then these tablebases can be used in the process of generation of the full metric tablebases.

kronsteen · Post by **kronsteen** » Wed Aug 06, 2008 4:55 pm

syzygy wrote:A fourth position state will affect final size, unless one uses the very naive compression of packing 4 positions in a byte. Such compression would give a compression ratio of 25% for a WDL-table like KQQKR (starting from 1 byte/pos), which is extremely bad. (Slightly better is packing 5 positions in 1 byte, leading to 20%, but that is still extremely bad for a low entropy tablebase like a WDL table for KQQKR.) For the same reason, adding a fifth state will affect final size as well, but with proper compression it will far from add 50% (which would be the case when going from 2 to 3 bits per position).

I could say something more about my own compression scheme, but it doesn't fit this thread. But my point is: don't think in terms of bits or number of states, but in terms of "information".

In any case, I'm not at all against adding this type of information. As far as I'm aware, the positions affected by the 50-move rule form only a very small portion of the total number of positions. I would be surprised if it adds more than 1% to total table size. I don't think that is a bad trade-off. Experiments are needed though.

Yes, I fully understand, stating 4th space could be used for free was nonsense, I now realize. But what is true is the fact that cursed wins / blessed losses (nice term, thanks syz

) are very rare and therefore can be added on WDL50 – DTZ50 tables for almost no expense. That's what I suggest now.

Kirill Kryukov wrote:This is just my opinion obviously. If many of you are think that WDL+DTZ is better than WDL50+DTZ50 then we will have to go with that or require support of WDL(N)+DTZ(N).

My own vote is also for WDL50 / DTZ50 now. If 50-move rule is really to be considered stone carved forever, so much the better.

Kirill Kryukov wrote:But my warning is that if we start working on mutiple formats simultaneously, it will slow down the project considerably.

According to this, I think now that what is really needed is good advocacy in order to muster EGTB community's forces (and maybe outside help) towards this one project : 7-men on DTZ50+WDL50, and no other variant. A project of a requirement sheet is to be needed to. If I can find time, I'll try to work this out and submit something soon.

kronsteen · Post by **kronsteen** » Wed Aug 06, 2008 5:52 pm

Arpad Rusz wrote:In chess studies, the 50 move rule is not used. This is why the proposed DTZ50 is not useful for me, or to the other chess composers.

DTZ50+WDL50 tables will fully meet chess composers' needs, provided tables hold info about cursed positions, as I suggest. A cursed win is a win for problemers and a draw for chess players. Another good point to identify them.

Every miniature problem in the world will be checkable for consistency as soon as WDL50 7-men are released. Composer's line of play will be able to be verified if DTZ50 table is also available on material balance of the problem (provided, if initial position is a cursed win, that a metric is also sustained for cursed wins).

As I realize now, identifying cursed wins on 7-men won't need only running recursive algorithm beyond z>50. It will need a second full run (first one for true wins can be stopped at z=50) as cursed 7-men positions can be connected to cursed 6-men, but true wins on 7-men must lead to true wins on 6-men (or mate).

syzygy · Post by **syzygy** » Wed Aug 06, 2008 6:58 pm

kronsteen wrote:As I realize now, identifying cursed wins on 7-men won't need only running recursive algorithm beyond z>50. It will need a second full run (first one for true wins can be stopped at z=50) as cursed 7-men positions can be connected to cursed 6-men, but true wins on 7-men must lead to true wins on 6-men (or mate).

Yes, that's why you need the DTZ(50 < inf) metric, let's call it DTZ50+. Each position is assigned a flag (yes/no) and a value:
( winnable under 50-move rule? , distance to either a zeroing move or a 50-move rule win )

The actual encoding can be as a singe value. Since positions winnable under the 50-move rule will have a distance to zero of at most 50, we can map like this:
(yes, N) -> N
(no, N) -> 50+N

Generation will be a bit harder but not that much. For normal DTZ, the first iteration loops through all captures and probes the values of the resulting positions for being a win, draw or loss. The other iterations work back from this information, until no new positions are resolved anymore (DTZ), or until 50 iterations are done (DTZ50).

To generate DTZ50+, after 50 iterations you loop again through all captures and find those that convert to cursed wins or blessed losses. This information goes into the table, and you continue doing iterations until the table is finished.

The second round of probing only needs to probe those tables that are known to contain cursed positions (most tables will not, is my guess). It might also be possible to gather all necessary information in the first round.

A lot will depend on the actual algorithm used for the generation, but my feeling is that any algorithm than can do DTZ can be adapted to do DTZ50+. WDL50+ can be obtained trivially from DTZ50+ and is sufficient for generation of higher tables.

DTR could be generated in the same way, but it will require more rounds. I think the metric (DTR, distance) would be mapped like this:
(1, 1) -> 1
(2, 1) -> 2
(2, 2) -> 3
(3, 1) -> 4
(3, 2) -> 5
(3, 3) -> 6
And in general (k, n) -> k(k-1)/2 + n
For an efficient DTR algorithm some thought is needed on what can be done in parallel. My guess is that 5-piece can be done quite easily, as you can store a lot of information per position in RAM.

syzygy · Post by **syzygy** » Wed Aug 06, 2008 7:15 pm

Interpreting probe values of such a DTZ50+ table is quite simple. Let v be the value for a position.

If v = 0, the position is a genuine draw.
If 0 < v <= 50, the position is won under the 50-move rule.
if v > 50, the position is a draw, but won if you ignore the 50-move rule.
if - 50 <= v < 0, the position is a loss under the 50-move rule.
If v < -50, the position is a draw, but lost if you ignore the 50-move rule.

For "normal" values -50 <= v <= 50, the winning line can be found as usual.

If v > 50, the winning line is also found by looking for moves that lower v. There are two possibilities: either you slip into a position with v <= 50, or before you reach v=50 you make a zeroing move and end up in a position with again v > 50.

If you slip into a position with v <= 50, the game cannot actually be won under the 50-move rule (with optimal play by the 'losing' side), as there won't be enough moves left.

jkominek · Post by **jkominek** » Wed Aug 06, 2008 7:49 pm

kronsteen wrote: WDL tables give three possible position status. This is to be stored on 2 bits. I'm not sure of this, but a fourth position status can be used for free as it doesn't change the number of bits and therefore won't change final size of compressed data. As an identified advocate for cursed positions (but in fact, I've no better idea for now) I suggest the « WDLC » format which use fourth space to identify cursed positions (i.e theoretically winnable positions which are drawn under 50-move rule). Cursed position status flag is incomplete as it doesn't identify which is the « attacking side », but in most cases humans will be able to figure it out themselves.

In a similar vein, you may want to read this older thread. I introduced the term WLDR (win loss draw drawn-by-rule) to describe 4-valued tables of this variety. viewtopic.php?f=6&t=3265#p31677
You're right that a value of R doesn't say whether it was W or L beforehand. Thanks for pointing that out.

Kirill is correct though: placing extra information into the 2nd bit will increase the final compressed file size. Your discussion makes me convinced that you are confusing the notions of compression with packing.

john

jkominek · Post by **jkominek** » Wed Aug 06, 2008 7:54 pm

syzygy wrote:
Kirill Kryukov wrote:I can understand that there is much interest in theoretical aspects of EGTB, especially among the theorists gathering here. But please consider that we need something very practical and simple for wide adoption. And we need wide adoption to compute the important 7-men tables sooner.
If it turns out that DTZ50 extended with DTZ (in a single table) takes hardly more time (for generation) and space (for storage), I don't see any reason not to go for that.

I do. Difficulty of writing correct code, complexity of the file format, ease of use, and uncertainty of validation. In the simple-versus-complicated debate I side solidly with Kirill.

jk

jkominek · Post by **jkominek** » Wed Aug 06, 2008 8:25 pm

Kirill Kryukov wrote: Vincent brought some good points here about the WDL. What I did not realize at first is that WDL and DTZ50 are totally interchangeable during the generation. If you have the comlpete set of WDL tables, you can generate ANY DTZ50 table you wish. It makes perfect sense to use WDL tables necessary for generation, even when generating a DTZ50 table. This way they will fit in RAM which will help a lot.

Yes, provided - following guyhaw - that we tidy up our nomenclature to 'WDL50'. Building a DTZ50 table from a rule-unhindered WDL table will produce incorrect results. (Except for tables where the longest phase skirts the 50 move rule.)

It works, though, because a zeroing move is an "information loss point". DTZ only cares about the length of the current phase and the game value at the end of it. The only information that need be retrained across this point is the WDL value.

guyhaw wrote:I agree with KK that WDL EGTs should be produced, but _not_ WDL50 EGTs. Note that "Generating a WDL from a DTZ50 EGT" produces a WDL50 EGT.

Please, no. syzygy's proposed innovation notwithstanding, keep the DTZ/WDL and DTZ50/WDL50 pairs together.

john

syzygy · Post by **syzygy** » Wed Aug 06, 2008 8:38 pm

jkominek wrote:I do. Difficulty of writing correct code, complexity of the file format, ease of use, and uncertainty of validation. In the simple-versus-complicated debate I side solidly with Kirill.

In my opinion only "difficulty of writing correct code" is a relevant argument. The other aspects you mention do not seem to get more complicated from using DTZ50+ (e.g. the file format does not need changing, see my previous two messages).

And if you really prefer "simple", forget about 7-piece tables

jkominek · Post by **jkominek** » Wed Aug 06, 2008 8:49 pm

Kirill Kryukov wrote:Competition format: prize (several complete solutions may compete in performance and features) vs bounty (the first successful solution wins).

While this thread has generally veered off into discussions of what kind of tablebases to construct, I am reminded that there are many open issues. For example, will there be a steering committee and how does one apply for membership? Okay, there is no organization to apply to, so Kirill gets to be benevolent dictator for a while yet. Enjoy it while it lasts.

If I may so suggest, there should be a definitive way of declaring that the competition is officially open, and that it is (temporarily) closed for evaluation of entrants. There should also be a formal mechanism for an interested party to declare an intent to enter, to retract this intent if necessary, and to submit an actual entry. Declaring an intent to enter makes the person, or team, ineligible to be on the steering committee. Vice versa as well. It also gives the steering committee some advance warning of what to expect.

It'd probably be worth studying analogous competitions to gleem useful ideas and practices. The NIST encryption competition comes to mind: http://www.networkworld.com/news/2007/0 ... rithm.html

I'd prefer that you not charge an entry fee.

john

jkominek · Post by **jkominek** » Wed Aug 06, 2008 9:41 pm

syzygy wrote:
jkominek wrote:I do. Difficulty of writing correct code, complexity of the file format, ease of use, and uncertainty of validation. In the simple-versus-complicated debate I side solidly with Kirill.
In my opinion only "difficulty of writing correct code" is a relevant argument. The other aspects you mention do not seem to get more complicated from using DTZ50+ (e.g. the file format does not need changing, see my previous two messages).

Oh it does. The "+" implies there is no bound on the distance and one must accommodate single and double-byte entries. So in that regard the simplicity gained in going from DTM to DTZ50 is lost.

And if you really prefer "simple", forget about 7-piece tables

Okay, I'll think about 8.

jk

syzygy · Post by **syzygy** » Wed Aug 06, 2008 11:02 pm

jkominek wrote:Oh it does. The "+" implies there is no bound on the distance and one must accommodate single and double-byte entries. So in that regard the simplicity gained in going from DTM to DTZ50 is lost.

True, but that holds for DTZ as well.

For the file format itself it shouldn't matter a lot, but it may complicate parts of the generator and increase the size and/or number of intermediate files. So I'd put it under "difficulty of writing code" rather than the (final) file format.

jkominek · Post by **jkominek** » Thu Aug 07, 2008 2:29 am

syzygy wrote:
jkominek wrote:Oh it does. The "+" implies there is no bound on the distance and one must accommodate single and double-byte entries. So in that regard the simplicity gained in going from DTM to DTZ50 is lost.
True, but that holds for DTZ as well.

Indeed it does - a factor to weigh into the consideration of requested measure.

syzygy wrote:For the file format itself it shouldn't matter a lot, but it may complicate parts of the generator and increase the size and/or number of intermediate files. So I'd put it under "difficulty of writing code" rather than the (final) file format.

It certainly is that. I also argue that, as a consequence, validation becomes more tricky. There are no existing tables of exactly the same kind to cross-check against. I've been expecting guyhaw to weigh in on this matter with a wavy-pointy finger, but he so far has not.

syzygy wrote:Btw, my notation DTR(k) (for those that are still with me ) should be changed into DTZ(k). Actually, it also possible to talk about DTC(k) and DTM(k).
DTZ(k): pairs (min{i : p is won under k_i-move rule}, distance to zeroing move)
...
Kronsteen's table = DTZ(50<inf).

It's not typical for me to discourse negatively ... but, I'm sorry, your notation sucks. For all positions p in the chess space X, DTZ(k) is a function of the independent variable k, where k is any positive integer, but for the purpose of finiteness we can limit it to be lower than say 2^15. The function covers all k-rule results and maps a position to a 2^15-dimensional vector. As k decreases DTZ(k,p) goes up. Until, at k=DTR(p)-1 the DTZ value becomes undefined. Or infinite if you wish: a draw is a "win at infinity". I read "DTZ(50<inf)" as a function of a boolean variable, and I don't know what that means at all.

jk

Post by **Kirill Kryukov** » Thu Aug 07, 2008 6:25 am

Interesting discussion. It seems there is significant camp of those who don't accept the reality that chess is played with 50-move rule.

I want to point out one thing.. Above I suggested an accelerated path to 7-men tables, by computing WDL (I should have called it WDL50, as Guy corrected me) most of time, and computing DTZ50 only for interesting or important endings. This is a very important point. WDL50 should an order of 100 times less than DTZ50. This will allow efficient and easy exchange of WDL50 files, allowing for fast distributed computation by any number of people.

Suppose someone wants to join and help to generate the next table on the list. He has to download the lesser tables first. Downloading DTZ50 tables (or any DT*) tables may take him longer than the generation itself. So effectively generation slows down greatly. Downloading WDL50 will be much faster and he will start computation sooner. The newly completed EGTB files can be in turn used faster by someone else, so this scheme greatly accelerates the progress. This way we can theoretically even afford double generation of each table, to make sure there are no errors (caused by faulty hardware for example).

This is a huge advantage over DTM, where the complete DTM tables have to be exchanged and used in the next generation step. Using double metric scheme (WDL50 + DTZ50) may move the complete 7-men tables ETA within just a several years.

From the point of computation time and space consumption there should be no huge difference between (WDL50+DTZ50) and (WDL+DTZ). However trying to combine them all in one "combined" EGTB format will complicate the code too much, I am afraid. Generation code, verification code, access code will all become more complex. The pro-50-move-rule camp will keep complaining about the bloat in the code and tables. Please note that it is this camp (pro-50-move-rule) that needs the highest access speed possible. Note that generating 7-men tables and compressing them efficiently (still allowing for fast decompression) is already a big challenge. Aditionally, the "combined" format will be suboptimal for both camps (50-move and no-50-move), due to increased size and slower access.

Recognizing the significance of the no-50-move-rule camp, I suggest the following plan. (WDL50 + DTZ50) is the basic format, allowing for fast progress towards the important pawnful endgames. (WDL + DTZ) is computed for only those position where Z is more than 50, and saved in extra, supplimentary files. This format may be called WDL(>50) and DTZ(>50). This way the WDL50 and DTZ50 will be optimized for the fastest possible access necessary for search and other applications. At the same time (WDL+DTZ) information will be still available to those who need it. Keeping (WDL+DTZ) values for only positions with Z>50 means that these supplimentary tables can be very compact. (Wins and losses in less than 50 moves are stored as draws).

Since WDL(>50) and DTZ(>50) are supplimentary to the WDL50 and DTZ50, they can be added at any point later. The community will have full freedom to add those tables later if the interest is big enough. In the mean time, WDL50+DTZ50 will be being computed, as important tables for both camps.

If we were to use a real WDL and DTZ tables in addition to WDL50 and DTZ50, this would roughly double the amount of data to store and exchange. By using WDL(>50) and DTZ(>50) the total data size will be greatly reduced. For example, a complete WDL50 may be a few terabytes. A complete WDL would be another few terabytes. However a complete WDL(>50) should take a couple hundred gigabytes at most.

So, if we were to split the resources into separate WDL50+DTZ50 and WDL+DTZ branches, the ETA for both will be longer than in my plan. The advantage of my proposal is that the amount of data to exchange is much smaller. Because WDL50+DTZ50 is used by both camps. The additional WDL(>50) and DTC(>50) tables are very small.

Please note again the importance of WDL*+DTZ* duality, because it will allows for hugely accelerated computation and file exchange.

I therefore think that at the current point it would be enough to require a WDL50+DTZ50 - capable generator. We will still have to re-generate all 3-to-6 men tables before moving on to 7-men, which will take some time. In this meantime the composers can be happy enough with their current DTM tables that don't consider the 50-move rule. Additional capabilities can be built in into the generator and probing code later (the source will be open).

Yeah these can be the names of two camps: "players" (50-move-rule) and "composers" (no-50-move-rule).

I realize that no solution can make everyone equally happy, but I feel that my proposal is the closest so far to making everyone happy, while still being practical.

Thanks everyone for the discussion! Any additional input is welcome as always!

Post by **Kirill Kryukov** » Thu Aug 07, 2008 8:44 am

I realized that "bounty" may not be the optimal competition format. This computation will span years, taking terabytes of disk space. Definitely we want to use the best generator and format (technically) we can get. We probably don't want to reward just the first entry simply for being first, when a technically superior generator may be just weeks away.

Certainly there won't be any entry fee.

The evaluation board is still a question, as well as the fund keeper. This all needs some more thought.

Post by **Kirill Kryukov** » Thu Aug 07, 2008 8:59 am

jkominek wrote:
Kirill Kryukov wrote:Competition format: prize (several complete solutions may compete in performance and features) vs bounty (the first successful solution wins).
While this thread has generally veered off into discussions of what kind of tablebases to construct, I am reminded that there are many open issues. For example, will there be a steering committee and how does one apply for membership? Okay, there is no organization to apply to, so Kirill gets to be benevolent dictator for a while yet. Enjoy it while it lasts.

If I may so suggest, there should be a definitive way of declaring that the competition is officially open, and that it is (temporarily) closed for evaluation of entrants. There should also be a formal mechanism for an interested party to declare an intent to enter, to retract this intent if necessary, and to submit an actual entry. Declaring an intent to enter makes the person, or team, ineligible to be on the steering committee. Vice versa as well. It also gives the steering committee some advance warning of what to expect.

It'd probably be worth studying analogous competitions to gleem useful ideas and practices. The NIST encryption competition comes to mind: http://www.networkworld.com/news/2007/0 ... rithm.html

I'd prefer that you not charge an entry fee.

john

Thanks for the input! Yes I now think that a formal competition time frame is better than "first submissions wins all" format. Time frame will then become a question. It's a balance between "how much we are in a hurry" and "how good a new format and generator we want to see".

A timeline can be, for example: Till the end of summer to decide the format and requirements, till the end of the year for submission, one or two months for evaluation, and then the fun begins..

If it's too long or too short, please post.

CCRL Discussion Board

7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Variants of the 50-move rule

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty

Re: 7-men EGTB Bounty