7-men EGTB Bounty

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc..
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

WDL and DTZ, WDL50 and DTZ50

Post by guyhaw »

I am not sure what the concern is here.
By the way, to avoid possible misunderstanding, the established notation is 'DTZ50', not 'DTZ(50)'. 'WDL(50)' is used for 'WDL or WDL50'.

Provided you have a good indexing-system for the positions (and I see that as the key issue), WDL EGTs can be generated regardless of the k-move rule.
This is what Jonathan Schaeffer did for Checkers, which is now enabling his proof that Checkers is a theoretical draw.
JS went to 10-man EGTs. Of course, advancing every checker in Checkers is irreversable, like pushing a Pawn in Chess. Not so with the Kings though.
The WDL EGTs are also useful in generating other EGTS such as DTC, DTM and DTZ in that they have already identified the draws (and losses).

Two points maybe worth noting. The generation of an EGT starts off by identifying all stm-mated positions _plus_ all stm-forced-to-move-to-loss-in-next-phase. All these are depth zero, counting in winner's moves. There is another table, which is not WDL50 but helps generate WDL50, which is the 'WDL50trunc' EGT created after 100 cycles after identifying stm-mated/(forced to move to loss in next phase). These positions will have DTZ =< 50 and no others will. It is useful to 'lop the head off' WDL.

Just as WDL EGTs help generate the DTZ EGTs, they (or WDLktrunc) help generate the WDLk (e.g. WDL50) EGTs. They have already identified _some_ draws.
But the WDL50 EGT for an endgame can only be generated once the WDL50 EGTs for its sub-endgames are known. Example KBBKNN needs KBBKN et al.

As kronsteen says, you might use the KPPPKPP DTZ EGT to navigate to 6-man and/or conversion, and then rely on 7-man WDL EGTs thereafter.
The integrity of WDL50 is an issue. It looks very much like WDL49 and WDL51 but one could check that the 'boundary converts' are to a WDL50 EGT.
I think MB or someone once said that the issues tend to be at the 'edges' of the EGT, not in the routine within-endgame retro'ing.
An example of this was the FEG-error, spotted by MB and now cured, in missing some shallow wins in a subgame (KNNK I think).

g
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: 7-men EGTB Bounty

Post by syzygy »

kronsteen wrote:
syzygy wrote:But I'm not sure how that helps much, as you'll need the subtables in DTZ50 if you want to use kpppkpp in DTZ50.
Wrong. DTZ(50) needs only WDL(50) for daughter endings, not DTZ(50).
That's exactly what I wrote:
syzygy wrote:I think the idea is that to generate kpppkpp in DTZ50, you only need all subtables in WDL50.
However, to usefully *use* a DTZ50-table for kpppkpp it is not sufficient if you only have the subtables in WDL50. Some won endings will be drawn and some drawn endings will be lost if you only have WDL50 subtables.
This is the main strength of the “dual metric” approach : as soon as you have 7-men full set in WDL(50) format, you can compute DTZ(50) table of any 7-men ending you want (even kpppkpp first if it is your wish), regardless of other DTZ(50) tables that already exist.
I know, but if we agree that all (or at least most) DTZ50-tables have to be computed at some point anyway, why would you want to generate DTZ50 for kpppkpp before DTZ50 for the subtables?

DTZ50 and WDL50 can be distributed separately. Systems for generating DTZ50 only need to get the WDL50-subtables.
Even if a future generator is designed to build DTZ(50) and WDL(50) tables at the same time (WDL(50) being a “sub product” of DTZ(50)), keeping only WDL(50) first due to space storage limitations (and waiting for hardware upgrade for DTZ) might be a sensible approach, as DTZ(50) tables are something like 5 times bigger, and their size can be hundreds of GBs.
Of course. But how does that make it a good idea to generate WDL50 separately from DTZ50? When generating both together there's nothing stopping others from downloading just the WDL50 so as to be able to generate the next DTZ50-table.
The fact is DTZ(50) info will not necessarily be sustained everywhere from initial position to mate, but only in “interesting” phases. Taking kpppkpp as an example, if you succeed in converting to kqppkpp, maybe this ending will be kept in WDL(50) format as WDL info added to your playing ability will be enough to win this in general. But should this ending convert to kqppkqp, you may want DTZ(50) info back as this is a difficult ending. This is perfectly possible.
This is true, for some tables only WDL50 has to be kept. If a separate generation of a WDL50-table can give information on maxDTZ50 for that table and maxDTZ50 is sufficiently small, then the generation of DTZ50 can be avoided for that table. So that could be a benefit. Still I'm not impressed by the idea of generating WDL50 many years before DTZ50. As a temporary solution to have something useful while waiting for 100TB hard drives, WDL seems a lot more interesting. (Another option would be 5-state WDL50+, if the difference in size turns out to be small enough.)

Btw, for tables with pawns you can look at maxDTZ for each pawn slice. Most likely the majority of slices will have small maxDTZ. It should be possible to leave out those slices from a DTZ50-table and let the engine revert to the corresponding WDL50-slice.
kronsteen
Posts: 88
Joined: Fri Aug 01, 2008 11:20 am
Sign-up code: 0
Location: France

Re: 7-men EGTB Bounty

Post by kronsteen »

syzygy wrote:As a temporary solution to have something useful while waiting for 100TB hard drives, WDL seems a lot more interesting. (Another option would be 5-state WDL50+, if the difference in size turns out to be small enough.)
Considering that :
1) WDL alone is far more useful than WDL50 alone
2) WDL50 will be required for DTZ50 generation which is a very desired feature
3) WDL50+ (WDL + WDL50 together) will almost surely be of negligible extra size
4) WDL (or WDL50, WDL50+) will be the only affordable format when we get data storage capacity of 10 TB class

WDL50+ is definitely a format we should aim at, if its generation is not too hard. Presentation of WDL50+ tables could be in the form of a 5-state table, but another and maybe better practical idea would be standard WDL with a bitbase add-on on wins/losses with info "0 = unaffected by 50 move rule, 1 = affected ('1' means that a win is indeed a cursed one, or a loss is a blessed one). These add-ons will be of very small size, and more importantly, their owning and their probing could be optional according to user's needs and preferences.
syzygy wrote:Btw, for tables with pawns you can look at maxDTZ for each pawn slice. Most likely the majority of slices will have small maxDTZ. It should be possible to leave out those slices from a DTZ50-table and let the engine revert to the corresponding WDL50-slice.
Hmmm. It is not just because DTZ is small that winning line is easy to find. In kqpkq for example, preliminary required wq moves before touching the pawn in order to preserve a difficult theoretical win thereafter are sometimes rather mysterious, and even at DTZ depth of 3 or 4 one can easily go astray with only WDL support (ok this example is not perfect as I think that no kqpkq pawn slice has small maxDTZ, but it may well be the case in kbppkbp for example, with the problem above remaining). Maybe one could find rules about what endings and what pawn slices with small maxDTZ are worth being kept in WDL format, but it sounds rather complicated and I wonder if it is worth the effort. Also, cutting a TB into pawn slices might be impractical if the indexing scheme is not built upon this notion, and it is probably unwise to require anything other than "best possible expected performance" for indexing scheme.
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: 7-men EGTB Bounty

Post by syzygy »

kronsteen wrote:Hmmm. It is not just because DTZ is small that winning line is easy to find. In kqpkq for example, preliminary required wq moves before touching the pawn in order to preserve a difficult theoretical win thereafter are sometimes rather mysterious, and even at DTZ depth of 3 or 4 one can easily go astray with only WDL support (ok this example is not perfect as I think that no kqpkq pawn slice has small maxDTZ, but it may well be the case in kbppkbp for example, with the problem above remaining).
The required moves may be mysterious, but if maxDTZ is low, the engine will be able to find a winning line (to a winning zeroing move) by searching at most maxDTZ ply.
Also, cutting a TB into pawn slices might be impractical if the indexing scheme is not built upon this notion, and it is probably unwise to require anything other than "best possible expected performance" for indexing scheme.
The indexing scheme used for generation will be built upon this notion. Otherwise the generator will not have the ability to generate the table slice by slice, which will lose a lot of performance. Also DTZ almost demands a slice by slice approach.

The indexing scheme for the compressed format may be (and possibly should) be a different one. However, it seems to make a lot of sense for many reasons to also base that scheme on pawn slices. In a way the different slices have as much to do with each other as with any of the subtables.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

A further benefit of WDL(50) EGTs

Post by guyhaw »

Not following every detail of this kronsteen-szyzygy rally. But a reason why one might go WDL(50) some way before kpppkpp, at which point one goes DTZ(50) is that ... with a 'sliced-EGT' index ('placing' both sides' Pawns first), the endgames with fewer Pawns have bigger slices. This means that the indivisible KQRNKQR woudl be very inconvenient to do as DTZ(50) but WDL(50) would be more tractable.

[ I think MB might have already done the KPPPKPP~ EGT, the '~' indicating that conversions are only to Queens. ]

Given that one has a WDL EGT, only a delta-WDL/WDL50 EGT is required, noting the differences between WDL and WDL50. In fact, I think someone suggested using the 4th bit for 'decisive but beyond 50 moves' (but then it's not clear whether it is a win or a loss).

Not sure with modern disc technology whether Hyatt's establish 8KB blocksize for Nalimov EGTs is still optimal. Would seem a pity to shrink 8KB of delta-WDL/WDL50 EGT down to next to nothing because of its sparness, and then to be bringing in only those small bits on each disc transfer.

g
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: A further benefit of WDL(50) EGTs

Post by syzygy »

guyhaw wrote:But a reason why one might go WDL(50) some way before kpppkpp, at which point one goes DTZ(50) is that ... with a 'sliced-EGT' index ('placing' both sides' Pawns first), the endgames with fewer Pawns have bigger slices. This means that the indivisible KQRNKQR woudl be very inconvenient to do as DTZ(50) but WDL(50) would be more tractable.
True, this might allow an earlier generation of DTZ50 for kpppkpp. Still, I don't see why one would be happy with a DTZ50-table for kpppkpp without also having the (non-trivial) subtables in DTZ50.

Without the subtables in DTZ50 there is no guarantee that a won position can be actually won. Then you might as well be happy with WDL50 or WDL for kpppkpp.
[ I think MB might have already done the KPPPKPP~ EGT, the '~' indicating that conversions are only to Queens. ]
However tempting, that sounds like something to be avoided if the aim is to generate the full set. Freely quoting a post by Kirill early in this thread, focussing first on the highlights will probably have the effect that the remainder is never generated.
Given that one has a WDL EGT, only a delta-WDL/WDL50 EGT is required, noting the differences between WDL and WDL50. In fact, I think someone suggested using the 4th bit for 'decisive but beyond 50 moves' (but then it's not clear whether it is a win or a loss).
You probably mean "the 4th 2-bit combination". In my view it is not a good idea. Using 5 states instead of 4 is not a problem when using good compression and solves the problem you mention.

I don't know if it is better to compress WDL (or WLD50) and its delta with WDL50 (or WDL) separately, or if it is better to compress the information as one 5-state table. I suspect the latter, but without experimentation that's only speculation.
Not sure with modern disc technology whether Hyatt's establish 8KB blocksize for Nalimov EGTs is still optimal. Would seem a pity to shrink 8KB of delta-WDL/WDL50 EGT down to next to nothing because of its sparness, and then to be bringing in only those small bits on each disc transfer.
Assuming Hyatt's experimental results still hold for Nalimov tables, WDL-tables and especially delta-WDL-tables in Nalimov format most likely require a larger blocksize than 8KB for optimal results, simply because they compress much better.

However, note that if the problem is the small bits on each disk transfer, a simple solution would be to get multiple consecutive compressed blocks from disk at once. This is not something that Hyatt could experiment with, as he could not or did not modify Nalimov's probing code. But it is something that can easily be done (in free code).

So I guess blocksize can be more or less independent of disk technology, as long as blocks stay relatively small. More important seems to be the trade off between compression efficiency and decompression time, larger blocks compressing better but taking more time to decompress. Another factor is whether you cache compressed blocks or uncompressed blocks, compressed blocks eating less RAM but requiring more decompression time. Obviously decompression time depends greatly on the compression algorithm used. Another factor is the size of the index tables required for finding where blocks start.
kronsteen
Posts: 88
Joined: Fri Aug 01, 2008 11:20 am
Sign-up code: 0
Location: France

Re: 7-men EGTB Bounty

Post by kronsteen »

syzygy wrote:The indexing scheme used for generation will be built upon this notion. Otherwise the generator will not have the ability to generate the table slice by slice, which will lose a lot of performance. Also DTZ almost demands a slice by slice approach.
Indexing scheme must be kept free, but if cutting TBs in pawn slices is really the only sensible solution for best performance, then so be it. It may allow some interesting practical ideas such as have DTZ on kpppkqp if a white pawn is on 7th rank and WDL if not. It may also allow progressive generation of DTZ tables beginning with positions with most advanced pawns, and keeping for some time positions with pawns further back on WDL format. Another question is if indexing scheme must be the same for WDL and for DTZ. IMO it must, and in this case cutting into pawn slices, which is essentially driven by DTZ structure, has also to apply to WDL.
syzygy wrote:The indexing scheme for the compressed format may be (and possibly should) be a different one. However, it seems to make a lot of sense for many reasons to also base that scheme on pawn slices. In a way the different slices have as much to do with each other as with any of the subtables.
Yes, I also understand this.
syzygy wrote:True, this might allow an earlier generation of DTZ50 for kpppkpp. Still, I don't see why one would be happy with a DTZ50-table for kpppkpp without also having the (non-trivial) subtables in DTZ50.
Easy : this is because some endings are far more likely than others to occur in practical games. Ask the question to strong chess players “on what 7-men endings would you like to get perfect knowledge in order to improve your game” I would bet that krppkrp would come first, followed by endings such like kbppkbp, knppkrp, kpppkpp, kpppkbp, etc… and, on the other hand, every pawnless ending would be essentially seen as worthless as they almost never occur. A practical player would certainly be glad to get krppkrp DTZ knowledge but would certainly not bother with kqrbkqr DTZ knowledge which is a possible and non trivial outcome, because it’s so unlikely. I’m also pretty sure that many players with krppkrp would not mind to live without kqqrkqr / kqrpkrp / krppkqr DTZ, and maybe even kqrpkqr DTZ. Statistics about what endings occur most frequently on practical games exist (full statistics on 6-men can be found somewhere in this forum), and it is my guess that the 5% most likely 7-men endings (50 out of 1001) would cover more than 90% of the games. If at some point we get full WDL 7-men set and are up to start DTZ, having those statistics in order to target most likely endings first would be a possible and sensible approach, as it would be best suited to go for quick results for practical players (which form the biggest contingent of people interested in TBs, I believe).
syzygy wrote:[ I think MB might have already done the KPPPKPP~ EGT, the '~' indicating that conversions are only to Queens. ]

However tempting, that sounds like something to be avoided if the aim is to generate the full set. Freely quoting a post by Kirill early in this thread, focussing first on the highlights will probably have the effect that the remainder is never generated.
I’m with syz and KK on this too. kpppkpp~, however close to kpppkpp it is, is “impure” knowledge, due to the fraction of data cooked by underpromotions. It might be good to show some highlights quickly, but only if they are part of overall project (kpppkpp~ is not). Following this idea, I also recommend against other possible attempts for specific partial knowledge generation, such as kxxpkxp with blocked pawns with a code deliberately exploiting the specific fact that in this case next non-zeroing move is forcefully conversion to 6-man. Such achievements are interesting on their own, but misdirected according to the overall project if they require effort to write specific generating code which won’t be used any more when moving to full TBs.
syzygy wrote:I don't know if it is better to compress WDL (or WLD50) and its delta with WDL50 (or WDL) separately, or if it is better to compress the information as one 5-state table. I suspect the latter, but without experimentation that's only speculation.
WDL + a bitbase (0/1) identifying positions affected by 50-move rule seems better than WDL50 + a 3-state base (0/1/2) giving precise status of drawn positions (genuine draw / cursed win / blessed loss). WDL50 + a bitbase would not be enough for cursed win / blessed loss discrimination. Note also that, provided that probing the bitbase is an optional feature according to user’s preferences, WDL and the bitbase could use different compression schemes for more efficiency.
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: 7-men EGTB Bounty

Post by syzygy »

kronsteen wrote:Indexing scheme must be kept free, but if cutting TBs in pawn slices is really the only sensible solution for best performance, then so be it.
I don't intend my remarks as hard requirements, but as ideas of how to get optimally functioning tables (so I might strictly speaking be off-topic). I think it is quite probable that a well-engineered generator for pawn tables will have an indexing scheme based on pawn slices.
Easy : this is because some endings are far more likely than others to occur in practical games. Ask the question to strong chess players “on what 7-men endings would you like to get perfect knowledge in order to improve your game” I would bet that krppkrp would come first, followed by endings such like kbppkbp, knppkrp, kpppkpp, kpppkbp, etc… and, on the other hand, every pawnless ending would be essentially seen as worthless as they almost never occur.
Certainly. However, playing one of these "interesting" endings using DTZ50 to a mate, there is no guarantee that none of the "not-interesting" tables show up on the board. Statistics on practical games imo say nothing on what subtables are needed to play out a DTZ-winning line.
If at some point we get full WDL 7-men set and are up to start DTZ, having those statistics in order to target most likely endings first would be a possible and sensible approach, as it would be best suited to go for quick results for practical players (which form the biggest contingent of people interested in TBs, I believe).
With as a possible result that just 5% of the tables are ever generated, and that those tables cannot be played perfectly because of missing non-trivial DTZ50-subtables.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

JdK's FEG and 7-man EGTs

Post by guyhaw »

Does Johan de Konig's FEG not already generated 7-man EGTs if you have a big enough machine?
g
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Ideas inspired by FEG ...

Post by guyhaw »

Continuing to look at what's on the web about Johan de Koning's FEG (EGT Generator), I note that:

1) FEG stored wins for 'White' and wins for 'Black' in separate files, merely noting that a position was 'not a win' when it was not a win for the relevant side.
2) Once one's engine has determined from the WDL EGT that it has a win, it is a nice economy not to be heaving around depths about losses
3) FEG could operate with only the 'wtm' or 'btm' half of the relevant file, but this would call for a less than arduous 2-ply search from the chess-engine
4) This would be true re chess-engines and Nalimov's EGTs, provided the chess-engine is programmed to deal with this: are there such engines out there?
5) This 'missing positions' idea can be generalised to create a continuum of EGTs between WDL and (say) DTZ (or DTM if you prefer) ...

6) An EGT could hold depth values only for positions whose depth is exactly 2d, 4d, 8d, ... nd, ... etc. As 'n' increases, the EGT-regime 'converges' on WDL.
If the EGT only holds wtm wins in 2d, the chess-engine has to do a 4-ply search to find the minimax route to the next target position, wtm depth 2(d-1).
There is an exchange relationship between 'acreage of EGTs' and 'runtime computation required' which might, pragmatically, be useful.
Also, if all depths are divisible by 2, this can be noted in the EGT header [another item for the self-describing EGT header], and we save 1 bit per cell.
... so 'granularity of depth-recording' is a proposed requirement for the EGT header.
Other positions are set to 'not win in depth exactly divisible by 2', assuming WDL is doing all the win-draw-loss identification.

7) I wonder what is saved in space terms here: an experiment with Nalimov EGT's might be quite interesting.
Going with DTZ rather than DTM halves the acreage. Ditto having only wtm files, and there are more space-savings as above.
Provided the WDL EGT is giving the basic 'you have a win' information, I don't see a down-side in the chess-engine searching process.

Incidentally, does anyone know about the old, and now sorted, 'KNNK bug' and the 'transparent pawn' bug in FEG?

g
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: DTR ... food for thought

Post by Kirill Kryukov »

guyhaw wrote:I attach here, zipped (as .pdf is unacceptable!)
PDF is from now aceptable as attachment. Sorry for inconvenience.
JVMerlino

Re: Ideas inspired by FEG ...

Post by JVMerlino »

guyhaw wrote:Incidentally, does anyone know about the old, and now sorted, 'KNNK bug' and the 'transparent pawn' bug in FEG?
The documentation for FEG v3.03c details those two bugs as follows:

--------------------------
2002-Nov-18 version 3.03b : KNNK bug fixed.

In some EGDBs btm might lose in 0 ply (ie is checkmated), but not in 2
or more ply. Examples: KBKx, KNKx, KNNK. In these cases version 3.03a
"forgot" to store the mate-in-0 positions.

During use mate-in-0 is detected, but during generation it isn't.
Hence positions mistakenly marked as a draw might propagate to 5-man
(and eventualy 6-man) EGDBs. Though this happens rarely, and mostly
affects short mates in contrived positions, KNNKP is the exception.
Eg: Kf2 Nf1 Ne8, Kh1 Pg5 was marked as draw, but Nf6 is mate in 3.

The list of all files affected to some degree is quite long.
KNNK KBKB KBKN KBKP KNKR KNKB KNKN
KNNKQ KNNKR KNNKB KNNKN KNNKP KNPKQ KNPKN KNPKP KBKQB KBKQN KBKQP KBKRB
KBKRN KBKRP KBKBB KBKBN KBKBP KBKNN KBKNP KBKPP KNKQR KNKQB KNKQN KNKRR
KNKRB KNKRN KNKRP KNKBB KNKBN KNKBP KNKNN KNKNP KPKQR KPKRR KPKRB

Tip: to determine the version of an existing EGDB file, type eg
FEG -t KNNKP
and look for the line starting with "creator".
Pressing CTRL+Q will abort the test if it takes too long.

2002-Dec-26 version 3.03c : Transparent Pawn bug fixed.

Pawns subject to en passent were missing while generating backmoves
(for finding eg 5 ply wins from 4 ply losses). As a result sliding
pieces could move through that missing pawn. A rather funny example
is Kc8 Pc5 Pg7, Ka8 Pd7, which was mistakenly generated as mate in 2
based on 1. g8Q d5 2. Qa2#.

Affected files are the ones where both sides have at least a Pawn and
White has at least a sliding piece. As well as the ones relying on
them via promotion (KPPKP) or capture (6-man only) or both.
Regarding 3,4,5-man these are KQPKP, KRPKP, KBPKP, KPPKP.

Many thanks are due to Marc Bourzutschky for discovering the KNNK bug,
and for locating the TP bug. As well as for pioneering generation of
6-man data, providing statistics and suggestions and data.
--------------------------

jm
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Re the FEG errors

Post by guyhaw »

Thanks, jm, for the reminder about the shallow missed wins. I never new about the 'Transparent Pawn' bug.
g
Post Reply