guyhaw wrote: 01) You are using value-only 'WDL-style' EGTs. I had advocated the creation and use of these and indeed, the use of WDL EGTs in the creation of DTx EGTs, as this avoids examining potential forced-losses which are in fact draws.

I create both WDL and DTZ in one go, so I don't use WDL in the creation of DTZ. The algorithm used is the grandfather algorithm with 2 plies per iteration (I think HGM calls this leapfrogging, but I might be wrong). I tried the outcounting method, but it didn't seem to be competitive (and it makes things more complicated).
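To make the retrograde idea concrete, here is a toy solver over an explicit game graph. This is illustration only: it shows the basic iterate-to-fixpoint structure, not the grandfather bookkeeping or the 2-plies-per-iteration trick, and the real generator of course works on indexed tables rather than Python dictionaries.

```python
# Toy retrograde solver. moves[p] lists positions reachable from p;
# values are from the viewpoint of the side to move at each position.
def solve(moves):
    WIN, LOSS, DRAW, UNKNOWN = "W", "L", "D", "?"
    value = {p: UNKNOWN for p in moves}
    for p in moves:
        if not moves[p]:            # no moves: lost for the side to move
            value[p] = LOSS
    changed = True
    while changed:                  # iterate to a fixpoint
        changed = False
        for p in moves:
            if value[p] != UNKNOWN:
                continue
            succ = [value[q] for q in moves[p]]
            if LOSS in succ:        # some move reaches a lost position
                value[p] = WIN
                changed = True
            elif all(v == WIN for v in succ):   # every move loses
                value[p] = LOSS
                changed = True
    # Whatever never resolves is a draw (e.g. positions stuck in a cycle).
    return {p: (DRAW if v == UNKNOWN else v) for p, v in value.items()}

print(solve({1: [2, 3], 2: [], 3: [4], 4: [3]}))
# {1: 'W', 2: 'L', 3: 'D', 4: 'D'}
```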
guyhaw wrote: 04) You are extending the idea of 'WDL' EGTs to 'WDL+' EGTs (or strictly, to a WDL50+ EGT) where I think you have five values, Win/Draw/Loss (under the 50-move rule) plus:
- C ... 'Cursed Win', i.e. would be a Win without the 50-move rule but is not a win under the 50-move rule
- B ... 'Blessed Loss', i.e. would be a Loss without the 50-move rule but is not a Loss under the 50-move rule

I believe the terminology is from kronsteen (on this forum). My generator calls them both "cursed" in the statistics that are output, because I forgot about the "blessed loss" term until recently.
guyhaw wrote: 05) It follows that before one can create a WDL50+/DTZ50+ pair of EGTs, one has to create a WDL/DTZ pair of EGTs.

A pure WDL/DTZ pair is not of much use for creating WDL50+/DTZ50+. I create tables in RAM that have all the information necessary for WDL50+ and DTZ50+, then permute them to different indexing schemes and compress. I do test runs on subsets of the data to find good permutations. (The idea to try permutations is from Jesper Torp Kristensen's master thesis.)
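As a sketch of the permutation search (not my actual code: zlib stands in for the real compressor, and the three-factor sample below is made up), one can simply compress the data under every factor ordering and keep the cheapest:

```python
import itertools, zlib
import numpy as np

def best_permutation(table):
    """Compress the sample under every axis ordering; keep the cheapest."""
    return min(
        itertools.permutations(range(table.ndim)),
        key=lambda p: len(zlib.compress(np.transpose(table, p).tobytes())),
    )

# Made-up 3-factor sample; which ordering compresses best depends on
# the data's structure, which is exactly what the test runs discover.
g = np.indices((64, 64, 64))
sample = ((g[0] // 8 + g[2] // 16) % 3).astype(np.uint8)
print(best_permutation(sample))
```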
guyhaw wrote: 06) As per point 03, the accompanying 'DTZ50+' EGT need not carry information which can be inferred from the WDL50+ EGT. Thus, a '7' in the DTZ50+ can mean, as indicated in the WDL50+ EGT:
- 7 winner's moves to either a DTZ50-win or a DTZ50-loss in the next phase of play (after a Pawn-move and/or capture),
- a draw (ignoring the 50-move rule) ... where '7' just happens to be the most convenient value to put there from compression considerations,
- (if WDL50+ has a 'C' or a 'B' against this position) a win/loss in absolute terms, but 7 moves from 'something' given the 50-move rule.

Exactly. And I go one step further by sorting each of the 4 ranges by frequency and remapping, so a '7' in the DTZ50+ table would correspond to, say, the 7th most frequent win-in-x, loss-in-x, cursed-win-in-x or blessed-loss-in-x value. This further improves compression.
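A minimal sketch of the remapping, for a single range of raw DTZ values (the real tables apply this separately within each of the four ranges):

```python
from collections import Counter

def remap_by_frequency(dtz_values):
    """Map each raw value to its frequency rank so small codes are common."""
    counts = Counter(dtz_values)
    # rank 0 = most frequent value; store the inverse of `rank` to decode
    rank = {v: r for r, (v, _) in enumerate(counts.most_common())}
    return [rank[v] for v in dtz_values], rank

vals = [3, 7, 7, 7, 1, 7, 3, 7]
print(remap_by_frequency(vals))   # 7 -> 0, 3 -> 1, 1 -> 2
```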
guyhaw wrote: 07) You are saving at least half the size of the EGT by only keeping the smaller stm-EGT, at the cost of requiring the chess-engine to do an extra one-ply search. Thus, if the EGT were remote, e.g. on the web, this would multiply the communication cost by 'm', where 'm' could be quite large.

No, the client just sends a position and the server-side probing code performs the 1-ply search.
Of course the 1-ply search creates probing overhead, but the idea is to let an engine only probe the DTZ tables at the root anyway.
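In sketch form the probing logic looks as follows. The helpers (is_stored_side, probe_raw, legal_moves, do_move) are hypothetical stand-ins for the engine's own code, and mates/stalemates are ignored here:

```python
# Values are from the side to move's viewpoint, hence the negation.
def probe(pos):
    if is_stored_side(pos):
        return probe_raw(pos)            # direct hit in the stm-table
    # Not stored: 1-ply search. Every successor has the stored side to
    # move, so the recursion bottoms out after one level.
    return max(-probe(do_move(pos, mv)) for mv in legal_moves(pos))
```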
guyhaw wrote: 08) You are using a compression-scheme which is not that of Andrew Kadatch as in the Nalimov EGTs. You say this is based on ideas of Larsson and Moffat, http://www.larsson.dogma.net/dcc99.pdf, and has Huffman-coding as a follow-up to iteratively creating an alphabet of symbols.

Correct. It's a variation on the RE-PAIR scheme.
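The core of RE-PAIR fits in a few lines. A toy version, leaving out the Huffman stage and the bookkeeping that makes a real implementation fast, looks like this:

```python
from collections import Counter

def repair(seq, max_symbols=256):
    """Toy RE-PAIR: repeatedly replace the most frequent adjacent pair
    of symbols with a fresh symbol, recording the rules for decoding."""
    rules = {}
    next_sym = max(seq) + 1
    while next_sym < max_symbols:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:                 # nothing repeats any more
            break
        rules[next_sym] = pair
        out, i = [], 0
        while i < len(seq):          # left-to-right replacement pass
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_sym); i += 2
            else:
                out.append(seq[i]); i += 1
        seq = out
        next_sym += 1
    return seq, rules

print(repair([1, 2, 1, 2, 1, 2, 3, 1, 2]))
# ([5, 4, 3, 4], {4: (1, 2), 5: (4, 4)})
```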
guyhaw wrote: I guess this proves to be better than Kadatch's scheme.

For sure it is quite different. I think Kadatch's scheme depends on maintaining a cache of decompressed blocks, which I wanted to avoid.
guyhaw wrote: Compression/decompression-time seems not to be an important parameter compared to disc-I/O time. I think the recent MVL 7-man initiative uses LZW-compression but they have not expounded on that much yet.

To improve access times they have gone from a block size of 2 MB to (I thought) 8 KB at the cost of a considerable increase in total compressed size, so decompression time was a limiting factor for them. To go smaller than 8 KB without losing too much compression efficiency, a compression method is needed that uses a fixed dictionary (per table) that may be constructed on the basis of the whole table ("offline"), instead of a dictionary that is constructed incrementally while decompressing a block ("online").
The compression ratio of my scheme is independent of block size, except that splitting into blocks at byte-boundaries necessarily loses a few bits at the end of each block.
The main disadvantage of my compression scheme is that compression is not very fast and that the whole table needs to be in RAM, as it sweeps through the table many times. The good thing though is that it can be parallelised very well and now that I think of it, it could be implemented in a distributed manner e.g. on Lomonosov-type systems where the table is distributed over the RAM of many nodes.
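The effect of a fixed "offline" dictionary is easy to demonstrate with zlib's preset-dictionary support, which here merely stands in for my per-table dictionary (the data below is made up):

```python
import zlib

table = bytes(i * 37 % 251 for i in range(4096)) * 16   # fake regular data
zdict = table[:4096]                                     # "offline" dictionary
blocks = [table[i:i + 256] for i in range(0, len(table), 256)]

def total_size(blocks, **kw):
    total = 0
    for b in blocks:
        c = zlib.compressobj(**kw)       # fresh state per block
        total += len(c.compress(b) + c.flush())
    return total

# Tiny blocks compress poorly "online" but well with a preset dictionary.
print(total_size(blocks), total_size(blocks, zdict=zdict))
```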
guyhaw wrote: 09) However, and this is an issue with Nalimov EGTs as well, the size of the indexing table needs to be included in the effective size of the EGT, and maybe it has been in your case already. Initialising the Nalimov EGTs seems to take a long time.

My tables include the indexing tables. They are demand-mapped into RAM during table access, so initialisation is fast (just a check of which files are present).
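Demand-mapping is just mmap. A minimal sketch (the file name is hypothetical):

```python
import mmap, os

def map_table(path):
    """Map a table file; the OS pages it into RAM only as it is touched."""
    fd = os.open(path, os.O_RDONLY)
    mem = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)   # length 0 = whole file
    os.close(fd)                                      # the mapping stays valid
    return mem

# tb = map_table("KQvK.rtbw")   # hypothetical file name
# tb[12345]                     # first access faults in just that page
```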
guyhaw wrote: I will dig out my sub-6-man DTZ50 EGTs. It may be that the size I quoted was only for DTZ50 EGTs that are different from the DTZ EGTs and maybe the article does not make this clear.

I didn't find the number in the article, but I probably overlooked it.
guyhaw wrote: I would be interested in using your WDL/DTZ (rather than your WDL50+/DTZ50+) EGTs at some stage if that were convenient - as I am more interested in an 'absolute truth' not moderated by FIDE's current k-move rule.

Absolute truth is included as well, since one can interpret cursed wins as wins and blessed losses as losses. WDL50+ is really a superset of WDL (and of WDL50). DTZ50+ is a superset of DTZ50, but not of DTZ (since DTZ50 paths may be longer than DTZ paths); still, it provides a (cursed) winning path for cursed wins.
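Concretely, with a five-value encoding (the numbers here are an assumption for illustration, not necessarily the encoding in the files):

```python
# Treating cursed wins as wins and blessed losses as losses collapses
# the five WDL50+ values to plain three-valued WDL.
LOSS, BLESSED_LOSS, DRAW, CURSED_WIN, WIN = -2, -1, 0, 1, 2

def to_wdl(wdl50):
    return WIN if wdl50 > 0 else LOSS if wdl50 < 0 else DRAW
```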
Regarding "current k-move rule", I think it is unlikely that FIDE will change the 50-move rule in the future. The arguments used for temporarily changing the rule in the past have now been ruled to be of insufficient importance, and I don't see any new arguments arising. The number 50 is arbitrary, but it has been there for a long time now. Anyway, regenerating the 6-piece tables for another value of k is only a question of days.