New 6-piece tablebase generator

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

guyhaw wrote:01) You are using value-only 'WDL-style' EGTs. I had advocated the creation and use of these and indeed, the use of WDL EGTs in the creation of DTx EGTs as this avoids examining potential forced-losses which are in fact draws.
I create both WDL and DTZ in one go, so I don't use WDL in the creation of DTZ. The algorithm used is the grandfather algorithm with 2 plies per iteration (I think HGM calls this leapfrogging, but I might be wrong). I tried the outcounting method, but it didn't seem to be competitive (and it makes things more complicated).
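
To illustrate the grandfather idea with 2 plies per iteration, here is a minimal sketch. The helper functions and the one-byte-per-position tables are hypothetical stand-ins for the purpose of the example, not the actual generator code:

/* Minimal sketch of one grandfather iteration (2 plies): ply 1 propagates
 * wins backwards from known losses, ply 2 marks new losses.  The helper
 * functions are hypothetical stand-ins, not the generator's real API. */
#include <stddef.h>
#include <stdint.h>

enum { UNKNOWN = 0, WIN = 1, LOSS = 2 };

extern size_t num_positions;
extern int gen_predecessors(size_t idx, size_t *preds);          /* retrograde moves */
extern int all_successors_won(size_t idx, const uint8_t *opp);   /* forward moves */

/* Ply 1: every predecessor of a position lost for the side to move
 * is (at least) a win for the other side. */
static int propagate_wins(const uint8_t *lost_side, uint8_t *other_side)
{
    int changed = 0;
    for (size_t idx = 0; idx < num_positions; idx++) {
        if (lost_side[idx] != LOSS)
            continue;
        size_t preds[128];
        int n = gen_predecessors(idx, preds);
        for (int i = 0; i < n; i++)
            if (other_side[preds[i]] == UNKNOWN) {
                other_side[preds[i]] = WIN;
                changed = 1;
            }
    }
    return changed;
}

/* Ply 2: a still-unknown position is lost if every move leads to a
 * position already known to be won for the opponent. */
static int propagate_losses(uint8_t *side, const uint8_t *other_side)
{
    int changed = 0;
    for (size_t idx = 0; idx < num_positions; idx++) {
        if (side[idx] != UNKNOWN)
            continue;
        if (all_successors_won(idx, other_side)) {
            side[idx] = LOSS;
            changed = 1;
        }
    }
    return changed;
}
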
04) You are extending the idea of 'WDL' EGTs to 'WDL+' EGTs (or strictly, to a WDL50+ EGT), where I think you have five values, Win/Draw/Loss (under the 50-move rule) plus:
- C ... 'Cursed Win', i.e. would be a Win without the 50-move rule but is not a Win under the 50-move rule
- B ... 'Blessed Loss', i.e. would be a Loss without the 50-move rule but is not a Loss under the 50-move rule
I believe the terminology is from kronsteen (on this forum). My generator calls them both "cursed" in the statistics that are output, because I forgot about the "blessed loss" term until recently.
05) It follows that before one can create a WDL50+/DTZ50+ pair of EGTs, one has to create a WDL/DTZ pair of EGTs.
A pure WDL/DTZ pair is not of much use for creating WDL50+/DTZ50+. I create tables in RAM that have all the information necessary for WDL50+ and DTZ50+, then permute them to different indexing schemes and compress. I do test runs on subsets of the data to find good permutations. (The idea to try permutations is from Jesper Torp Kristensen's master thesis.)
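
As an illustration of trying permutations on a subset of the data, a sketch; permute_sample() and estimate_compressed_size() are made-up names, not the generator's own functions:

/* Sketch of picking an indexing permutation by trial compression on a
 * sample of the table rather than the whole thing. */
#include <stddef.h>
#include <stdint.h>

extern int num_perms;
extern void permute_sample(int perm, const uint8_t *table,
                           uint8_t *sample, size_t sample_size);
extern size_t estimate_compressed_size(const uint8_t *sample, size_t sample_size);

int pick_best_permutation(const uint8_t *table, uint8_t *sample, size_t sample_size)
{
    int best = 0;
    size_t best_size = (size_t)-1;
    for (int p = 0; p < num_perms; p++) {
        permute_sample(p, table, sample, sample_size);   /* re-index a subset only */
        size_t sz = estimate_compressed_size(sample, sample_size);
        if (sz < best_size) {
            best_size = sz;
            best = p;
        }
    }
    return best;   /* the full table is then permuted and compressed once */
}
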
06) As per point 03, the accompanying 'DTZ50+' EGT need not carry information which can be inferred from the WDL50+ EGT. Thus, a '7' in the DTZ50+ can mean, as indicated in the WDL50+ EGT:
- 7 winner's moves to either a DTZ50-win or a DTZ50-loss in the next phase of play (after a Pawn-move and/or capture),
- a draw (ignoring the 50-move rule) ... where '7' just happens to be the most convenient value to put there from compression considerations,
- (if WDL50+ has a 'C' or a 'B' against this position) a win/loss in absolute terms, but 7 moves from 'something' given the 50-move rule.
Exactly. And I go one step further by sorting each of the 4 ranges by frequency and remapping. So a '7' in the DTZ50+ table would correspond to, say, the 7th most frequent win-in-x, loss-in-x, cursed-win-in-x or blessed-loss-in-x value. This further improves compression.
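
The per-range remapping step amounts to something like the following sketch; the constants and names are illustrative, not the real encoding:

/* Sketch of remapping the DTZ values of one range (e.g. all wins) by
 * frequency: the most common distance gets symbol 0, the next gets 1, etc.
 * Values are assumed to fit below MAX_DTZ. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DTZ 1024

static uint32_t freq[MAX_DTZ];
static uint16_t map[MAX_DTZ];    /* dtz value -> remapped symbol */

static int cmp_by_freq(const void *a, const void *b)
{
    uint16_t x = *(const uint16_t *)a, y = *(const uint16_t *)b;
    return (freq[y] > freq[x]) - (freq[y] < freq[x]);   /* descending by count */
}

void build_remap(const uint16_t *dtz, size_t n)
{
    uint16_t order[MAX_DTZ];
    memset(freq, 0, sizeof freq);
    for (size_t i = 0; i < n; i++)
        freq[dtz[i]]++;
    for (int v = 0; v < MAX_DTZ; v++)
        order[v] = (uint16_t)v;
    qsort(order, MAX_DTZ, sizeof order[0], cmp_by_freq);
    for (int rank = 0; rank < MAX_DTZ; rank++)
        map[order[rank]] = (uint16_t)rank;   /* most frequent value -> 0 */
}
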
07) You are saving at least half the size of the EGT by only keeping the smaller stm-EGT, at the cost of requiring the chess-engine to do an extra one-ply search. Thus, if the EGT were remote, e.g. on the web, this would multiply the communication cost by 'm' where 'm' could be quite large.
No, the client just sends a position and the server-side probing code performs the 1-ply search.

Of course the 1-ply search creates probing overhead, but the idea is to let an engine only probe the DTZ tables at the root anyway.
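
A sketch of what such a 1-ply probe looks like when only one side-to-move table is stored; Position, Move and the helpers are hypothetical, and details such as en passant and other corner cases are glossed over:

/* Sketch of a WDL probe when only one side-to-move table is on disk:
 * search one ply and take the best negated value of the replies. */
typedef struct Position Position;
typedef int Move;

extern int  table_stored_for(const Position *pos);   /* is this stm stored? */
extern int  probe_stored_wdl(const Position *pos);   /* -2 (loss) .. 2 (win) */
extern int  gen_moves(Position *pos, Move *moves);
extern void do_move(Position *pos, Move m);
extern void undo_move(Position *pos, Move m);

int probe_wdl(Position *pos)
{
    if (table_stored_for(pos))
        return probe_stored_wdl(pos);

    /* Table only stored with the other side to move: try all moves. */
    Move moves[256];
    int n = gen_moves(pos, moves);
    int best = -2;
    for (int i = 0; i < n; i++) {
        do_move(pos, moves[i]);
        int v = -probe_stored_wdl(pos);   /* value from opponent's perspective */
        undo_move(pos, moves[i]);
        if (v > best)
            best = v;
    }
    return best;
}
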
08) You are using a compression-scheme which is not that of Andrew Kadatch as in the Nalimov EGTs. You say this is based on ideas of Larsson and Moffat, http://www.larsson.dogma.net/dcc99.pdf, and has Huffman-coding as a follow-up to iteratively creating an alphabet of symbols.
Correct. It's a variation on the RE-PAIR scheme.
I guess this proves to be better than Kadatch's scheme.
For sure it is quite different. I think Kadatch's scheme depends on maintaining a cache of decompressed blocks, which I wanted to avoid.
Compression/decompression-time seems not to be an important parameter compared to disc-I/O time. I think the recent MVL 7-man initiative uses LZW-compression but they have not expounded on that much yet.
To improve access times they have gone from a block size of 2 MB to (I thought) 8 KB at the cost of a considerable increase in total compressed size, so decompression time was a limiting factor for them. To go smaller than 8 KB without losing too much compression efficiency a compression method is needed that uses a fixed dictionary (per table) that may be constructed on the basis of the whole table ("offline") instead of a dictionary that is constructed incrementally while decompressing a block ("online").

The compression ratio of my scheme is independent of block size, except that splitting into blocks at byte-boundaries necessarily loses a few bits at the end of each block.
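
To illustrate why a fixed per-table dictionary makes the ratio (nearly) independent of block size, here is a sketch of RE-PAIR-style block decoding; the structures are illustrative, not the actual file format, and the Huffman layer is omitted:

/* Sketch: each non-terminal symbol expands to two symbols via a per-table
 * dictionary, so any block can be decoded on its own with no state carried
 * over from other blocks. */
#include <stddef.h>
#include <stdint.h>

struct rule { uint16_t left, right; };   /* symbol s expands to (left, right) */

extern struct rule dict[];               /* fixed dictionary for this table */
extern uint16_t num_terminals;           /* symbols below this are literal values */

static size_t expand(uint16_t sym, uint8_t *out, size_t pos)
{
    if (sym < num_terminals) {
        out[pos] = (uint8_t)sym;
        return pos + 1;
    }
    const struct rule *r = &dict[sym - num_terminals];
    pos = expand(r->left, out, pos);
    return expand(r->right, out, pos);
}

size_t decode_block(const uint16_t *block, size_t block_len, uint8_t *out)
{
    size_t pos = 0;
    for (size_t i = 0; i < block_len; i++)
        pos = expand(block[i], out, pos);
    return pos;                          /* number of WDL/DTZ values produced */
}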

The main disadvantage of my compression scheme is that compression is not very fast and that the whole table needs to be in RAM, as it sweeps through the table many times. The good thing though is that it can be parallelised very well and now that I think of it, it could be implemented in a distributed manner e.g. on Lomonosov-type systems where the table is distributed over the RAM of many nodes.
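
For reference, the core step repeated in those sweeps, in naive form; real implementations maintain the pair counts incrementally, and ALPHA and the uint16_t symbol type are assumptions for the sketch only:

/* Naive sketch of one RE-PAIR pass: count adjacent symbol pairs, pick the
 * most frequent pair, and replace every occurrence with a new symbol. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALPHA 1024   /* assumed maximum alphabet size */

size_t repair_pass(uint16_t *data, size_t n, uint16_t next_symbol)
{
    static uint32_t count[ALPHA][ALPHA];
    memset(count, 0, sizeof count);

    for (size_t i = 0; i + 1 < n; i++)
        count[data[i]][data[i + 1]]++;

    uint16_t a = 0, b = 0;
    uint32_t best = 0;
    for (int x = 0; x < ALPHA; x++)
        for (int y = 0; y < ALPHA; y++)
            if (count[x][y] > best) {
                best = count[x][y];
                a = (uint16_t)x;
                b = (uint16_t)y;
            }

    if (best < 2)
        return n;    /* nothing worth replacing */

    size_t out = 0;
    for (size_t i = 0; i < n; ) {
        if (i + 1 < n && data[i] == a && data[i + 1] == b) {
            data[out++] = next_symbol;   /* next_symbol now stands for (a, b) */
            i += 2;
        } else {
            data[out++] = data[i++];
        }
    }
    return out;      /* new length of the symbol string */
}
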
09) However, and this is an issue with Nalimov EGTs as well, the size of the indexing table needs to be included in the effective size of the EGT, and maybe it has been in your case already. Initialising the Nalimov EGTs seems to take a long time.
My tables include indexing tables. They are demand-mapped into RAM during table access, so initialisation is fast (just a check which files are present).
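
On Linux the demand-mapping amounts to little more than the following sketch (not the actual probing code): the file is mapped once, and pages are only read from disk when a probe first touches them.

/* Sketch of demand-mapping a table file. */
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

const unsigned char *map_table(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0) { close(fd); return NULL; }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                           /* mapping stays valid after close */
    if (p == MAP_FAILED) return NULL;

    *size_out = (size_t)st.st_size;
    return (const unsigned char *)p;     /* pages fault in on first access */
}
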
I will dig out my sub-6-man DTZ50 EGTs. It may be that the size I quoted was only for DTZ50 EGTs that are different from the DTZ EGTs and maybe the article does not make this clear.
I didn't find the number in the article, but I probably overlooked it.
I would be interested in using your WDL/DTZ (rather than your WDL50+/DTZ50+) EGTs at some stage if that were convenient - as I am more interested in an 'absolute truth' not moderated by FIDE's current k-move rule.
Absolute truth is included as well, since one can interpret cursed wins as wins and blessed losses as losses. WDL50+ is really a superset of WDL (and of WDL50). DTZ50+ is a superset of DTZ50, but not of DTZ (since DTZ50 paths may be longer than DTZ paths), but still provides a (cursed) winning path for cursed wins.
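
In code, collapsing the five values is trivial; the enum below is illustrative, not the on-disk encoding:

/* Sketch: mapping the five WDL50+ values to "absolute" WDL (ignore the
 * 50-move rule) or to strict WDL50 (respect it). */
enum wdl50plus { LOSS = -2, BLESSED_LOSS = -1, DRAW = 0, CURSED_WIN = 1, WIN = 2 };

/* Absolute truth: a cursed win is still a win, a blessed loss still a loss. */
static int to_wdl_absolute(enum wdl50plus v)
{
    if (v > 0) return  1;    /* win */
    if (v < 0) return -1;    /* loss */
    return 0;                /* draw */
}

/* Under the 50-move rule: cursed wins and blessed losses count as draws. */
static int to_wdl50(enum wdl50plus v)
{
    if (v == WIN)  return  1;
    if (v == LOSS) return -1;
    return 0;
}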

Regarding "current k-move rule", I think it is unlikely that FIDE will change the 50-move rule in the future. The arguments used for temporarily changing the rule in the past have now been ruled to be of insufficient importance, and I don't see any new arguments arising. The number 50 is arbitrary, but it has been there for a long time now. Anyway, regenerating the 6-piece tables for another value of k is only a question of days. ;)
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK

Re: New 6-piece tablebase generator

Post by guyhaw »

I've responded in part via the Discussion Board's message-facility.

I can live with 'Cursed Loss' :-)

Ken Thompson tried the 'outcounting' method - presumably used when trying to identify forced-loss positions - and also found it less than worthwhile. The reason for drawing on information from previously generated EGTs, WDLn or DTx, is of course to avoid wasting time trying to identify positions as lost when they are in fact drawn or won.

Interesting to see Jesper Torp Kristensen's OBDD thesis coming into this. I think there's a topic there worthy of the 'academic treatment'.

Ditto the comparison of your compression method and Andrew Kadatch's.

I agree that there is no strong argument for changing the 50-move rule based on endgame statistics - but FIDE cannot be relied on not to be capricious.

g
Arkansaw
Posts: 3
Joined: Thu Feb 23, 2006 3:46 pm
Sign-up code: 0

Re: New 6-piece tablebase generator

Post by Arkansaw »

This is an interesting development, would certainly like to try if the memory requirements can be reduced to 16G or even 8G :thumbup:
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

Arkansaw wrote:This is an interesting development, would certainly like to try if the memory requirements can be reduced to 16G or even 8G :thumbup:
16 GB is doable. I have just changed the indexing scheme during generation of pawnless tables from 10 x 64^5 to 462 x 64^4, so that generation of pawnless tables can now be done with 16 GB. (A nice bonus is that this speeds up the generation part by about 22%.) However, during compression another 6 GB are needed, so that you still need 24 GB.
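
The arithmetic, as a sketch, assuming 1 byte per position and both side-to-move tables held in RAM during generation:

/* Index space of the old and new pawnless 6-piece schemes. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t old_scheme = 10ULL  * 64 * 64 * 64 * 64 * 64;  /* 10 x 64^5  = 10,737,418,240 */
    uint64_t new_scheme = 462ULL * 64 * 64 * 64 * 64;       /* 462 x 64^4 =  7,751,073,792 */

    printf("old: %.1f GB per side\n", old_scheme / 1e9);
    printf("new: %.1f GB per side\n", new_scheme / 1e9);
    /* With both wtm and btm tables in RAM, the new scheme fits in 16 GB
       for the generation phase, where the old one did not. */
    return 0;
}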

My next step will be to keep only one of the wtm/btm tables in memory during compression and save the other to disk.

Pawnful tables use less memory since generation is done per file, but the same trick of saving to disk will be needed to stay under 16 GB.

This also means that a machine with 1 TB of RAM will already suffice for 7 pieces. ;-)
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

syzygy wrote:My next step will be to keep only one of the wtm/btm tables in memory during compression and save the other to disk.

Pawnful tables use less memory since generation is done per file, but the same trick of saving to disk will be needed to stay under 16 GB.
This has now been done (by adding a "-d" option to rtbgen/rtbgenp), so a 16 GB system should be sufficient to generate all 6-piece tables.
koistinen
Posts: 92
Joined: Fri May 02, 2008 7:59 pm
Sign-up code: 0
Location: Stockholm

Re: New 6-piece tablebase generator

Post by koistinen »

Nice, C and GNU Licence!

Have downloaded and will study it.
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan

Re: New 6-piece tablebase generator

Post by Kirill Kryukov »

Here is a Windows 64-bit binary of Stockfish 4 with added syzygybases access capability. Compiled with MinGW GCC 4.8.1 from Cygwin64 distribution. Non-PGO build, "x86-64-modern" architecture (with popcnt). The source was obtained from here (link was posted by Ronald on CCC).
Attachments
Stockfish-4-SyzygyTB-KK.zip
(1.92 MiB) Downloaded 2397 times
galen
Posts: 18
Joined: Mon Dec 30, 2013 8:47 am
Sign-up code: 10159

Re: New 6-piece tablebase generator

Post by galen »

Hi syzygy!

I've been trying out your generator. It's very nice. You clearly know the field quite well. (By contrast, I was largely winging it when making my generator.) I have one question for you, however, that might be a real issue. Whenever I generate a table, I get output like this:

find optimal permutation for wtm / dtz
[ 0] order: 1; perm: 5 6 10 14; 207017
[ 1] order: 1; perm: 5 10 6 14; 208067
[ 2] order: 1; perm: 5 14 6 10; 209654
[ 3] order: 1; perm: 6 5 10 14; 189392
[ 4] order: 1; perm: 10 5 6 14; 248574
[ 5] order: 1; perm: 14 5 6 10; 260910
[ 6] order: 1; perm: 6 10 5 14; 186062
[ 7] order: 1; perm: 6 14 5 10; 189904
[ 8] order: 1; perm: 10 6 5 14; 245151
[ 9] order: 1; perm: 14 6 5 10; 260391
[10] order: 1; perm: 10 14 5 6; 255693
[11] order: 1; perm: 14 10 5 6; 272621
[ 0] order: 1; perm: 6 5 14 10; 191556
[ 1] order: 1; perm: 6 10 14 5; 192820
[ 2] order: 1; perm: 6 14 10 5; 195180
best order: 1
best permutation: 14 5 10 6


The "best permutation" is often not even one of the permutations in the list, and, when it is, it's usually not the best one. I get this consistently both with and without pawns and for both wdl and dtz tables.

I doubt it's a platform (e.g., endianness) issue. I'm running a pretty vanilla Linux setup.

I looked through your code to find the cause of the discrepancy. I notice the tried perms are printed from type_perm_list[][], whereas the best is extracted from pt[] mapping over piece_perm_list[][]. These give different values. I don't fully understand the relationship between the arrays, but when I tried replacing the "best permutation:" loop body with printf(" %d", type_perm_list[*bestp]); (or without the "*" for the _wdl functions), it gave output that made sense at least to me. I also tried similarly changing the write into the best[] array; it didn't seem to break or even affect anything in my limited testing. I haven't traced how that array is used for compression.

I'm hoping this is just an output issue and that it's not selecting a worse permutation! Or maybe I'm just missing something?

Another, probably minor issue is that with (e.g.) KRRvK, num_types is 3, and that's how many elements are printed out for each tried permutation, but entry_piece.num is 4, and that's used for printing the best perm, so an extra 0 gets tacked on the end. I'm not sure if there's a risk of it reading uninitialized memory. The larger number is also used for writing into best[], for better or for worse. The extra number is not always a 0, by the way; for example, for KNNvKN, it's a 2.
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

galen wrote:The "best permutation" is often not even one of the permutations in the list, and, when it is, it's usually not the best one.
It's the same permutation, but encoded differently. I agree it's confusing, but it's how it is ;).
I'm hoping this is just an output issue and that it's not selecting a worse permutation! Or maybe I'm just missing something?
Just output indeed.
Another, probably minor issue is that with (e.g.) KRRvK, num_types is 3, and that's how many elements printed out are in each tried permutation, but entry_piece.num is 4, and that's used for printing the best perm, so that an extra 0 gets tacked on the end.
There are 4 pieces, but just 3 types of pieces. The two Rs will always be together and in one of the two encodings they are treated as one.