New 6-piece tablebase generator

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

New 6-piece tablebase generator

Post by syzygy »

I have just released my tablebase generator for up to 6 pieces on GitHub:
https://github.com/syzygy1/tb

It generates two sets of files:
- WDL files (extension: .rtbw) storing win/draw/loss information for access during search.
- DTZ files (extension: .rtbz) storing distance-to-zero information for access at the root.

In addition to win/draw/loss information, the WDL files also store whether the win or loss can be enforced within 50 moves.
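
To make the five-valued WDL information concrete, here is a minimal sketch of one possible encoding and of how a caller might use it; the names and numeric values are illustrative, not the identifiers actually used by the generator or the probing code.

Code:

#include <stdio.h>

/* Five-valued WDL50+ information, seen from the side to move.
 * Hypothetical names and values, for illustration only. */
enum wdl_value {
    WDL_LOSS         = -2, /* loss, even under the 50-move rule */
    WDL_BLESSED_LOSS = -1, /* loss saved by the 50-move rule    */
    WDL_DRAW         =  0,
    WDL_CURSED_WIN   =  1, /* win spoiled by the 50-move rule   */
    WDL_WIN          =  2  /* win, even under the 50-move rule  */
};

/* A cursed win counts as a win only if the 50-move rule is ignored. */
static int is_win(enum wdl_value v, int ignore_fifty)
{
    return v == WDL_WIN || (ignore_fifty && v == WDL_CURSED_WIN);
}

int main(void)
{
    printf("%d %d\n", is_win(WDL_CURSED_WIN, 0), is_win(WDL_CURSED_WIN, 1));
    return 0;   /* prints: 0 1 */
}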

The tables use custom compression. Total compressed size:

Code:

                       WDL          DTZ
up to 5 pieces       378 MB        561 MB
up to 6 pieces       68.3 GB       81.9 GB

Generation is done (almost) completely in RAM and is fully multithreaded. For 6 pieces, it requires a system with at least 32 GB of RAM. On my system (6-core i7-3930K @ 4.2 GHz, 64 GB) all 6-piece tables were generated in less than 5 days. At least for now, the generator requires Linux and gcc.

Of course at the moment there is just one engine supporting this format, and it is not publicly available ;-).
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: New 6-piece tablebase generator

Post by Kirill Kryukov »

Hi syzygy,

Congratulations on this release! Looks like a lot of work went into this. Hopefully this format will receive wide adoption.

What name should be used for your new tablebase format?
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

Kirill Kryukov wrote:Congratulations on this release! Looks like a lot of work went into this.
Thanks! It was indeed quite a bit of work. :)
I started from the generator I wrote in 2005, but I rewrote practically everything. One of the triggers was the idea of WDL50+ and DTZ50+ that came up some years ago in discussions on this forum.
Kirill Kryukov wrote:Hopefully this format will receive wide adoption.
Yes, that would be nice. I believe I have overcome most disadvantages of the Nalimov tables:
- much smaller tables (but still two-sided WDL tables);
- less overhead in probing;
- open source generator;
- much reduced generation time;
- unrestricted probing code;
- 50-move rule.

The main disadvantage (apart from RAM requirements during generation) compared to Nalimov is that the tables are not DTM, so that engines will not report "mate in 125 moves".
Kirill Kryukov wrote:What name should be used for your new tablebase format?
Good question! I could not think of one, but did not want to delay the release ;-)
syzygy wrote:Of course at the moment there is just one engine supporting this format, and it is not publicly available.
This is actually not entirely correct. In interface/ there are some files which can simply be copied to the latest Stockfish source from github (only tbprobe.o needs to be added to OBJS in the Makefile). It's not fully tested, but it seems to work correctly. It reports mate in 60 if the position is 10 moves away from a tablebase win.
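
For example, the Makefile change amounts to one line (illustrative; the exact layout of the object list varies between Stockfish versions):

Code:

# Add the probing code to the list of objects to build and link
# (illustrative; the OBJS variable already exists in the Makefile).
OBJS += tbprobe.o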

Btw, the file extensions are .rtbw (for WDL) and .rtbz (for DTZ). The 'r' stands for regular chess. I have also been working on generators for atomic, suicide and loser's chess. Those are not included at the moment.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Location: Reading, UK

Re: New 6-piece tablebase generator

Post by guyhaw »

This is very interesting - and the file size quoted for the 3-to-5-man EGTs (561 MB) is amazing compared with the Nalimov figure (3718 MB).

I must confess that I did wonder - as the announcement was made on April 1st - whether this was an April Fool's joke.

However, assuming it is not, may I ask if by 'DTZ' you actually mean 'DTZ', or do you mean 'DTZ50' (where Nalimov's EGTs are 857 MB, much nearer to your 561 MB)?

Are you using much larger block sizes than the 8 KB Nalimov chose, as advised by Rob Hyatt?

My Nalimov figures can be found here ... http://centaur.reading.ac.uk/4524/1/200 ... eprint.pdf

g
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

guyhaw wrote:I must confess that I did wonder - as the announcement was made on April 1st - whether this was an April Fool's joke.
Maybe not the best day for an announcement, but the code is there. :)
guyhaw wrote:However, assuming it is not, may I ask if by 'DTZ' you actually mean 'DTZ', or do you mean 'DTZ50' (where Nalimov's EGTs are 857 MB, much nearer to your 561 MB)?
To be precise, I mean DTZ50+ as discussed here some years ago. For positions that win or lose under the 50-move rule, this is DTZ50. For positions that are won or lost but for the 50-move rule, this is the distance to either a won/lost position 50 moves away from a zeroing move, or to a zeroing move leading to a position that is won or lost but for the 50-move rule.

My DTZ50+ format depends on the WDL50+ tables being available. In the DTZ50+ table I store draws as "don't cares" (i.e. I pick a value that I expect to improve compression) and I don't store the "sign" of DTZ50+ values (i.e. whether the position wins or loses). In addition I only store the smallest of the "white-to-move" and "black-to-move" tables. So it's not straightforward to compare. Of course in the end what matters is the combination of size and access time, since all tablebases can be "compressed" to the size of a tablebase generator...
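
As a toy illustration of the "don't care" idea (my own sketch, not code from the repository): one very simple policy is to give each drawn position the value of its left neighbour, so that runs become longer and compress better.

Code:

#include <stdio.h>

#define DONT_CARE 0xff  /* marker for a drawn position's DTZ slot */

/* Toy don't-care filling: give each draw the value of its left
 * neighbour so that runs become longer and compress better.
 * A real generator can choose values far more cleverly. */
static void fill_dont_cares(unsigned char *v, int n)
{
    for (int i = 0; i < n; i++)
        if (v[i] == DONT_CARE)
            v[i] = i > 0 ? v[i - 1] : 0;
}

int main(void)
{
    unsigned char v[] = { 3, 3, DONT_CARE, DONT_CARE, 3, 7, DONT_CARE, 7 };
    fill_dont_cares(v, 8);
    for (int i = 0; i < 8; i++)
        printf("%d ", v[i]);          /* prints: 3 3 3 3 3 7 7 7 */
    printf("\n");
    return 0;
}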

Are you sure about that 857 MB? 3718 MB for "full" DTZ (i.e. without throwing away information that is already in WDL and for 2-sided tables) sounds about right, though maybe a bit on the high side. Going from DTZ to DTZ50 is not going to reduce 3718 MB to 857 MB.
guyhaw wrote:Are you using much larger block sizes than the 8 KB Nalimov chose, as advised by Rob Hyatt?
For WDL I use 64 bytes/block for most tables. For DTZ I use 1024 bytes/block for most tables. (Since in my format each block contains at most 65536 positions, for the tables that compress well I use smaller blocks.) This allows me to cache blocks in RAM in compressed form (I actually leave the caching to the OS).
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Location: Reading, UK

Re: New 6-piece tablebase generator

Post by guyhaw »

Thank you for your informative reply.

It is really good news that I can set aside my 'April fool' doubts, as I hoped. I had just been through a sceptical phase after reading Chessbase's neat April Fool story about FIDE raising the K-factor of the Elo system to 60 (which, at the time of writing, is not true).

It has been some time since I visited this Discussion Board regularly, and I tended not to take much interest in items centred on the 50-move rule, as one cannot rely on FIDE not changing '50' to '60' or '40': they have tried to accommodate deeper endgames in the past. If they were to go for a round number higher than all known endgame depths, it would be around 600 today.

However, it is clear that you are combining a number of ideas that are newish, or at least 'revivals' for me, so I will attempt some sort of 'reprise' and review here, for my own benefit at least:

01) You are using value-only 'WDL-style' EGTs. I had advocated the creation and use of these, and indeed the use of WDL EGTs in the creation of DTx EGTs, as this avoids examining potential forced losses which are in fact draws.

02) The values-only EGT is to be used either on its own or in conjunction with a DTx EGT giving depth information. Stefan Meyer-Kahlen has introduced Shredder_Bases for sub-6-man (but unfortunately not sub-7-man) chess and users of SHREDDER get the benefit there as there is no need to 'Nalimov' some positions. Eugene Nalimov must know he has become part of the fabric of chess when his family name has become a verb!

03) Because you provide a values-only EGT, there is no need to provide for the DTx EGT being used on its own. Therefore the DTx EGT does not have to carry information which can be inferred from the values-only EGT: you do not need to indicate whether a position is a win, draw or loss, removing the need for (a) a 'draw' value, and (b) a 'sign' on the depth to indicate stm-win or stm-loss.

04) You are extending the idea of 'WDL' EGTs to 'WDL+' EGTs (or strictly, to a WDL50+ EGT) where I think you have five values, Win/Draw/Loss (under the 50-move rule) plus:
- C ... 'Cursed Win', i.e. would be a Win without the 50-move rule but is not a Win under the 50-move rule
- B ... 'Blessed Loss', i.e. would be a Loss without the 50-move rule but is not a Loss under the 50-move rule

05) It follows that before one can create a WDL50+/DTZ50+ pair of EGTs, one has to create a WDL/DTZ pair of EGTs.

06) As per point 03, the accompanying 'DTZ50+' EGT need not carry information which can be inferred from the WDL50+ EGT. Thus, a '7' in the DTZ50+ EGT can mean, as indicated by the WDL50+ EGT:
- 7 winner's moves to either a DTZ50-win or a DTZ50-loss in the next phase of play (after a Pawn-move and/or capture),
- a draw (ignoring the 50-move rule), where '7' just happens to be the most convenient value to put there for compression reasons,
- (if the WDL50+ EGT has a 'C' or a 'B' against this position) a win/loss in absolute terms, but 7 moves from 'something' given the 50-move rule.

I need to think more about what this 'something' is. I have a feeling there is some 'information loss' here but cannot resolve this entirely yet.

07) You are saving at least half the size of the EGT by keeping only the smaller stm-EGT, at the cost of requiring the chess engine to do an extra one-ply search. Thus, if the EGT were remote, e.g. on the web, this would multiply the communication cost by 'm', where 'm' could be quite large. Ken Thompson did something similar, I think, with his original DTC EGTs.

08) You are using a compression scheme which is not that of Andrew Kadatch, as in the Nalimov EGTs. You say it is based on ideas of Larsson and Moffat, http://www.larsson.dogma.net/dcc99.pdf, with Huffman coding as a follow-up to iteratively creating an alphabet of symbols. I guess this proves to be better than Kadatch's scheme. Compression/decompression time seems not to be an important parameter compared to disc I/O time. I think the recent MVL 7-man initiative uses LZW compression, but they have not expounded on that much yet.

09) However - and this is an issue with Nalimov EGTs as well - the size of the indexing table needs to be included in the effective size of the EGT; maybe it already has been in your case. Initialising the Nalimov EGTs seems to take a long time.

10) As you say, it is the combination of space and time that determines the utility of one's EGT. I like your observation that 'all EGTs can be compressed to the size of the code that generates them' :-)


I will dig out my sub-6-man DTZ50 EGTs. It may be that the size I quoted was only for DTZ50 EGTs that are different from the DTZ EGTs and maybe the article does not make this clear.

Congratulations on your achievements: this is certainly an original approach bringing new ideas to the 'state of the art'.

I would be interested in using your WDL/DTZ (rather than your WDL50+/DTZ50+) EGTs at some stage if that were convenient - as I am more interested in an 'absolute truth' not moderated by FIDE's current k-move rule.

I also have several large sets of sub-6-man positions, sorted by endgame, which I can share somehow if you care to evaluate them in DTZ terms for me.

Best wishes ... g
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

guyhaw wrote:01) You are using value-only 'WDL-style' EGTs. I had advocated the creation and use of these, and indeed the use of WDL EGTs in the creation of DTx EGTs, as this avoids examining potential forced losses which are in fact draws.
I create both WDL and DTZ in one go, so I don't use WDL in the creation of DTZ. The algorithm used is the grandfather algorithm with 2 plies per iteration (I think HGM calls this leapfrogging, but I might be wrong). I tried the outcounting method, but it didn't seem to be competitive (and it makes things more complicated).
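
For readers unfamiliar with the method, here is a schematic sketch of a 2-ply grandfather pass, with stubbed (un)move generation standing in for the real thing; all names are invented for illustration and this is not code from the repository.

Code:

#include <stddef.h>
#include <stdint.h>

enum { UNKNOWN, WIN, LOSS };   /* values only; depths omitted */
#define MAX_MOVES 256

/* Stubs standing in for real (un)move generation: fill buf[] with
 * predecessor/successor position indices and return how many. */
static size_t gen_predecessors(uint64_t pos, uint64_t *buf)
{ (void)pos; (void)buf; return 0; }
static size_t gen_successors(uint64_t pos, uint64_t *buf)
{ (void)pos; (void)buf; return 0; }

/* One 2-ply pass: every position just found to be LOSS makes all of
 * its predecessors WIN; each predecessor of such a new WIN is then
 * checked, and it becomes LOSS if every one of its moves now leads
 * to a WIN for the opponent. */
static void grandfather_pass(uint8_t *val,
                             const uint64_t *new_losses, size_t n_new)
{
    uint64_t preds[MAX_MOVES], gpreds[MAX_MOVES], succs[MAX_MOVES];

    for (size_t i = 0; i < n_new; i++) {
        size_t np = gen_predecessors(new_losses[i], preds);
        for (size_t j = 0; j < np; j++) {
            if (val[preds[j]] != UNKNOWN) continue;
            val[preds[j]] = WIN;      /* a move to a lost position wins */

            size_t ng = gen_predecessors(preds[j], gpreds);
            for (size_t k = 0; k < ng; k++) {
                if (val[gpreds[k]] != UNKNOWN) continue;
                size_t ns = gen_successors(gpreds[k], succs);
                int all_win = ns > 0;
                for (size_t m = 0; m < ns && all_win; m++)
                    if (val[succs[m]] != WIN) all_win = 0;
                if (all_win)
                    val[gpreds[k]] = LOSS;   /* every move loses */
            }
        }
    }
}

int main(void)
{
    uint8_t val[1] = { UNKNOWN };
    grandfather_pass(val, NULL, 0);   /* no-op demo run */
    return 0;
}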
guyhaw wrote:04) You are extending the idea of 'WDL' EGTs to 'WDL+' EGTs (or strictly, to a WDL50+ EGT) where I think you have five values, Win/Draw/Loss (under the 50-move rule) plus:
- C ... 'Cursed Win', i.e. would be a Win without the 50-move rule but is not a Win under the 50-move rule
- B ... 'Blessed Loss', i.e. would be a Loss without the 50-move rule but is not a Loss under the 50-move rule
I believe the terminology is from kronsteen (on this forum). My generator calls them both "cursed" in the statistics that are output, because I forgot about the "blessed loss" term until recently.
guyhaw wrote:05) It follows that before one can create a WDL50+/DTZ50+ pair of EGTs, one has to create a WDL/DTZ pair of EGTs.
A pure WDL/DTZ pair is not of much use for creating WDL50+/DTZ50+. I create tables in RAM that have all the information necessary for WDL50+ and DTZ50+, then permute them to different indexing schemes and compress. I do test runs on subsets of the data to find good permutations. (The idea to try permutations is from Jesper Torp Kristensen's master thesis.)
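
As a toy illustration of why the permutation matters (my own sketch, not code from the repository), the same small table is linearised in two dimension orders, and adjacent value changes are counted as a crude proxy for how well each order would compress:

Code:

#include <stdio.h>

/* Toy version of the permutation search: the same 4x4 table is
 * linearised in two different dimension orders and we count adjacent
 * value changes as a cheap proxy for compressed size. A real run
 * compresses a sample of the data for each candidate permutation. */

static int changes(const int *v, int n)
{
    int c = 0;
    for (int i = 1; i < n; i++)
        if (v[i] != v[i - 1]) c++;
    return c;
}

int main(void)
{
    /* value[a][b]: constant in b, varying in a */
    int value[4][4] = { {0,0,0,0}, {1,1,1,1}, {0,0,0,0}, {2,2,2,2} };
    int lin[16];

    int k = 0;                        /* order 1: b varies fastest */
    for (int a = 0; a < 4; a++)
        for (int b = 0; b < 4; b++)
            lin[k++] = value[a][b];
    printf("b fastest: %d changes\n", changes(lin, 16));  /* 3 */

    k = 0;                            /* order 2: a varies fastest */
    for (int b = 0; b < 4; b++)
        for (int a = 0; a < 4; a++)
            lin[k++] = value[a][b];
    printf("a fastest: %d changes\n", changes(lin, 16));  /* 15 */
    return 0;
}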
guyhaw wrote:06) As per point 03, the accompanying 'DTZ50+' EGT need not carry information which can be inferred from the WDL50+ EGT. Thus, a '7' in the DTZ50+ EGT can mean, as indicated by the WDL50+ EGT:
- 7 winner's moves to either a DTZ50-win or a DTZ50-loss in the next phase of play (after a Pawn-move and/or capture),
- a draw (ignoring the 50-move rule), where '7' just happens to be the most convenient value to put there for compression reasons,
- (if the WDL50+ EGT has a 'C' or a 'B' against this position) a win/loss in absolute terms, but 7 moves from 'something' given the 50-move rule.
Exactly. And I go one step further by sorting each of the 4 ranges by frequency and remapping. So a '7' in the DTZ50+ would correspond to, say, the 7th most frequent win-in-x or loss-in-x or cursed-win-in-x or blessed-loss-in-x value. This further improves compression.
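
A small runnable sketch of the frequency remapping for a single class (illustrative only; the repository's actual remapping code is organised differently):

Code:

#include <stdio.h>
#include <stdlib.h>

/* Toy frequency remapping for one WDL class: DTZ values are replaced
 * by their rank in a frequency-sorted list, so small stored symbols
 * are the common ones. */

#define MAXV 256

static int freq[MAXV];

static int by_freq_desc(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    if (freq[x] != freq[y]) return freq[y] - freq[x];
    return x - y;                     /* deterministic tie-break */
}

int main(void)
{
    int dtz[] = { 12, 3, 3, 7, 3, 12, 3, 7, 3 };
    int n = 9, order[MAXV], rank[MAXV];

    for (int i = 0; i < n; i++) freq[dtz[i]]++;
    for (int v = 0; v < MAXV; v++) order[v] = v;
    qsort(order, MAXV, sizeof(int), by_freq_desc);
    for (int r = 0; r < MAXV; r++) rank[order[r]] = r;

    for (int i = 0; i < n; i++)
        printf("%d->%d ", dtz[i], rank[dtz[i]]);
    printf("\n");  /* 3 is most frequent -> symbol 0, 7 -> 1, 12 -> 2 */
    return 0;
}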
guyhaw wrote:07) You are saving at least half the size of the EGT by keeping only the smaller stm-EGT, at the cost of requiring the chess engine to do an extra one-ply search. Thus, if the EGT were remote, e.g. on the web, this would multiply the communication cost by 'm', where 'm' could be quite large.
No, the client just sends a position and the server-side probing code performs the 1-ply search.

Of course the 1-ply search creates probing overhead, but the idea is to let an engine only probe the DTZ tables at the root anyway.
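
Schematically, a probe for the side whose table is not stored could look like this; the names and stubbed internals are hypothetical, purely to show the 1-ply idea.

Code:

#include <limits.h>

typedef struct Position Position;  /* opaque; stands in for a real type */
typedef int Move;
#define MAX_MOVES 256

/* Stubs standing in for the engine/probing internals. */
static int generate_moves(Position *pos, Move *moves)
{ (void)pos; (void)moves; return 0; }
static void do_move(Position *pos, Move m) { (void)pos; (void)m; }
static void undo_move(Position *pos, Move m) { (void)pos; (void)m; }
static int probe_stored_table(Position *pos) { (void)pos; return 0; }

/* Probe for the side whose table was not stored: try every move,
 * probe the stored table for the resulting positions (opponent to
 * move), and take the best negated value. */
static int probe_missing_side(Position *pos)
{
    Move moves[MAX_MOVES];
    int n = generate_moves(pos, moves), best = INT_MIN;
    for (int i = 0; i < n; i++) {
        do_move(pos, moves[i]);
        int v = -probe_stored_table(pos);  /* flip point of view */
        undo_move(pos, moves[i]);
        if (v > best) best = v;
    }
    return best;
}

int main(void) { (void)probe_missing_side; return 0; }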
guyhaw wrote:08) You are using a compression scheme which is not that of Andrew Kadatch, as in the Nalimov EGTs. You say it is based on ideas of Larsson and Moffat, http://www.larsson.dogma.net/dcc99.pdf, with Huffman coding as a follow-up to iteratively creating an alphabet of symbols.
Correct. It's a variation on the RE-PAIR scheme.
guyhaw wrote:I guess this proves to be better than Kadatch's scheme.
For sure it is quite different. I think Kadatch's scheme depends on maintaining a cache of decompressed blocks, which I wanted to avoid.
guyhaw wrote:Compression/decompression time seems not to be an important parameter compared to disc I/O time. I think the recent MVL 7-man initiative uses LZW compression, but they have not expounded on that much yet.
To improve access times they have gone from a block size of 2 MB to (I thought) 8 KB at the cost of a considerable increase in total compressed size, so decompression time was a limiting factor for them. To go smaller than 8 KB without losing too much compression efficiency a compression method is needed that uses a fixed dictionary (per table) that may be constructed on the basis of the whole table ("offline") instead of a dictionary that is constructed incrementally while decompressing a block ("online").

The compression ratio of my scheme is independent of block size, except that splitting into blocks at byte-boundaries necessarily loses a few bits at the end of each block.

The main disadvantage of my compression scheme is that compression is not very fast and that the whole table needs to be in RAM, as it sweeps through the table many times. The good thing, though, is that it can be parallelised very well, and, now that I think of it, it could be implemented in a distributed manner, e.g. on Lomonosov-type systems where the table is distributed over the RAM of many nodes.
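
As a toy version of the RE-PAIR idea (my own illustration; the real scheme differs in many details, for instance it counts pairs non-overlapping and adds the Huffman stage on top):

Code:

#include <stdio.h>

#define N 64

int main(void)
{
    int s[N], n = 0, next = 256;          /* codes >= 256 are fresh symbols */
    const char *input = "abababab";
    for (const char *p = input; *p; p++) s[n++] = *p;

    for (;;) {
        int besta = 0, bestb = 0, bestc = 0;
        for (int i = 0; i + 1 < n; i++) { /* most frequent adjacent pair */
            int c = 0;
            for (int j = 0; j + 1 < n; j++)
                if (s[j] == s[i] && s[j + 1] == s[i + 1]) c++;
            if (c > bestc) { bestc = c; besta = s[i]; bestb = s[i + 1]; }
        }
        if (bestc < 2) break;             /* no pair occurs twice: stop */

        int m = 0;                        /* replace pair with new symbol */
        for (int i = 0; i < n; ) {
            if (i + 1 < n && s[i] == besta && s[i + 1] == bestb) {
                s[m++] = next; i += 2;
            } else {
                s[m++] = s[i++];
            }
        }
        n = m; next++;
    }

    for (int i = 0; i < n; i++) printf("%d ", s[i]);
    printf("\n");                         /* "abababab" -> "257 257" */
    return 0;
}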
guyhaw wrote:09) However - and this is an issue with Nalimov EGTs as well - the size of the indexing table needs to be included in the effective size of the EGT; maybe it already has been in your case. Initialising the Nalimov EGTs seems to take a long time.
My tables include the indexing tables. They are demand-mapped into RAM during table access, so initialisation is fast (just a check of which files are present).
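
A minimal sketch of the demand-mapping idea on Linux; the file name is a placeholder and error handling is abbreviated.

Code:

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* After mmap() the kernel pages data in on first access and keeps hot
 * blocks in the page cache, so "initialisation" is just opening the
 * file. "KQvK.rtbw" is a placeholder name. */
int main(void)
{
    int fd = open("KQvK.rtbw", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    fstat(fd, &st);

    const unsigned char *map =
        mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first byte: 0x%02x\n", map[0]); /* faulted in on demand */

    munmap((void *)map, st.st_size);
    close(fd);
    return 0;
}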
guyhaw wrote:I will dig out my sub-6-man DTZ50 EGTs. It may be that the size I quoted was only for DTZ50 EGTs that are different from the DTZ EGTs and maybe the article does not make this clear.
I didn't find the number in the article, but I probably overlooked it.
guyhaw wrote:I would be interested in using your WDL/DTZ (rather than your WDL50+/DTZ50+) EGTs at some stage if that were convenient - as I am more interested in an 'absolute truth' not moderated by FIDE's current k-move rule.
Absolute truth is included as well, since one can interpret cursed wins as wins and blessed losses as losses. WDL50+ is really a superset of WDL (and of WDL50). DTZ50+ is a superset of DTZ50, but not of DTZ (since DTZ50 paths may be longer than DTZ paths), but still provides a (cursed) winning path for cursed wins.

Regarding "current k-move rule", I think it is unlikely that FIDE will change the 50-move rule in the future. The arguments used for temporarily changing the rule in the past have now been ruled to be of insufficient importance, and I don't see any new arguments arising. The number 50 is arbitrary, but it has been there for a long time now. Anyway, regenerating the 6-piece tables for another value of k is only a question of days. ;)
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Location: Reading, UK

Re: New 6-piece tablebase generator

Post by guyhaw »

I've responded in part via the Discussion Board's message-facility.

I can live with 'Cursed Loss' :-)

Ken Thompson tried the 'outcounting' method - presumably used when trying to identify forced-loss positions - and also found it less than worthwhile. The reason for drawing on information from previously generated EGTs, WDLn or DTx, is of course to avoid wasting time trying to identify positions as lost when they are in fact drawn or won.

Interesting to see Jesper Torp Kristensen's ODBB thesis coming into this. I think there's a topic there worthy of the 'academic treatment'.

Ditto the comparison of your compression method and Andrew Kadatch's.

I agree that there is no strong argument for changing the 50-move rule based on endgame statistics - but FIDE cannot be relied on not to be capricious.

g
Arkansaw
Posts: 3
Joined: Thu Feb 23, 2006 3:46 pm

Re: New 6-piece tablebase generator

Post by Arkansaw »

This is an interesting development. I would certainly like to try it if the memory requirements can be reduced to 16 GB or even 8 GB :thumbup:
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

Arkansaw wrote:This is an interesting development. I would certainly like to try it if the memory requirements can be reduced to 16 GB or even 8 GB :thumbup:
16 GB is doable. I have just changed the indexing scheme used during generation of pawnless tables from 10 x 64^5 to 462 x 64^4, so that generation of pawnless tables can now be done with 16 GB. (A nice bonus is that this speeds up the generation part by about 22%.) However, during compression another 6 GB is needed, so you still need 24 GB.
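
The arithmetic behind that change, assuming one slot per position (my own back-of-envelope figures, not output from the generator):

Code:

#include <stdio.h>

/* Index-space arithmetic: the old pawnless scheme used 10 x 64^5
 * slots, the new one 462 x 64^4 (462 being the number of
 * symmetry-reduced placements of the two kings). Assumption: the
 * generator's in-RAM cost scales with this slot count. */
int main(void)
{
    unsigned long long old_slots = 10ULL << 30;   /* 10 * 64^5  */
    unsigned long long new_slots = 462ULL << 24;  /* 462 * 64^4 */
    printf("old: %llu\nnew: %llu\nratio: %.3f\n",
           old_slots, new_slots, (double)new_slots / old_slots);
    return 0;   /* ratio 0.722, i.e. about 28% fewer slots */
}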

My next step will be to keep only one of the wtm/btm tables in memory during compression and save the other to disk.

Pawnful tables use less memory since generation is done per file, but the same trick of saving to disk will be needed to stay under 16 GB.

This also means that a machine with 1 TB of RAM will already suffice for 7 pieces. ;-)
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

syzygy wrote:My next step will be to keep only one of the wtm/btm tables in memory during compression and save the other to disk.

Pawnful tables use less memory since generation is done per file, but the same trick of saving to disk will be needed to stay under 16 GB.
This has now been done (by adding a "-d" option to rtbgen/rtbgenp), so a 16 GB system should be sufficient to generate all 6-piece tables.
koistinen
Posts: 92
Joined: Fri May 02, 2008 7:59 pm
Location: Stockholm

Re: New 6-piece tablebase generator

Post by koistinen »

Nice, C and GNU Licence!

Have downloaded and will study it.
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: New 6-piece tablebase generator

Post by Kirill Kryukov »

Here is a Windows 64-bit binary of Stockfish 4 with added syzygybases access capability. Compiled with MinGW GCC 4.8.1 from Cygwin64 distribution. Non-PGO build, "x86-64-modern" architecture (with popcnt). The source was obtained from here (link was posted by Ronald on CCC).
Attachments
Stockfish-4-SyzygyTB-KK.zip (1.92 MiB)
galen
Posts: 18
Joined: Mon Dec 30, 2013 8:47 am

Re: New 6-piece tablebase generator

Post by galen »

Hi syzygy!

I've been trying out your generator. It's very nice. You clearly know the field quite well. (By contrast, I was largely winging it when making my generator.) I have one question for you, however, that might be a real issue. Whenever I generate a table, I get output like this (e.g.):

find optimal permutation for wtm / dtz
[ 0] order: 1; perm: 5 6 10 14; 207017
[ 1] order: 1; perm: 5 10 6 14; 208067
[ 2] order: 1; perm: 5 14 6 10; 209654
[ 3] order: 1; perm: 6 5 10 14; 189392
[ 4] order: 1; perm: 10 5 6 14; 248574
[ 5] order: 1; perm: 14 5 6 10; 260910
[ 6] order: 1; perm: 6 10 5 14; 186062
[ 7] order: 1; perm: 6 14 5 10; 189904
[ 8] order: 1; perm: 10 6 5 14; 245151
[ 9] order: 1; perm: 14 6 5 10; 260391
[10] order: 1; perm: 10 14 5 6; 255693
[11] order: 1; perm: 14 10 5 6; 272621
[ 0] order: 1; perm: 6 5 14 10; 191556
[ 1] order: 1; perm: 6 10 14 5; 192820
[ 2] order: 1; perm: 6 14 10 5; 195180
best order: 1
best permutation: 14 5 10 6


The "best permutation" is often not even one of the permutations in the list, and, when it is, it's usually not the best one. I get this consistently both with and without pawns and for both wdl and dtz tables.

I doubt it's a platform (e.g., endianness) issue. I'm running a pretty vanilla Linux setup.

I looked through your code to find the cause of the discrepancy. I notice the tried perms are printed from type_perm_list[][], whereas the best is extracted from pt[] mapping over piece_perm_list[][]. These give different values. I don't fully understand the relationship between the arrays, but when I tried replacing the "best permutation:" loop body with printf(" %d", type_perm_list[*bestp]); (or without the "*" for the _wdl functions), it gave output that made sense at least to me. I also tried similarly changing the write into the best[] array; it didn't seem to break or even affect anything in my limited testing. I haven't traced how that array is used for compression.

I'm hoping this is just an output issue and that it's not selecting a worse permutation! Or maybe I'm just missing something?

Another, probably minor issue is that with (e.g.) KRRvK, num_types is 3, and that's how many elements are printed in each tried permutation, but entry_piece.num is 4, and that's what's used for printing the best perm, so an extra 0 gets tacked on the end. I'm not sure if there's a risk of it reading uninitialized memory. The larger number is also used for writing into best[], for better or for worse. The extra number is not always a 0, by the way; for example, for KNNvKN, it's a 2.
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: New 6-piece tablebase generator

Post by syzygy »

galen wrote:The "best permutation" is often not even one of the permutations in the list, and, when it is, it's usually not the best one.
It's the same permutation, but encoded differently. I agree it's confusing, but it's how it is ;).
galen wrote:I'm hoping this is just an output issue and that it's not selecting a worse permutation! Or maybe I'm just missing something?
Just output indeed.
galen wrote:Another, probably minor issue is that with (e.g.) KRRvK, num_types is 3, and that's how many elements are printed in each tried permutation, but entry_piece.num is 4, and that's what's used for printing the best perm, so an extra 0 gets tacked on the end.
There are 4 pieces, but just 3 types of pieces. The two Rs will always be together and in one of the two encodings they are treated as one.