EGTB generation

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

EGTB generation

Post by Codeman »

Hello!

Over the last couple of weeks I have become interested in EGTBs, and I would like to participate in their generation.

I have also been thinking about some small things that could still be improved:

A lot of space is still wasted on broken positions, where the side not to move is in check.
As far as I know, Nalimov only removes positions where the king is in check and no other piece can be interposed.

It should be possible to remove all of these broken positions. My idea is, when indexing positions, to first number only those positions in which the king of the side not to move cannot be checked, and to deal with the remaining positions in a second part of the file.

e.g.: KRK:
In the first part I take all positions where the rook is not on the same file or rank as the other king (leaving 7*7 = 49 squares), so the first part would be 49*161 positions.
The second part would then only cover positions where the rook is on the same file or rank, but with its own king in between.

This was easy for a 2+1 endgame, but it is possible for bigger endgames as well.
e.g.: KRBK:
First, all positions where neither the R nor the B could check the other king.
Then all positions where the R could check the king (always with B, K, or B & K in between).
Then all positions where the B could check the king (always with R, K, or R & K in between).
Then all positions where both the B and the R would check the king (in this case zero, because there are not enough pieces to block both attacks).
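
A minimal sketch of what the first part of such a split could look like for KRK (for illustration only, not actual generator code; the king-pair index is assumed to come from whatever symmetry-reduced scheme the generator already uses):

Code: Select all

// Illustrative sketch of the "first part" of the split index for KRK
// (white to move; square numbering 0..63 with a1 = 0).
#include <cstdint>

constexpr int fileOf(int sq) { return sq & 7; }
constexpr int rankOf(int sq) { return sq >> 3; }

// Part 1: the rook does not share a file or rank with the black king, so the
// black king cannot possibly be in check. For a fixed black-king square this
// leaves 7*7 = 49 rook squares.
bool rookCannotCheck(int rookSq, int blackKingSq) {
    return fileOf(rookSq) != fileOf(blackKingSq) &&
           rankOf(rookSq) != rankOf(blackKingSq);
}

// Number the 49 "safe" rook squares 0..48 for a fixed black-king square;
// squares that share the king's file or rank are simply skipped.
int safeRookIndex(int rookSq, int blackKingSq) {
    int idx = 0;
    for (int sq = 0; sq < 64; ++sq) {
        if (!rookCannotCheck(sq, blackKingSq)) continue;
        if (sq == rookSq) return idx;
        ++idx;
    }
    return -1;  // rook shares a line with the king: belongs to the second part
}

// Part-1 index: (king-pair index) * 49 + safe rook index.
int64_t part1Index(int kingPairIndex, int rookSq, int blackKingSq) {
    int r = safeRookIndex(rookSq, blackKingSq);
    return r < 0 ? -1 : int64_t(kingPairIndex) * 49 + r;
}

Positions for which safeRookIndex() returns -1 (rook on the black king's file or rank, blocked by the white king) would then be numbered separately in the second part of the file.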

----

Secondly, I would put much more effort into bitbase development. I see quite a lot of potential there, so I would even push for building 6-piece BBs or bigger. I have thought of some very efficient ways of compressing the files.

----

I am capable of coding in C++ and assembler, which I want to use to create a nice generator.

----

What do you think? Any comments/questions/etc.?

Codeman
ath
Posts: 11
Joined: Sat Sep 15, 2007 6:56 am

Re: EGTB generation

Post by ath »

>A lot of space is still wasted on broken positions, where the side not to move is in check.

Is this important? Disk space is very cheap today. Add to that the fact that irregular structures in code
tend to be expensive in the end: writing and debugging the structures used in the Edwards
databases was probably far less expensive than for the Nalimov tables, and could be done with a
considerably higher degree of confidence that the result was correct. You don't want
any errors in these databases.

If space *is* expensive, compress. But the more 'useless' information you remove before compression
(a la Nalimov), the trickier it is going to be to do a good job later. I strongly suspect it is
better to keep the 'broken positions' (a la Edwards), and then use a compression scheme tailored
to that particular representation. A general-purpose, single-pass compression scheme will probably
not be up to the job, but a multi-pass scheme that builds an initial dictionary, which is then used in
the final compression pass, would probably do fairly well.
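
One way to read that two-pass suggestion is zlib's preset-dictionary mechanism. The sketch below is purely illustrative (the block size, dictionary size and the very naive way the dictionary is picked are assumptions, not anything specified above), with error handling omitted:

Code: Select all

#include <algorithm>
#include <vector>
#include <zlib.h>

// Pass 1: pick a preset dictionary. Here simply the last 32 KiB of a
// representative sample; a real scheme would rank frequent byte patterns
// across the whole file.
std::vector<unsigned char> buildDictionary(const std::vector<unsigned char>& sample) {
    size_t n = std::min<size_t>(sample.size(), 32768);
    return std::vector<unsigned char>(sample.end() - n, sample.end());
}

// Pass 2: compress one block with the preset dictionary.
std::vector<unsigned char> compressBlock(const std::vector<unsigned char>& block,
                                         const std::vector<unsigned char>& dict) {
    z_stream strm{};                      // zalloc/zfree/opaque = Z_NULL
    deflateInit(&strm, Z_BEST_COMPRESSION);
    deflateSetDictionary(&strm, dict.data(), static_cast<uInt>(dict.size()));

    std::vector<unsigned char> out(deflateBound(&strm, block.size()));
    strm.next_in   = const_cast<Bytef*>(block.data());
    strm.avail_in  = static_cast<uInt>(block.size());
    strm.next_out  = out.data();
    strm.avail_out = static_cast<uInt>(out.size());
    deflate(&strm, Z_FINISH);             // output buffer is large enough
    out.resize(strm.total_out);
    deflateEnd(&strm);
    return out;
}

The matching decompressor would call inflateSetDictionary() with the same bytes when inflate() reports Z_NEED_DICT.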

Bitbases ... well, it's a question of what audience you cater to. My interest is almost entirely
in endgame studies, and so bitbases do not really do the job I want.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB generation

Post by Codeman »

Considering the latest developments in EGTB generation and the space requirements:
(sizes in MB)
3-man: 0.063476563
4-man: 30
5-man: 16,977.92
6-man: 1,048,576
7-man: ?
You can see that the amount of space required is increasing exponentially.
I looked at it on a logarithmic scale, and I would estimate 7-piece endgames to take around 20 to 30 TB.

So in my opinion, space is all that matters in this subject. If we want to explore endgames with more than 6 pieces, we have to talk about introducing new means of optimisation.
My idea doesn't do much against the exponential growth, but I would still expect around 25% compression for most endgames where the side to move has a queen.

Your second point regarding bitbases:
I agree normal TBs are much more helpful for studies, but as explained, space is lacking, and so I see high potential for BBs. In terms of space, even 8- or 9-piece endgames would probably be possible, maybe more if compression is optimised (especially since BBs are easier to compress than TBs).

Furthermore, computers are getting stronger and stronger, so they can probably already overcome the downside of BBs. Bitbases would extend the horizon of current engines enormously: instead of a TB telling you how many more moves there are until the next conversion/mate/whatever, engines could calculate this themselves using BBs.
Let's say you are in a position marked as a win and you want to find the correct way to transpose into an easier endgame. The engine could then (in most cases), at each node of its search tree, discard almost all moves except the winning ones.
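
Roughly, that pruning could look like the sketch below (purely illustrative; the Position type, the move list and the bitbase probe are hypothetical stand-ins, not any real engine's API):

Code: Select all

#include <functional>
#include <vector>

struct Position;   // engine-specific details don't matter for the sketch
struct Move {};

// Probe: does the winning side still have a won position after this move?
using StillWonProbe = std::function<bool(const Position&, const Move&)>;

// In a position the bitbase marks as won, keep only the moves that preserve
// the win; ordinary search then just picks among those.
std::vector<Move> filterWinningMoves(const Position& pos,
                                     const std::vector<Move>& legalMoves,
                                     const StillWonProbe& stillWon) {
    std::vector<Move> keep;
    for (const Move& m : legalMoves)
        if (stillWon(pos, m)) keep.push_back(m);
    return keep.empty() ? legalMoves : keep;   // fall back if nothing qualifies
}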

And if you are interested in, e.g., the longest possible mates, that is no problem either, as information like this can be obtained anyway during the generation of the endgame.
Arpad Rusz
Posts: 93
Joined: Mon Mar 27, 2006 5:33 pm
Sign-up code: 0
Location: Romania/Hungary
Contact:

Re: EGTB generation

Post by Arpad Rusz »

I would estimate the total size of the 7-man tablebases as much bigger: 240 TB rather than 20-30 TB as in the previous post. Also, the correct size of the 5-man tablebases is 6-7 GB (not 16.9 GB).

4-man ~30 MB
5-man ~6 GB (4-man x 200)
6-man ~1.2 TB (5-man x 200)
so
7-man ~240 TB (6-man x 200).

By improving the database structure (e.g. with DTC instead of DTM) this size can probably be reduced (let's say to half of it), but that is still huge (120 TB). The total size of the 7-man bitbases, however, would probably be around 1 TB, just perfect for another project to download them with eMule!
But who will be the next Nalimov?
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB generation

Post by Codeman »

I have now started creating my own little generator. Could you please tell me what the fastest generation method for bitbases is?

I was thinking of a generic algorithm:
1) Generate all mate positions (mark them as wins in the TB)
2) For each position, check the white moves that lead to 1)
3) From those positions, check the black moves that lead to 2); if black has no move that leads to a position other than 2), mark it as a win for white
... etc.
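
A bare-bones sketch of that iteration (my illustration, not the actual generator; the position indexing, the move-generation callbacks and the mate test are assumed to be supplied from outside):

Code: Select all

#include <cstdint>
#include <functional>
#include <vector>

enum Value : uint8_t { UNKNOWN, WHITE_WIN };   // a win/draw bitbase needs 1 bit per entry

struct BitbaseBuilder {
    size_t N;                                   // number of positions per side to move
    // successor generators: index (side to move) -> indices with the other side to move
    std::function<std::vector<size_t>(size_t)> whiteMoves, blackMoves;
    std::function<bool(size_t)> blackIsMated;   // step 1: black to move and checkmated

    std::vector<Value> wtm, btm;                // the two halves of the table

    void build() {
        wtm.assign(N, UNKNOWN);
        btm.assign(N, UNKNOWN);
        for (size_t p = 0; p < N; ++p)          // step 1: seed with the mates
            if (blackIsMated(p)) btm[p] = WHITE_WIN;

        bool changed = true;
        while (changed) {                       // iterate to a fixed point
            changed = false;
            // step 2: white to move wins if ANY move reaches a won black-to-move position
            for (size_t p = 0; p < N; ++p) {
                if (wtm[p] != UNKNOWN) continue;
                for (size_t q : whiteMoves(p))
                    if (btm[q] == WHITE_WIN) { wtm[p] = WHITE_WIN; changed = true; break; }
            }
            // step 3: black to move loses if ALL moves reach won white-to-move positions
            for (size_t p = 0; p < N; ++p) {
                if (btm[p] != UNKNOWN) continue;
                std::vector<size_t> moves = blackMoves(p);
                if (moves.empty()) continue;    // stalemate (or broken) stays UNKNOWN = draw
                bool allLost = true;
                for (size_t q : moves)
                    if (wtm[q] != WHITE_WIN) { allLost = false; break; }
                if (allLost) { btm[p] = WHITE_WIN; changed = true; }
            }
        }
        // everything still UNKNOWN at the end is not a win for white
    }
};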

I have already created quite a fast generator for mate positions in assembler and would now like to start on the main TB generation.
Do you know of a better way (faster / using less RAM) of dealing with this problem than the one explained above?
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Re: EGTB generation

Post by guyhaw »

I don't want to discourage 'Codeman', but tackling bitbases with assembler, and trying to trump Nalimov's index scheme with something even more complex, sounds like the height of ambition to me. I don't think the Konoval/Bourzutschky approach (admittedly in Pentium assembler) tries to avoid 'broken' positions, even as much as Nalimov does. They have considered mapping a large-index generated EGT, after generation, into a smaller-index EGT for storage and use, but I don't think Marc B has done that - except for generating some 6-man EGTs with FEG and then converting them to Nalimov form.
Codeman seems to be 'special pleading' with KRK - and even that special pleading wasn't that convincing. A better way to reduce size is to use a non-DTM metric, especially DTZ, Depth To (move-count) Zeroing (move). These EGTs are so far about half the size of DTM EGTs because they compress better.
As for size, 6-man weigh in at 1.5TB, so 7-man have to weigh in at 90TB times the ratio of 7-man to 6-man endgame profiles.
Guy
Arpad Rusz
Posts: 93
Joined: Mon Mar 27, 2006 5:33 pm
Sign-up code: 0
Location: Romania/Hungary
Contact:

Re: EGTB generation

Post by Arpad Rusz »

Sorry Guy,

I still think that the total size of the 6-man tablebases is only 1.2 TB (5+1 included) and not 1.5 TB. Your number was only a guess made before the generation process of the tablebases ended.
I didn't really understand your last sentence; maybe my English is to blame. So what would be the size of the 7-man tablebases in your opinion?

With best wishes,
Arpad
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB generation

Post by Codeman »

Arpad Rusz, you were right; I was taking my figures from the wrong sources.
I haven't got figures for the 6-men (incl. 5+1), but I am now using your 1.2 TB.

(sizes in MB; the factor in parentheses is the growth factor to the next size)
3-man: 0.06 (x 439.79)
4-man: 26.8 (x 268.99)
5-man: 7,208.96 (x 170.45)
6-man: 1,228,800 (x 112.05 [personal estimate, following the trend])
7-man: 137,691,970
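
As a quick sanity check on that extrapolation (my own back-of-the-envelope sketch, not part of the original figures): the growth factor itself shrinks roughly geometrically, and continuing that shrinkage one more step lands in the same ballpark.

Code: Select all

#include <cstdio>

int main() {
    // growth factors size(n+1)/size(n) taken from the table above
    double r34 = 439.79, r45 = 268.99, r56 = 170.45;
    double shrink = (r45 / r34 + r56 / r45) / 2.0;   // ~0.62 per step
    double r67 = r56 * shrink;                       // estimated 6->7 factor, ~106
    double size6MB = 1228800.0;                      // ~1.2 TB in MB
    std::printf("estimated 6->7 factor: %.1f\n", r67);
    std::printf("estimated 7-man size: %.0f MB (~%.0f TB)\n",
                size6MB * r67, size6MB * r67 / 1048576.0);   // ~124 TB
    return 0;
}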

---

guyhaw, thank you for your critical comments. But concerning my statement about improving the index scheme, I am still convinced of the potential. The point about the different metrics (DTM/DTC/DTZ) is mainly relevant for normal tablebases, not for bitbases. But I really will have to think about your advice to first generate the TBs (in assembler) using a large index scheme and then convert them to a smaller one, thanks.


Another thing to consider: do you know whether any other bitbases take the 50-move rule into account? I think leaving it out would probably increase the speed and decrease the space usage of the generation process, as I wouldn't have to work through the endgame iteration by iteration and could do it in a more chaotic order.
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam
Contact:

Re: EGTB generation

Post by h.g.muller »

I have done some preliminary tests in EGTB generation, which led me to the following conclusions.

1) The overwhelmingly important property of the indexing scheme is simplicity, not compactness. The Nalimov scheme totally sucks in this respect: for trying to save 10% of space, you pay with a factor of 20-50 slowdown of the generation process. Never use it.
Always use a trivial scheme, where the index is simply the 6-bit square numbers concatenated, or at least where it is a linear combination of the square numbers of the pieces (for non-8x8 boards).

2) Never convert position to index or index to position during generation. This is an expensive step that can be totally eliminated. Always derive index from index and position from position differentially (a minimal sketch of this follows after this list). Move generation on the index is just as easy as move generation on a mailbox board. In fact an EGTB is nothing but a multi-dimensional mailbox board.

3) With an efficient algorithm, there is no benefit in the use of assembler at all. The generation (apart from disk I/O) is 100% waiting for memory access; all computation is done in parallel, and typically only uses 5% of the time, overlapping with the fetching of the next EGTB entry from memory. Speeding this up 5 times by the use of assembler only means the CPU is stalled waiting for memory 99% of the time, instead of 95% of the time. (I could add that on most modern CPUs compiled C code cannot be accelerated by hand-coding in assembler, unless you really want to make use of vector instructions such as SSE, which the compiler does not generate.)

4) Access order is all-important. As the generation time is totally dominated by access time, both at the disk-I/O and the DRAM-access level, a factor of 5 to 10 can be gained (and lost!) by probing entries in a different order, so that they need to be brought into cache again when the same probe could have been done earlier, while the entry was still in cache.

5) Of course endgames with Pawns should be built P-slice by P-slice, as logically each P-slice is just as much a different endgame as KBNKR is different from KBKR (i.e. one can convert to the other through an irreversible move, so for the generation of one you need the other, but not the other way around).
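
A minimal illustration of point 2 (not h.g.muller's code), using the trivial concatenated-square index of point 1 for a KRK-like table:

Code: Select all

#include <cstdint>

// KRK-like layout with white to move: index = wk*64*64 + wr*64 + bk.
constexpr int64_t STRIDE_WK = 64 * 64;
constexpr int64_t STRIDE_WR = 64;
constexpr int64_t STRIDE_BK = 1;

constexpr int64_t makeIndex(int wk, int wr, int bk) {
    return wk * STRIDE_WK + wr * STRIDE_WR + bk * STRIDE_BK;
}

// Moving one piece from 'from' to 'to' shifts the index by stride * (to - from).
constexpr int64_t movePiece(int64_t index, int64_t stride, int from, int to) {
    return index + stride * (to - from);
}

// Example: from wk = e1 (4), wr = a1 (0), bk = e8 (60), the rook move a1-a8
// (0 -> 56) gives the successor index directly from the current one.
static_assert(movePiece(makeIndex(4, 0, 60), STRIDE_WR, 0, 56)
              == makeIndex(4, 56, 60),
              "differential update matches full recomputation");

During generation, the inner loops then only ever add such deltas to the running index; a full position is never reconstructed.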

I will start implementing all these principles in my next EGTB generator, now that I am (nearly) done with improving Winboard and creating Fairy-Max. I will most likely give this priority over writing an evaluation function for Joker.
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: EGTB generation

Post by Kirill Kryukov »

h.g.muller wrote:I will start implementing all these principles in my next EGTB generator, now that I am (nearly) done with improving Winboard and creating Fairy-Max. I will most likely give this priority over writing an evaluation function for Joker.
Hi H.G.! Thanks for your efforts, I am sure many will appreciate! Please keep posting when you have any news! Best, Kirill.
ath
Posts: 11
Joined: Sat Sep 15, 2007 6:56 am

Re: EGTB generation

Post by ath »

h.g.muller wrote:4) Access order is all-important. As the generation time is totally dominated by access time, both at the disk-I/O and the DRAM-access level, a factor of 5 to 10 can be gained (and lost!) by probing entries in a different order, so that they need to be brought into cache again when the same probe could have been done earlier, while the entry was still in cache.
It should be noted that this particular problem is related to storage technology, and can be overcome by selecting a different type of memory.
Today, solid-state disks probably aren't an economical alternative for EGTB generation at home, but in perhaps two years they very probably
will be the technology to use.
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam
Contact:

Re: EGTB generation

Post by h.g.muller »

That will not alter a thing: access to that storage will still be many orders of magnitude slower than access to the CPU caches. The process of optimizing the speed will thus remain the problem of minimizing the number of required accesses, which will totally overwhelm all other considerations.
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Re: EGTB generation

Post by jkominek »

h.g.muller wrote:I have done some preliminary tests in EGTB generation, which led me to the following conclusions.

Always use a trivial scheme, where the index is simply the 6-bit square numbers concatenated, or at least where it is a linear combination of the square numbers of the pieces (for non-8x8 boards).

2) Never convert position to index or index to position during generation. This is an expensive step that can be totally eliminated.
That's two boldfaced unconditionals in a row with nary a consideration of the obvious. How do you propose to fit 7-man tablebases into memory using "6-bit square numbers concatenated" indexing? You have a supercomputer hiding in your basement?

In defense of Nalimov and his highly complicated indexing code, what he accomplished was to make the construction of 6-man tables a viable undertaking on machines available five years ago. He traded off speed for feasibility, and then he did the work. The endgame community has benefited enormously from this choice.

By all accounts your program enjoys, in its computations, the fastest and most efficient use of memory sequencing and cache locality ever seen. And that is certainly a fine trick. But it is the least efficient with regard to the slowest resource of all: the computing industry's ability to make ever-bigger machines affordable. And with respect to that resource your strategy, sadly, is caught in a huge IO stall.

john
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Re: EGTB generation

Post by jkominek »

Codeman wrote: guyhaw, thank you for your critical comments. But concerning my statement about improving the index scheme, I am still convinced of the potential.
Please bear in mind that it is space after compression, not before, that is to be minimized. The percentage of broken positions is related to this, though loosely. The degree of impact can only truly be measured empirically, and I bet it won't be all that much. There are easier gains to be had (on the order of 5-10%) simply by increasing the block size to 64K during compression.

By the way, it is possible to remove all broken positions, and that is rather easy to accomplish. What you do is this. For a given board configuration, generate all the positions in a fixed sequential order, skipping broken entries. The tablebase values are written in this order. To probe, regenerate the positions sequentially from the beginning, stopping when you reach the one you want. Read in that tablebase location. In essence, the random-access file has been converted to a linear-access tape. (Remember those spinning reels on '60s-era mainframes?)
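
In sketch form (my illustration; isBroken() stands in for whatever legality test the generator applies over the fixed enumeration order), the probe becomes a linear scan:

Code: Select all

#include <cstdint>
#include <functional>

// The file stores values only for non-broken positions, in enumeration order.
// To find where a given position lives, walk the order from the start and
// count the non-broken entries before it.
int64_t tapeOffset(int64_t rawIndex,
                   const std::function<bool(int64_t)>& isBroken) {
    if (isBroken(rawIndex)) return -1;     // broken positions are not stored
    int64_t offset = 0;
    for (int64_t i = 0; i < rawIndex; ++i)
        if (!isBroken(i)) ++offset;
    return offset;
}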

What I've described represents one extreme end of the speed-versus-size curve (pre-compression). The direct array-indexing of Muller lies at the opposite end. What you imagine, clearly, is some algorithmic indexing mechanism that has fewer broken positions than Nalimov. Likely such a function does exist, but good luck to you in finding it. That's going to be a tough hill to climb.

john
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Re: EGTB generation

Post by jkominek »

jkominek wrote: Please bear in mind that it is space after compression, not before, that is to be minimized. The percentage of broken positions is related to this, though loosely. The degree of impact can only truly be measured empirically, and I bet it won't be all that much. There are easier gains to be had (on the order of 5-10%) simply by increasing the block size to 64K during compression.
Adding some hard data to this issue, I've run a quick compression test on all the 3-, 4-, and 5-man pawnless tables, starting from two conditions: mine and Nalimov. To ensure identical conditions I reapplied datacomp to uncompressed Nalimov tables.

Code: Select all

     Ratio  Percent     Size      Tablebase Condition 
 1   1.0000 100.0000    8.390 GiB  Kominek uncomp
 2   5.2082  19.2006    1.611 GiB  Kominek bzip-9
 3   4.8303  20.7028    1.737 GiB  Kominek comp8k
 4   5.1978  19.2390    1.614 GiB  Kominek comp64k
 5   1.1092  90.1577    7.564 GiB  Nalimov uncomp
 6   4.8302  20.7032    1.737 GiB  Nalimov comp8k
 7   5.1753  19.3225    1.621 GiB  Nalimov comp64k
Bottom line ... Nalimov's complicated indexing scheme gives it a 10% head start, but after compression the two are neck-and-neck. That's information theory for you. Increasing the block size from 8k to 64k provides an additional 9% benefit.

Truth be told, the best way to "improve" on Nalimov indexing is to undo it!

john
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam
Contact:

Re: EGTB generation

Post by h.g.muller »

I think that 7-men EGTBs would not fit in memory anyway, whether you use trivial (multi-dimensional array) or Nalimov indexing. Squeezing out the broken states just doesn't buy you large factors of saving; a few dozen percent won't make the difference.

Note that the multi-dimensional-array-like indexing I use is fully compatible with using the 8-fold board symmetry to reduce the memory requirement by nearly a factor of 8. The indexing pertains to VIRTUAL memory. If you leave part of that array unused, e.g. the entries with the white King outside the canonical triangle, the corresponding pages will never make a demand on PHYSICAL memory.

My plan for building (Pawnless) 7-men EGTBs is to expand the 5-men block that is the current working set into a trivially indexed 1 GB data set in DRAM. These 1 GB blocks will then have to be streamed in and out of DRAM, most likely from disk, where each block would be divided over 4K chunks of 256 KB each, which have to be gathered from and scattered to the principal storage device. These 256 KB chunks can be independently compressed before committing them to the main storage. (Most likely to hard disk; I don't have much hope that they can be compressed enough to all reside in DRAM, as a Pawnless 7-men EGTB will consist of 8M such 256 KB chunks. For 6-men it seems very feasible to hold them in DRAM, though.)
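
The bookkeeping for that layout is straightforward; a small sketch (my reading of the plan, with the sizes taken from the description above):

Code: Select all

#include <cstdint>

constexpr int64_t CHUNK_SIZE    = 256 * 1024;                  // 256 KB per chunk
constexpr int64_t CHUNKS_PER_WS = 4096;                        // 4K chunks...
constexpr int64_t WORKING_SET   = CHUNK_SIZE * CHUNKS_PER_WS;  // ...= 1 GB working set

struct ChunkAddress { int64_t workingSet, chunk, offset; };

// For a byte offset into the whole (virtual) table, find which working set,
// which 256 KB chunk within it, and which byte within that chunk holds it.
ChunkAddress locate(int64_t byteOffset) {
    return { byteOffset / WORKING_SET,
             (byteOffset % WORKING_SET) / CHUNK_SIZE,
             byteOffset % CHUNK_SIZE };
}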

Endgames with Pawns are of course much less demanding, as you treat them P-slice by P-slice.