Speed-comparison

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Speed-comparison

Post by Codeman »

Hi!
I am currently working on a Bitbase generator, and today I successfully generated my first 5-men Bitbase ("KBNKB"). All in all it took my 3.06 GHz processor 343 seconds. Now I would like to know how this performance compares to other generators.

I am also thinking of making this generator publicly available as soon as 6-men generation is possible too. I hope some of you would be so kind as to spare some processing power.

Edmund
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: Speed-comparison

Post by h.g.muller »

I have never generated bitbases, but I did make a DTM / DTC tablebase builder, and I suppose the speed difference would not be very large. IIRC it took about 120 sec to do KBNKB on a 1.3GHz Pentium M. (This is a generally drawn endgame, isn't it? Those typically took less time than generally won end-games. I think KBBKN took about 180 sec.)

Unfortunately, all that data got lost when my laptop suffered a hard-disk crash. The EGTB builder source code is still on my website, though, although it is not the fastest version I ever had. (The difference was only noticeable in end-games that took very many cycles, like KBBKN, where most wins take about 80 moves.) I am now working on a new builder that should be another ~10 times faster, so I will make no effort to recover the previous best version. It takes 3.4 ns to read a byte from DRAM on my hardware (220 ns per cache line of 64 bytes, actually), and a Pawnless 5-men is about 128MB. So a cycle through the EGTB will take some 0.45 sec, and most end-games require only 20-30 dense cycles (the wins in the other cycles being so sparse that you can skip most cache lines, strongly reducing the cycle time). So that would make 10-15 sec a reasonable time for a Pawnless 5-men build.
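
For what it's worth, that estimate can be written out in a few lines of C (just a sketch; the latency and size constants are the assumptions quoted above, not measurements of any particular machine):

Code: Select all

#include <stdio.h>

int main(void)
{
    /* figures quoted above -- assumptions, not measurements */
    const double ns_per_cache_line = 220.0;                /* DRAM latency per 64-byte line */
    const double table_bytes       = 128.0 * 1024 * 1024;  /* pawnless 5-men, ~1 byte/pos   */
    const double cache_lines       = table_bytes / 64.0;

    double cycle_sec = cache_lines * ns_per_cache_line * 1e-9;
    printf("one dense cycle    : %.2f s\n", cycle_sec);              /* ~0.46 s */
    printf("20-30 dense cycles : %.0f-%.0f s\n",
           20.0 * cycle_sec, 30.0 * cycle_sec);                      /* ~9-14 s */
    return 0;
}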
ZeroOne
Posts: 20
Joined: Sat Apr 28, 2007 10:53 pm
Sign-up code: 0
Location: Finland

Re: Speed-comparison

Post by ZeroOne »

What engines, if any, can utilize your bitbases? I hope you design the program so that it can generate bitbases of arbitrary size, not just limited to 6-men... Would that be possible? I think we'd have many 7-man and perhaps even 8-man tablebases if only the Nalimov format wasn't limited to 6-men. On the other hand I don't remember reading about 6-man bitbases before. I've only got 5-man Shredderbases and Scorpio bitbases (used by Toga).
Communication usually fails, except by accident.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: Speed-comparison

Post by Codeman »

h.g.muller, thank you for your stats. It seems I still have to improve some parts of my algorithm. Actually I thought Bitbases should be a bit faster to generate, as the move count doesn't matter.
My algorithm currently looks something like this:

Code: Select all

// seed with the positions that are already decided
Generate mates, promotions to won subgames, captures to won subgames -> WTM()

WTM()                                   // step back over White's moves
Loop through all white reverse moves {
    SetBit()                            // mark the predecessor as won for White
    BTM()                               // continue stepping back over Black's moves
}

BTM()
Loop through all black reverse moves {
    If Checkdespair() WTM()             // only propagate if Black has no escape
}

Checkdespair() {                        // verify that every black move loses
    Loop through all black ordinary (forward) moves {
        if it leads to a position with bit not set (position not won for white) return 0;
    }
    return 1;
}
This is just the basic outline; of course some special checks are applied for the king being in check, etc.

h.g.muller, would you please tell me your opinion on this approach?

ZeroOne wrote:What engines, if any, can utilize your bitbases? I hope you design the program so that it can generate bitbases of arbitrary size, not just limited to 6-men... Would that be possible? I think we'd have many 7-man and perhaps even 8-man tablebases if only the Nalimov format wasn't limited to 6-men. On the other hand I don't remember reading about 6-man bitbases before. I've only got 5-man Shredderbases and Scorpio bitbases (used by Toga).
I don't write my program for any specific engine, but in general it is very easy to implement.
I didn't include any restrictions on size, but I will have to fix some problems with the limits of some variable sizes, etc.
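
For illustration, probing such a bitbase from the engine side could look roughly like the sketch below; the index function and the flat one-bit-per-position layout are just placeholders, not a fixed format:

Code: Select all

#include <stdint.h>

/* The bitbase is assumed to be loaded (or mmap'ed) as a flat bit array,
 * one bit per position, 1 = won for the strong side.  Both names below
 * are illustrative placeholders. */
extern const uint8_t *bitbase;
extern uint32_t encode_position(const void *pos);   /* position -> table index */

static int probe_won(const void *pos)
{
    uint32_t idx = encode_position(pos);
    return (bitbase[idx >> 3] >> (idx & 7)) & 1;
}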
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: Speed-comparison

Post by h.g.muller »

The algorithm looks OK, and is similar to what I started using for the DTM tablebases. The advantage of building DTM (or DTC) tablebases over bitbases is that you can store information in the DTM field of the undecided positions, which can be used to accelerate the process. The best results I obtained by storing the (forward) black move count in this field. The 'checkdespair' then simply becomes a decrement operation, so the process becomes a two-ply retrograde tree walk rather than a three-ply (2 retro + 1 forward) one. This saves a lot of time probing the tablebase: especially for generally won end-games, checkdespair() tends to generate and test the same moves over and over.
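
Roughly, the counter trick amounts to something like this (an illustration of the idea with made-up names, not actual builder code):

Code: Select all

#include <stdint.h>

/* Each black-to-move entry starts out holding the number of legal Black
 * moves from that position (the 7-bit field mentioned below).  Every black
 * reverse move that reaches it from a newly won white-to-move position
 * decrements the counter; when it hits zero, Black has run out of escapes
 * and the position is lost -- no forward verification needed. */
extern uint8_t escapes[];            /* per black-to-move position           */
extern void    WTM(uint32_t index);  /* the WTM() step of the outline above  */

/* replaces Checkdespair(): called once per black reverse move leading back
 * from a newly won white-to-move position */
static void btm_hit(uint32_t btm_index)
{
    if (--escapes[btm_index] == 0)   /* Black's last escape just disappeared */
        WTM(btm_index);              /* now lost for Black: keep propagating */
}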

The price is that the EGTB is larger: I used 1 bit for the WTM positions (so this really was a bitbase), and 7 bits for the DTM field of the BTM positions. The latter could be reduced to two bits in a bitbase, or even to one if you have a separate table that indicates which positions were assigned in the most recent cycle. So I lose a factor 3-4 in size. But as long as this does not overflow the available memory, that hardly hurts. In most cycles the newly assigned positions are pretty sparse, so most probing accesses fetch a cache line from which only that single position is needed during that cycle. So the number of DRAM accesses you need is determined by how many positions get assigned per cycle, and hardly depends on how densely they are packed.

Enormous speed gains are possible by organizing your loops such that all probing of the EGTB after moving goes to data that is guaranteed to be cached (in the process of the sequential scan through the EGTB). It even pays to postpone the moves that would go outside the cached data set to a separate later pass. This is the technique I will use in my next builder. E.g., for building KQKR, where you need 4 nested loops for the scan, you can use the order (from outer to inner) wK, wQ, bR, bK (assuming that neighboring entries of the EGTB differ mostly in bK position), but the number of positions with the wK on one particular square is 256K. So the L2 cache will only contain the last few wK positions, and many (actually most) moves with the wK would go outside this set. So it is better not to do any wK moves at all, and only walk the part of the tree consisting of wQ moves followed by one or two black moves. In the next scan you then change the scan order to wQ, wK, bR, bK, and only try the trees starting with wK moves.
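
Schematically, the two passes look something like this (only an illustration of the scan orders; the helper is a placeholder):

Code: Select all

enum mover { WQ_MOVES, WK_MOVES };
extern void try_retro_moves(enum mover m, int wK, int wQ, int bR, int bK);

void scan_cycle(void)
{
    int wK, wQ, bR, bK;

    /* pass 1: wK outermost, so wK moves would leave the cached slice --
     * generate only wQ retro-moves (plus the black replies) here */
    for (wK = 0; wK < 64; wK++)
        for (wQ = 0; wQ < 64; wQ++)
            for (bR = 0; bR < 64; bR++)
                for (bK = 0; bK < 64; bK++)
                    try_retro_moves(WQ_MOVES, wK, wQ, bR, bK);

    /* pass 2: wQ outermost, now only the deferred wK moves are generated */
    for (wQ = 0; wQ < 64; wQ++)
        for (wK = 0; wK < 64; wK++)
            for (bR = 0; bR < 64; bR++)
                for (bK = 0; bK < 64; bK++)
                    try_retro_moves(WK_MOVES, wK, wQ, bR, bK);
}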

It seems like you would need two passes through the EGTB per cycle, but by 'leap-frogging' you can get around that: just do two cycles within one scan, going from DTM=n to n+1 and then to n+2 before going to the next wK position in the first scan (with wQ moves only), and in the second scan going from DTM=n+1 to n+2 and then to n+3 before going to the next wQ position (with wK moves only).
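
The leap-frog variant, at the same level of abstraction (again only a sketch; the two advance-slice helpers are assumed to push one slice of the table from one DTM generation to the next using only the indicated white piece's moves):

Code: Select all

extern void advance_wK_slice(int wK, int from_dtm, int to_dtm);  /* wQ moves only */
extern void advance_wQ_slice(int wQ, int from_dtm, int to_dtm);  /* wK moves only */

void leapfrog_two_cycles(int n)
{
    int wK, wQ;

    /* first sweep, wK outermost: push each cached slice two generations on */
    for (wK = 0; wK < 64; wK++) {
        advance_wK_slice(wK, n,     n + 1);
        advance_wK_slice(wK, n + 1, n + 2);
    }

    /* second sweep, wQ outermost, running one generation ahead */
    for (wQ = 0; wQ < 64; wQ++) {
        advance_wQ_slice(wQ, n + 1, n + 2);
        advance_wQ_slice(wQ, n + 2, n + 3);
    }
}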