Progress in fast in-RAM EGTB building

h.g.muller · Post by **h.g.muller** » Tue Mar 20, 2007 2:58 pm

I created this thread to post new developments in my project to write an EGTB builder that is fast enough to do its work 'behind the board' (e.g. in ponder time). Apart from the obvious advantage that this would not make an excessive demand on disk storage capacity, the real benifit occurs in end-games with Pawns: there only the P-slices that can be reached from the current game position have to be build. This can easily lead to savings of a factor 1000 or more, depending on the number of Pawns and their position on the board.

To be of practical use, the building time for a P-slice of an end-game with 4 non-Pawns (e.g. KRKR) must be brought to around 1 sec. With non-blitz time control upto 500 sec of building time seems acceptable for an end-game we are really in (so that subsequent play can be instantaneous). The ability to handle 500 P-slices would bring a large number of Pawn configurations within range. Even if all Pawns were passers, three additional Pawns would only require 6^3 = 216 P-slices, so that handling all KRPPKRP (7 men!) positions would be no problem at all. But in real games Pawns are usually blocked and advanced, in which case many more Pawns could be handled.

P-slices have no symmetry (they do come in pairs that are each others mirror image, though). The work required for building a P-slice is thus about 8 times larger than that for the corresponding end-game without Pawns, as they require the full complement of 16M positions. A good model problem for building a P-slice of KRPPKR (say) thus seems to be the building of KQKR (KRKR would be too drawish) without using symmetry.

The first stage of the project is thus to minimize the building time of symmetry-less KQKR.

h.g.muller · Post by **h.g.muller** » Tue Mar 20, 2007 4:09 pm

One of the areas where I achieved considerable speedup compared to my previous algorithm is the counting of the number of moves in each position. The availability of such a move count for the losing side can speed up the building process greatly: to determine if the precursor of a position that was just declared a win is lost (for the other side) we only would have to decrement that count until it reaches zero (meaning that no successors are left that are not won). This is very fast compared to doing actual move generation and following up the moves to see where they lead. It requires the investment of determining the move counts during initialization, though.

Rather than generating moves for black for every position and counting them (taking account of the way the pieces block each other's moves), I developed a routine that counts such moves 'in bulk'. I first count all the moves of all constellations of the black pieces (i.e. K+R) on an otherwise empty board (for a P-slice the board would also contain the Pawns). How exactly this is done is of little importance, as there are only 4K black positions (0.025% of the TB size).

I then copy this 4KB chunk repeatedly to the TB, once for every white constellation. This can be done fast by a bulk copying method (4 or 8 bytes at a time). I then have to correct these counts for the black moves blocked by the white pieces. In every chunk I first correct for the presence of the white Q. Other pieces could be added in a similar way. The white King is accounted for last, as it has to be handled differently: a King can never block moves from the opponent, as this would put it in check. So in stead of reducing the number of black moves, it leads to declaration of the position as illegal (with black to move).

Correcting for the presence of the wQ goes as follows: first account for the number of blocked bR moves. (Note that the count does not include captures, as the latter are conversions. For building the TB only the counting of non-conversions is of importance.) For a given position of K+Q (i.e. the chunk of 4K black constellations going with this K+Q constellation), only those black constellations with the bR on an orthogonal through the wQ are blocked by the latter. This is only 14 out of 64. The others don't have to be touched, the wQ blocks no Rook moves there.

To adjust the counts for positions with blocking, we scan through the rays on which the bR should be, one ray (of 4) at the time. The number of bR moves blocked for all positions with the bR on this ray is determined by counting squares from the wQ in the opposite direction until board edge (or anything else we bump into when building with more white Pieces or Pawns). This can be done before the scan of the bR along the ray, as it is only dependent on the position of wQ. As a first approximation, we subtract this number of blocked bR moves from all positions in the chunk that have the bR on the ray under treatment, irrespective of bK position. As the positions differing only in bK location form a contiguous block in the TB (i.e. the indexing scheme is such that the bK contributes the 6 low-order bits of the index), this can be done multiple bytes at the time, by accessing the TB (an array of char) as a longer type (int or long long int).

Now within a block of positions differing only in location of bK, only the very few with the bK on the ray through wQ and bR on the side of the bR where the wQ is have wrong counts. For bK between wQ and bR the wQ did actually not block any bR moves at all. The bK did block them, and this was already accounted for in the template of black counts. So for those we add back the unjustly applied correction. For the bK position at the other side of the wQ as the bR, the wQ does block some moves, but not as many as we already subtracted. Only the squares between bK and wQ (including the latter) were blocked by it. So for those bK positions we add back the unjustly applied correction, after reducing it by this number of squares (a different number for each bK position). After having done this for all 4 rays, all Rook moves are properly counted.

Blocking of bK moves by the wQ is much easier, as a King is not a Slider. All positions with bK next to wQ see one move blocked, no matter where all other pieces are. Only 8 out of each 64 positions need adjustment. After that, the counts are correct for all positions where the wK cannot be captured (and thus is not blocking any black moves).

The (illegal) positions where the wK can be captured can be labeled very easily in a way similar to adjusting the counts due to presence of the wQ: a bR on a ray through the wK makes the entire block (of all bK locations) for this bR position illegal, except for those positions with bK between wK and bR. This is efficiently treated by bulk labeling of the block of 64 positions, and then restoring the counts (which are stored in the same field as the label) for the few (upto 6) positions in the block with the bK in between. In addition all positions with bK next to wK are illegal, irrespective of other piece positions.

Applying this process initializes the 16M-entry EGTB array in 40 ms (on P-IV, 2.8 GHz). Generating black moves and counting them in every position took 1 sec, so this represents a speedup of a factor 25. The counting (including labeling of the positions that are illegal with BTM) takes only 7 CPU clocks average per position. (This is on a machine with RAMBUS memory; with DDR DRAM it would probably be somewhat slower.)

Next step is to include marking of the positions were black is in check in the process through similar methods.

h.g.muller · Post by **h.g.muller** » Wed Mar 21, 2007 11:53 am

I tried marking the positions where black is in check as won (for white with white to move) through stepping along Queen lines from the Queen square in the black King dimension, for every constellation of white pieces and the black Rook. (Doing this at a time when all the involved entries are in the L1 cache.) This means that for every chunk of positions with the same white constellation of K+Q, you have to do this 62 times: once for every bR location.

Marking the in-check entries this way took 38 ms for the entire TB. This is a bit much, especially if we realize that for each in-check position found, all positions with a bK next to the checked square (~6.5 on the average) have to have their move counts decremented, as the previously counted pseudo-legal move to this square now turns out to be illegal.

I am therefore now going for a different approach, that does the in-check determination and the correction to get from pseudo-legal-move counts to legal-move counts in one sweep.

For each chunk of positions with the same wK+wQ constellation (4K positions) I prepare two 64-entry scratch boards, by generating white moves in absence of the black pieces. One scratch board counts the number of times a square is reached by a white move (the number of times a bK would be in check there). The other is a board has entries in TB format, where for each covered square the 'won with WTM' bit-field is set, and the DTM/count field of all square neighboring to a covered one is used to count how many covered neighbors it has.

Before the move counting described in the previous post is done, this second scratch board is copied to the TB chunk 64 times, once for each Rook position. (The pseudo-legal countings are then later subtracted from it, as I count moves in the DTM field as negative numbers, to distinguish them from DTM values, which get positive numbers.) This copying can be done in bulk, using a wide data type to copy multiple bytes per access.

The assumption underlying this is that the bR does not affect which squares are covered by white. This is obviously wrong when the bR is blocking wQ moves. So after the bulk step, we have to correct the posititions where this is the case.

In order to do that, a scan is made along each ray from the wQ. For every square on the ray, the assumption that the wQ checked this square was invalid for every bR position on the Ray in between wQ and this square. (Except if the square was also checked by another white piece, i.e. wK, hence the scratch board with the number of moves to each square. If the check count is larger than one we can skip immediately to the next square on the ray, as blocking one of the checkers has no consequence ofor the in-check status and the neighbor move counts.) For each bR position of the bR on such an intermediate square, we have to undo the marking of the position with the bK on the current square, and add the count for the bK move to this square back to the positions with the bK on a neighbor square.

The amount of work required to undo an earlier assigned check that actually was blocked is the same as what we would have had to do marking a position as in-check, just with opposite sign. The savings come from the fact that the wQ covers on average ~23 squares, that we would have to treat for 62 bR positions (23*62 = 1426). The number of those where the bR blocked something is only small: only 23 of the 62 bR positions have that bR on a Queen ray, and even then each of those only blocks the wQ access to ~3 squares. So the correction only has to be done on an average 62 checked positions, in stead of 1426 (per wK+wQ constellation). Preparing the template only involved an average 23 checked positions, and the time for copying it 62 times is next to nothing. (Copying is between arrays that are cached, or should be cached anyway.) Thus this method should be ~16 times faster than the other.

guyhaw · Post by **guyhaw** » Wed Mar 21, 2007 7:35 pm

Joe Hurd did some excellent work, http://www.cl.cam.ac.uk/~jeh1004/resear ... chess.html , showing that (a) HOL could be used to map the FIDE articles of the game of chess relatively naturally into the formal world of the computer, and that (b) BDDs could be used to represent 'endgame data' compactly.
A slight reworking of this paper, in a reconsideration of the subject of Data Integrity, is at http://www.cl.cam.ac.uk/~jeh1004/resear ... /ceda.html
Jesper Torp Kristianson, http://www.daimi.au.dk/~doktoren/master_thesis/handin/ , also considered the use of BDDs for in-RAM endgame data.
g

h.g.muller · Post by **h.g.muller** » Wed Mar 28, 2007 4:22 pm

OK, thanks for the references.

I had to pause a few days because of the drudgery of normal life (tax declarations and such...

).

I am currently doubting the wisdom of not counting the black conversion moves. Basically the problems I am facing are due to the fact that the WTM positions in my building process are only represented by a single bit, that indicates if they are won (to white) or not. The problem is that I cannot see from that if they are won because black left himself in check, or merely because black will be checkmated in some non-zero number of moves. In both cases the moves from the BTM conversion pre-decessors to this position should not be counted as escapes (anymore). But if that takes away the last black escape in positions where black is not in check, it makes the difference between declaring that position lost for black, (conversion to a lost-but-legal position) or stalemate (only conversion to illegal, in-check positions). As the lost-through-legal vs lost-trough-illegal conversion issue cannot be resolved from the EGTB to which we convert, it would have to be checked directly from the rules.

The stalemate problem makes the order of things very critical. I am now leaning towards the following algorithm:
1) Identify (and mark as won with WTM) the black-in-check positions in the EGTB-to-build, (P-slice-to-build, actually), including broken positions where one of the black pieces coincides with one of the white pieces (assigned to the position in a smaller EGTB where the black piece would be on this square, and the white piece would not be present).
2) For every position, count the number of black non-captures with non-Pawns. Store this 'escape count' in the DTx field of the entry.
3) Add to this the number of legal black captures with non-Pawns.
4) Decrease the escape counts of the predecessors of all positions in the current P-slice marked as won (with WTM). At this stage only the black-in-check positions were so marked, so in effect we remove the illegal non-captures from the escape count.
5) If an escape count hits zero now, the position (with BTM) is either checkmate, stalemate, or has 'conversion' moves to a successor P-slice through a black Pawn move. I will call such a position a 'quasi-mate'. If black is in check in this position, it is a quasi-checkmate. In this case we leave the DTx field at zero for now (meaning checkmate).
6) However, if black is not in check, we somehow must make sure the position will end up as non-lost with BTM if it is a stalemate. But the escape count still does not count all legal moves: it ignores the Pawn moves. We will have to judge the existence and legality of such Pawn moves from first principles. This might be expensive, but the idea is that quasi-stalemates are very rare.
7) If there are no legal black Pawn moves, the quasi-mate is a true stalemate. To indicate this, and distinguish it from quasi-checkmate, we add one to its escape count. (Representing the black 'move' of claiming a draw.)
8 ) If, on the other hand, there are legal black Pawn move(s), the position is not a stalemate, and we leave the escape count at zero as for the quasi-checkmates. (i.e. consider it lost for now, until we feel like figuring out where these legal Pawn moves lead us.)
9) Now we go through the successor P-slices to which the black Pawn moves lead, which were built before. In general there is a one-to-one correspondence between positions in these P-slices (when all non-Pawns are in the same place). For every position that was not marked in these successor P-slices as won with WTM, we add one to the escape count of the corresponding position in the P-slice-to-build. Like in the stalemate case, this permanently will keep the escape count at non-zero, meaning that such a position can never become lost for black (with BTM). Note that positions that were labeled as quasi-mate (DTx=0) the DTx field was still valid as escape count, and adding the permanent escape to the successor P-slice erases them as quasi-mates and puts them back in the 'unresolved' category (which is characterized by non-zero escape count).
10) We then go through all successor EGTBs (or their relevant P-slices) that can be reached by black captures (so with one fewer white piece). For each position that as marked as won with WTM, we check in the P-slice-to-build if this position was legal with WTM (as our black-in-check marking created backup copies of these successor EGTBs in the broken positions).
11) Only if the black captures into this successor EGTB were legal, they were counted as escapes. Since they are leading to won positions, we have to take that escape away, and decrement the count of all (capture) predecessors. This approach is preferred over the opposite one (initially not counting captures and then incrementing the count for predecessors of non-won (WTM) predecessors, as, after capture of a white piece, there are expected to be fewer won than non-won (WTM) positions. In addition, it makes the legal escapes to lost positions visible to checkmate/stalemate determination.) Note that the same applies to adding escapes to quasi-checkmates as in (9).
12) Positions for which the escape count reaches zero in (11) now really have no moves left that can save black. They are positions where the only legal moves for black are conversions to positions won for white. We simply leave the DTx field of these positions at zero, grouping them with the quasi-mates that survived steps (9) and (11). (And thus were true checkmates.)
13) Upto now, the only (WTM) positions marked won were the black-in-check positions (so we could use the marks for legality testing). Now we run through all EGTB P-slices (built earlier) to which white can convert (with one fewer black piece), to mark as won (with WTM) in the P-slice-to-build all capture pre-decessors of every position in them labeled as lost (with BTM). (Each position having in general many predecessors.)
14) We do the same for P-slices to which white can convert through a Pawn move. (One-to-one correspondence between entries.)
15) At this point we are ready for the regular cycle of deriving lost-in-(n+1) from lost-in-n positions. We have already labeled all DTx=0 positions (checkmates and forced black conversions). In the first pass of the cycle we run through those, and mark their predecessors as won to white (with WTM), next to the positions marked in steps (13) and (14).
16) We then run through all positions marked in in (13)-(15), decrementing the escape count of their successors, and replacing the DTx field with the appropriate DTx=n+1 code if the count it contained hits zero.
17) step (15) and (16) are then repeated until no new positions are marked as won (in (15)) or labeled as lost (in (16)), where proper care is taken that step (16) only uses the marks newly added in the previous cycle.

This is about the only order where the proper information was (still) available at the time it was needed. So the conclusion is that (because of (3)) it would be best to count the legal capture conversions together with the intra-P-slice black escapes through bulk counting, but leave Pawn-push conversions (to other P-slices) to be added later on a case by case basis (as specified by the win/loss info in the successor P-slice). I will change my bulk-counting algorithm accordingly.

guyhaw · Post by **guyhaw** » Sun Apr 01, 2007 10:02 pm

This is sounding very complex to me, which rather suggests that it's not working out as an algorithm.
Perhaps best checking whether this works for KPK before you go too far.
g

h.g.muller · Post by **h.g.muller** » Sun Apr 01, 2007 11:11 pm

Well, perhaps I am overdoing it. Seems KQKR does not have any stalemates at all. With non-bare black King I guess the number of stalemates will be very very low in general. So it is silly to worry about them.

The nice thing about counting black's remaining non-losing moves (as negative numbers) in the DTx field is that I don't have to treat checkmates as special. If I count only non-converting moves, and add the conversions later, by retrograde moving from WTM positions that are not lost to black, an entry that was at zero automatically gets some moves back, and thus becomes non-lost again. If they don't, they stay at zero black non-losing moves. If there were conversions, they are either illegal or to positions lost for black, and you don't care which. Checkmate or forced converson by black to a lost position are both lost.

Only the stalemates require special treatment: If a position where black is not in check has zero non-converting legal moves, but they get a move added due to conversion to a non-lost position, everything is OK. But if they don't, there might still have been conversions to legal positions lost to black. If there are, the position is lost to black, but if there aren't, it is a stalemate, which is not lost to black. In the latter case it must not keep DTx=0, as the checkmates have. Adding one fictitious move to such positions takes care of this.

So after the bulk counting, and deducting all nonconverting moves to black-in-check positions (as you would normally deduct all the moves to WTM positions won for white), those that are left with zero black moves, while black is not in check, have to be scrutinized. But if there virtally are none, you don't really care how efficiently you do that. Just test them for being a stalemate by any method you like, and if they are, set the move count to one.

That simplifies matters a lot. You won't have to count any captures, be it of Pawns or other pieces. The only minor disadvantage is that while you are treating the retrograde moving from the successor EGTBs to which black can convert by capturing a white piece, you would have to decrement the count of the predecessors of al the positions that are NOT won to white. And, with a white piece just captured, perhaps there are a lot of those. But such sub-set EGTBs are small anyway, so this might also not be a real issue.

guyhaw · Post by **guyhaw** » Fri Apr 13, 2007 1:45 pm

This board seemed not to want to accept my 6.5MB pdf attachment, a scan of ken Thompson's paper on algorithms for generating 6-man EGTs.
g

h.g.muller · Post by **h.g.muller** » Fri Apr 13, 2007 4:15 pm

It seems this big a file immediately brought me over the mail-server limit (which is only 15 MB), as my inbox was already pretty full with junk. I cleaned up, and now there should be 5 MB of empty space again.

h.g.muller · Post by **h.g.muller** » Mon Apr 16, 2007 12:59 pm

Perhaps it would be good to explain my approach to speeding up EGTB building. In calculating the index for accessing the TB, I avoid doing any conversions between board positions and index, and vice versa. In stead, I generate board positions from board positions, and indices from indices, in parallel. This reduces the workload for the CPU to almost nothing, so that the building process is dominated by memory access time. Hence my optimalization efforts focus on reducing this access time.

In the building process there are two reasons for generating new indices. One is in the process of sequentially 'scanning' through the TB, where you want to generate the 'next' position to see if you have to do something with it. The next position can differ from the current one by the positioning of one or many pieces, which don't have to be displaced according to rules for chess moves. E.g. after having considered the King on h1, you might want to consider it next on a2, and after having considered it on h8, you might want to consider it on a1 while displacing at the same time a Rook from d4 to e4.

Nevertheless you need the board to decide what the next position will be, as you don't want to displace pieces to positions that were already occupied by other pieces. (In other words, you want to skip the 'broken positions' in the TB.) To prevent having to 'decode' the current index into a board position, determining the 'next' position on that board, and encoding it back into an index, I thus update index and board differentially, in lockstep. This can be done quite trivially through nested loops of the following kind:

Code: Select all

for(pos[i]=0, ix[i]=ix[i-1]; pos[i]<128; pos[i]=pos[i]+9&~8, ix[i]+=stride[i])
    if(board[pos[i]] == EMPTY)
    {   board[pos[i]] = i;     // put the piece on its new location
        ...                    // do whatever you want to do with TB[ix[i]]
        board[pos[i]] = EMPTY; // remove the piece again
    }

Note that the board uses a 0x88 scheme for square numbering, hence the peculiar increment clause x=x+9&~8 for the board position pos of piece number i. The EGTB array TB[] is simply a linearized version of an N-dimensional 8x8 board (the latter using square numbers 0-63). It is thus indexed by

SUM SqrNumber*stride,

hence placing piece i on the next square adds stride to the index.

Nesting as many such loops as there are pieces makes a systematic scan through the entire TB, skipping the broken positions. Only the innermost loop has an impact on performance. The amount of work invested in differentially updating the board position is limited to putting a single piece on the board, and removing it afterwards, and incrementing the square number. Advancing the index in lockstep only requires a single addition (of stride). If the thing to be done to the positions generated in the inner loop is conditional (e.g. only if the DTM of that position has some specific value), putting down and removing the piece on the board can be pulled inside the conditional part, so that the amount of board-updating work for untreated positions remains limited to updating the 0x88 square number (and, once every 64 steps, advancing one of the outer pieces).

The second way in which new positions are generated, is through (retrograde) moving. For this, the board position is very important, as the occupancy of squares determines if the moves are blocked. In addition, the board informs us if the moves go over the edge. So generation of moves is done by steppping over the board, adding a 'step vector' BoardVec[j] (corresponding to the ray j along which we step) to the current square number, to get the next move along the ray.

To make the index for the TB entry corresponding to positions thus reached, we have to add a similar step vector to the current index. This index step vector differs from the board step vector for two reasons: the square numbering used to derive the index is based on an 8x8 board in stead of 8x16, so moving one step forward is +8, rather than +16, and we have a second table IndexVec[j] to list the vectors in this representation. Secondly, such steps done by piece i have to be multiplied by stride (and they can be stored in the table pre-multiplied with this) before it is added to the index to make the index of the new position.

So the basic loop over moves for a slider p looks somewhat like this:

Code: Select all

CurPos=pos[p];
Board[CurPos] = EMPTY; // remove piece from old square
for(d=FirstDirection[p]; d<=LastDirection[p]; d++)
{   v = BoardVec[d]; iv = IndexVec[d]*stride[p];
    NewPos = CurPos;  // start ray at old piece location
    NewIndex = CurrentIndex;
    while(Board[NewPos+=v] == EMPTY) // guard band blocks access beyond board edge
    {   NewIndex += iv;              // Generate new index in lockstep
        Board[NewPos] = p;           // put down piece in new location
        pos[p] = NewPos;             // and update its position
        ...                          // Do what is needed with TB[NewIndex]
        Board[NewPos] = EMPTY;       // take back piece
    }
}
Board[CurPos] = p;   // put piece back
pos[p] = CurPos;

Also here the operation to be performed on the TB entries to which the moves lead is in general conditional. If it does not have to be treated, there is also no reason to finish making the move on the board by actually putting down the piece (and update its position). In fact it might not be needed to put down the piece at all (nor remove it from its original location), if we only want to operate on the TB entry (e.g. assigning a DTM), and don't want to use the new position as a starting point for further move generation just now. (The algorithm I employ advances by steps of two ply, though, so for the first of those I would need to actually generate the board position after it, by moving the piece.)

The board and step-vector arrays are just a few dozen bytes in size, and will reside in the L1 cache always, as they are constantly being accessed. The whole process leaves excessively little work to be done by the CPU, mostly updating a few loop counters, and adding the applicable strides to the board position. Most of the time the CPU is idle, waiting for the requested TB entries to be fetched from DRAM.

h.g.muller · Post by **h.g.muller** » Thu May 03, 2007 3:53 pm

So far I did not realize this: for the purpose of including the exact consequences of the 50-move rule, it would be necessary to maintain the DTZ in ply, rather than in moves. This would require a different treatment of BTM positions where black is checkmated, and where black is forced to make a losing conversion. Both types of positions have no legal moves for black that stay within the current P-slice, and no conversions to drawn (or won-for-black) positions in successor TBs. The difference is that the forced conversions do have legal moves to postion lost-for-black, while the checkmates of course have no legal moves at all.

Normally I would not pay attention to this difference, as lost is lost, and it is not very interesting if it is lost because of conversion to a lost simpler end-game, or because of checkmate. From the viewpoint of the 50-move rule, however, the checkmate allows you one extra ply! If black is checkmated after 100 reversible ply, it counts as a win. But if black after 100 ply has no legal moves other than capture a piece or push a pawn, losing quickly afterwards, it is a draw! In order for white to win, black must be forced to make such a capture after 99 ply!

For the moment I will ignore this problem, as I expect that in end-games with Pawns the strong side will usually be able to avoid approaching the 50-move limit quite easily, by pushing a Pawn now and then. This will allow me to keep representing the DTZ in full moves rather than ply(allowing smaller DTx field in the TB, and better compressibility), and to have no need for distinguishing forced black conversions to lost positions from checkmates. The latter means there is no need to distinguish illegal conversions from lost conversions, which simplifies matters a lot, as this distinction is not made in the successor EGTBs, and would have to be derived from first principles. Without this distinction I can continue to approach the problem from the side of the non-lost positions (which are automatically legal).

The algorithm thus remains:

1) (bulk) mark all black-in-check positions
2) (bulk) count all pseudo-legal black moves that stay within the P-slice (i.e. no captures or Pawn pushes)
3) decreas the counts of all predecessors of the marked (=black-in-check) positions
4) if the count hits 0, and black is not in check, do a stalemate test by testing all conversion moves for legality (based on first principles, not on TB probing). For a confirmed stalemate add one 'ghost' move.
5) add the black conversions to non-lost BTM positions in the successor EGTBs to the black move counts

At this point, all BTM positions with zero moves are either checkmates, or forced black conversions to lost positions. We can thus interpret the zero move-count in their DTx field as the depth of these positions. All other DTx fields will contain the move counts, stored as negative numbers.

6) unmake all white conversion moves from BTM positions that are not lost-to-black in the successor EGTBs/P-slices, and mark the WTM predecessors reached by them as 'newly won' (if they were not marked as 'won' already)
7) mark all predecessors of positions within the current P-slice with the last-assigned depth (=0) as 'newly won' (if they were not marked as 'won' already)
8 )decrease the counts of the predecessors of all newly won positions (reducing the status of the latter to just 'won').
9) if the count hits 0, assign the new depth (=1) to these BTM positions

We continue to cycle through steps 7-9 with ever increasing depth until no more positions are marked in (7), or no more depths are assigned in (9). That completes the building of the P-slice for the black-to-move-and-loses positions. The DTx fields >= 0 can then be compressed and written to disk; those that still contain nonzero move count (i.e. DTx < 0) should recieve a 'not-lost' DTx code. The only information we have to retain on them for the purpose of building further predecessor P-slices is if they were lost or not, i.e. we could compress the infromation on the BTM positions to a single bit per position.

The information on WTM positions was already a single bit per position. If we want a DTx EGTB for these positions as well, we should not just have marked them as 'newly won', but recorded the depth at which we did this marking as well. We could then also commit this information to the EGTB on disk, and only keep the bitmap for inheriting by predecessor P-slices.

Note that draws will have some non-losing black moves left when the building process finishes, and hence the ghost move added to stalemates will make sure that it is recognized as a draw (or actually as non-lost). We use a similar trick to handle BTM positions where the white King is in check, (i.e. illegal ones), by giving such positions extra black moves (e.g. corresponding to the capture of the white King), which will never be taken away, (as it was not an intra-P-slice move, and we do not decrement counts for conversions to a King-less successor EGTB), so that the count for such a position can never reach zero.

guyhaw · Post by **guyhaw** » Thu May 03, 2007 9:21 pm

Yes, quite a few stm-forced-to-convert to loss: one compares the number of DTM=0 positions with the number of DTC=0 or DTZ=0 positions.
KNNKP clearly has many, and so does KQNKRR. Something has to be lost by counting in winner's moves rather than plies, and this is it.
g

Grant · Post by **Grant** » Fri Jan 23, 2009 10:29 am

HG

It looks like your post about counting blacks moves in bulk are in fact pseudo-legal moves.
Would you or anyone else have any ideas on how count blacks legal moves in bulk?

Grant

h.g.muller · Post by **h.g.muller** » Fri Jan 23, 2009 6:03 pm

This was long ago, as in the mean time I have been distracted by many other Chess-programming projects. (Butthis one is still on the to-do list!). So let me see if I can still answer that:

The described process indeed counts pseudo-legal moves, and then subtract the illegal moves later, for all positions that were pre-decessors of black-in-check positions. The idea was that this decrementing of precessor move counts was part of the normal tablebase building process anyway, where you do it on predecessors of newly won positions. The black-in-check positions are simply the newly won positions of the zeroth cycle. The only thing special in this cycle is that once a count hits zero, you have to judge if the position is mate or stalemate, or if there are conversion escapes. In later cycles you would automatically declare a position as lost when its count hits zero. To do something that depends on the count hitting zero or not is only possible when you decrement the counds individually.

But it is true that there are very many black-in-check positions, and that their pre-decessors cover an enormous fraction of the tablebase. Since I wrote all this, I have started to suspect that it would actually be better to recognize the checkmates not from the move count hitting zero, but from the fact that they must be checks. So I wanted to bulk-initialize the move counts with pseudo-legal moves (which are never zero), and then recognize the zero-counts at the point where I individually decremented them, so I can take the appropriate action.

The point (which I did not fully realize initially) is that there is no need at all to recognize stalemates. In the beginning, the tablebase is initialized with all positions as undecided, i.e. not wins. That means the stalemates are already correctly indicated as non-wins, and there is no reason to touch them (and thus no reason to identify them). Only the mates have to be found, and they can conveniently be identified as black-in-check positions.

So I might as well treat every square covered by a white piece as if it was blocking a black King move, and bulk-decrement all counts of positions with a King next to it. This is easier if the black king does not provide the least 6 significant bits of the index, but the next grooup, as this would make a lot of positions to be decremented contiguous. (Allowing bulk-decrementing).

guyhaw · Post by **guyhaw** » Sat Jan 24, 2009 6:07 am

This is an old thread resurfacing. Since it was started, has it not been agreed on this board that building WDL EGTs first is a help in building DTC/M/Z EGTs, since it is not necessary to investigate whether a potential stm-loses position is in fact a loss if it is not.

Also, I'm not sure about this move-counting thing. Ken Thompson tried it and regretted it as 'not contributing': see papers by him in ICCA_J.

g

h.g.muller · Post by **h.g.muller** » Sat Jan 24, 2009 9:44 pm

Well, nothing of the sort has been agreed by me.. Building WDL would take me exactly as long as building DTZ, so building them first would be a total waste of time.

The reason this did not work for Thompson was that he was building EGTs on disk, and that the process was totally dominated by disk I/O time. To shuttle extra info, like the counts, to and from disk was thus a bad idea, and not competative with actually doing the verification (which is an in-RAM process) as the need occurred. A count computed at the beginning of cycle 1 for a position that would be decided at cycle 20 would have to be written 20 times to and from disk before it was used!

But this is totally different if the building occurs in RAM. If the EGT is sitting in RAM in its entirety, nothing has to be shuttled anywhere. The counts are just sittingthere, which takes no CPU time. So there is no overhead whatsoever in generating them at the beginning, rather than at the time they are needed, even if that is 100 cycles later. That makes all the difference.

Grant · Post by **Grant** » Sun Jan 25, 2009 11:52 am

My friend and I have created a 5-man pawnless EGTB generator.

Marking black-in-check and white-in-check positions is done in bulk first and takes only a fraction of a second. Move counting blacks "legal" moves is done next, but this takes such a long time and I just don't see how this can be done in bulk. Mate positions are then identified as white-not-in-check, black-in-check and has-no-moves, and building begins.

If instead, I counted black's pseudo-legal moves in bulk, how would I then go about subtracting the illegal moves in bulk?

Grant

h.g.muller · Post by **h.g.muller** » Sun Jan 25, 2009 8:54 pm

When you are not in check, illegal moves come in two kinds: the King stumbling into an attacked square, and moving a pinned piece.

So when all btm positions contain the pseudo-legal move count, woud scan the squares coverd by the white pieces (by generating white moves). In a first iteration, I would assume that no black piece was pinned. That means I should decrease the count of all positions with the King on a neighboring square, and the other black pieces anywhere. But for a given white constellation, it might be better to count first for every board square how many neighbors are covered by white. If that would be, say, 3 for a given square, you would have to loop over the positions over black non-king pieces, keeping the black King fixed on this square, and decrease all counts there by the pre-determined number 3.

Now this would be in error when the black King was already blocked by a black piece on a square attacked by white. This move would then have unjustly be deducted. So you would have to add back 1 for every square covered by a white piece for positions where a black piece is on that square,, and the King next to it. This would require looping over possible other black pieces, and 8 King positions next to the square. In KQKR it would be only over 8 King positions for each attacked square, so that is much less work than decrementing the 64 Rook positions for each square next to an attacked square.

Then you have to account for blocking of the white pieces through black non-king pieces. To block, a black piece should be on an attacked square. So again you would have to scan the squares, but this time only those attacked by white sliders. For every square, you loop over black blockers (only R in KQKR). That must be on the square. Then you have to continue the ray scan to step over all attacks that are blocked. (Note there might be other attackers, you have to keep a board with attack counts for the white constellation you are working on, and only treat squares on the ray that had attack count 1.) For every square in the 'shadow' of the black piece all positions with the black King on a neighboring squares should get the count incremented (looping over 8 king positions). This corrects for the fact that the King was allowed to shelter behind its own piece. But for positions where the black King was on the blocked ray, the blocking piece would lose all its moves, as it is now pinned. If it does not move alongthe pin ray, this is a fixed number (for the given white constellation and Rook position). But if thepiece moves along the ray, the number will depend on how farthe black King is downstream the ray. So these cases have to be treated seperately.

Positions where the black King is in check should be treated differently. These positions basically behave as if all other black pieces are pinned, wherever they are. But you have to account for interpositions.

h.g.muller · Post by **h.g.muller** » Sat Jan 31, 2009 4:05 pm

I am also reviewing the in-RAM building process now that I started working again on 7-men generation, as even when disk storage is used, as part of the process will still take place in RAM.

One new idea to reduce the number of L1 cache misses is to apply white retro-moves not to individual positions, but to an entire bundle of black moves. So in stead of code of the form

Code: Select all

/******************** poor code **********************/
for(every square for white dynamic piece)
    for every constellation of black pieces)
        if(lost_in_N[currentPosition])
            for(every retroMove of white dynamic piece)
                Make(white retroMove);
                if(!won[currentPosition])
                    won[currentPosition] = TRUE;
                    for(every black retroMove) // loop over potentially lost positions
                        Make(black retroMove);
                        for(every black move) // verify
                            Make(black move);
                            if(!won[currentPosition])
                                goto escapeFound;
                        lost_in_N_plus_1[currentPosition] = TRUE;
                        escapeFound:
                    next black retroMove
            next white retroMove
    next black constellation
next square of white piece

the looping over black constellations would be displaced to within the loop over white moves. This makes the working set for all black constallations fit in L1 for the duration of the looping over such constellations, and all black retro- and forward verification moves would fall into this set. (If there are not more than 3 black pieces.) It would also save time on white move generation, as each white move generated is now applied to many positions (differening in black constellation.

There is one catch, though: a given white slider move might not be valid for all black constellations, as in some of the constellations it might have been blocked. So in the loop over black constellations, we would have to skip those constellations that would have prevented the white retrio-move leading to it. I was already using this bundling in my old EGT generator, but only for black king positions. A black King can never block a white slider retro-move, as this would then lead to a wtm position where the slider could capture the black King. So such positions would be WON (wtm) anyway, not because white could move to the lost (btm) position from which they were reached (which it can't, because of the obstructing King), but would still win by King capture. So I did not have to worry about blocking.

Now for other black pieces this did not hold. But it is quite easy to take account of their blocking too. The only thing that is needed, it to mark squares on the playing board where a white slider passes over when the white retro-moves are made. This typically entails adding one mark when we move the slider to the next square of the reay, and removing all marks once we start with a new ray. Any lost-in-N position found for some black cconstellation can then be skipped in the loop over black constellations when one of the black pieces, on reconstructing the board for that position, turns out to land on a marked square.

Code: Select all

/******************** improved code **********************/
for(every square for white dynamic piece)
    originalPosition = currentPosition;
    for(every retroMove of white dynamic piece from that square) // so normal move to that square
        Make(white retroMove);
        MarkSquaresPassedOver(white retroMove);
        for every constellation of black pieces)
            if(BlackPieceOnMarkedSquare(currentPosition)) continue; // skip blocked white moves
            if(lost_in_N[originalPosition])
                if(!won[currentPosition])
                    won[currentPosition] = TRUE;
                    for(every black retroMove) // loop over potentially lost positions
                        Make(black retroMove);
                        for(every black move) // verify
                            Make(black move);
                            if(!won[currentPosition])
                                goto escapeFound;
                        lost_in_N_plus_1[currentPosition] = TRUE;
                        escapeFound:
                    next black retroMove
        next black constellation
    next white retroMove
next square of white piece

Disadvantage would be that for every square the white dynamic piece is on, the lost-in-N list would have to be looped through a number of times, once for each white move, and each white move would use a completely different part of the EGT for probing and making the black moves in. This would likely flush the set of some previous white retro-moves out of L1, while that set could have been still useul in principle, because some other white moves starting from another square of the white piece would land on it as well. But these moves are now considered only after many others.

This can be prevented, however! The andswer is to not treat the white moves in retrograde, but in stead as forward moves. When applying white moves to single positions, forward generation is very inefficient, as most of the time it would not reach a lost-in-N position (which are very sparsely occurring in the EGT). So you would have generated the move in vain, and it is much more efficient to first locate a lost-in-N position, and generates retro-moves from there. Then no move is generated in vain, as every move is one that hits a lost-in-N position (as it starts from one as a retro-move).

But when working on an entire bundle of black constellations, this argument reverses. The bundles to which one move is applied are so large (for 3 black pieces there are 256K black constellations) that there will always be some lost-in-N positions somewhere in the bundle. (Plus that white move generation is no longer much of a concern for overall efficiency, as it has been moved to an outer loop.) So I plan to switch to a loop structure for the grandparent process that looks like:

Code: Select all

/****************** optimal (?) code ********************/
for(every square for white dynamic piece)
    for(every move of white dynamic piece from that square)
        Make(white move);
        originalPosition = currentPosition;
        UnMake(white move);
        MarkSquaresPassedOver(white move);
        for every constellation of black pieces)
            if(BlackPieceOnMarkedSquare(currentPosition)) continue; // skip blocked white moves
            if(lost_in_N[originalPosition])
                if(!won[currentPosition])
                    won[currentPosition] = TRUE;
                    for(every black retroMove) // loop over potentially lost positions
                        Make(black retroMove);
                        for(every black move) // verify
                            Make(black move);
                            if(!won[currentPosition])
                                goto escapeFound;
                        lost_in_N_plus_1[currentPosition] = TRUE;
                        escapeFound:
                    next black retroMove
        next black constellation
    next white retroMove
next square of white piece

The main scan of the EGT for lost-in-N positions thus takes place at the level of wtm positions, and, for any constellation of white pieces there, we probe with forward moves to identify the (sparse) lost-in-N btm positions that can be reached from it with a forward move of the dynamic white piece currently under treatment. And then locate all potentially lost btm positions with retro-moves from there once a lost successor was identified.

koistinen · Post by **koistinen** » Tue Feb 03, 2009 2:18 am

I have been watching youtube videos and found Disk-Based Parallel Computation, Rubik's Cube, and Checkpointing is about computations somewhat similar to what we want to do.

CCRL Discussion Board

Progress in fast in-RAM EGTB building

Progress in fast in-RAM EGTB building

Bulk counting of black moves

Marking black-in-check positions

In-RAM EGTs

??!!

Max limit in attachment sizes?

stm converts to loss ...

Re: Progress in fast in-RAM EGTB building

Re: Progress in fast in-RAM EGTB building

Fast EGT Building

Re: Progress in fast in-RAM EGTB building

Re: Progress in fast in-RAM EGTB building

Re: Progress in fast in-RAM EGTB building

Re: Progress in fast in-RAM EGTB building

Re: Progress in fast in-RAM EGTB building