EGTB sharing - new plan

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
Ray
Posts: 22576
Joined: Sun Dec 18, 2005 6:33 pm
Sign-up code: 10159
Location: NZ

Re: EGTB sharing - new plan

Post by Ray »

jkominek wrote:
I was wondering about physical location. Ray is in New Zealand?

john
No, I'm in England
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan

Re: EGTB sharing - new plan

Post by Kirill Kryukov »

jkominek wrote:I was wondering about physical location.
Somewhere in Utah, USA.
jkominek wrote:The two files I selected downloaded at 105 KB/s. Not too shabby.
It could be faster, but still not bad. I observe a similar download speed here (using one session).
Dhanish
Posts: 47
Joined: Fri Sep 14, 2007 5:25 am
Sign-up code: 0

Re: EGTB sharing - new plan

Post by Dhanish »

Thank you, Kirill,

I am looking forward to faster downloads.

I started downloading the six-men tablebases set by set last September, and so far 9 sets are complete. I am behind a firewall and consequently have a low ID problem. The average download speed is about 20kB/s :( .

Dhanish
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

Dhanish wrote:The average download speed is about 20kB/s :( .
This is definitely something we should be able to beat by local generation... :lol:
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB sharing - new plan

Post by Codeman »

h.g.muller wrote:
jkominek wrote:A generator that is streaming IO-bound (as opposed to random access bound) would be a major advance.
Well, I think this is what my algorithm should be able to deliver:

For a 3+3 men EGTB, say, I would split each cycle into two passes, one pass treating the (retrograde) moves of the first white piece plus all black moves, the second pass the moves of the other two white pieces (plus black pieces). So the EGTB on disk would be structured as a 64 x 64 matrix of chunks of 16M positions, the first two white pieces being in the same constellation within every chunk. The row index of this matrix would depend on the position of the first white piece, the column index on that of the second.

The building process would (for each cycle) step through the matrix row by row (first white piece in constant locations within a row) or column by column (second piece in constant locations), bringing each row or column of that matrix into RAM. One row or column would occupy 1GB, and all moves treated in the pass would remain within this 1GB working set. Each row or column has to be brought into memory only once per cycle. By alternating the order of the row-wise and column-wise scans on subsequent cycles, the scans from these subsequent cycles could be combined, and every row or column would be advanced two cycles while it is in memory. So all data has to be read/written only once per cycle.

The access is not completely sequential: if the matrix is stored by column, the row-wise access would do random access in chunks of 16MB (or as much less as the compression factor allows). But these chunks are large enough that the overhead of the 64 seek operations to position the head over the next chunk is completely negligible compared to the transfer time. (1GB at 50MB/s takes 20 sec; a seek is on the order of 10 msec, so 0.6 sec total seek time.)
Good idea, and this should solve the problem for a while.
One always has to keep all black positions accessible as fast as possible (in RAM). So if you ignore the index optimizations you could solve endgames with up to 5 black pieces (64^5 bytes = 1GB RAM).

Then, are you using any index optimizations (mirroring, rotating, etc.)? If you are saying that the hard-disk speed will be the decisive factor for the generation, more complex optimizations could probably be done as well.
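
For illustration, here is a minimal C sketch of the chunk layout described in the quote above, together with the seek-versus-transfer estimate. The function and macro names, and the exact index ordering, are my own assumptions, not code from the thread.

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK_POSITIONS (64ULL * 64 * 64 * 64)   /* 16M positions per chunk */

/* Linear index into the 64x64 matrix of chunks: the first two white pieces
   select the chunk, the third white piece and the black pieces index inside it. */
static uint64_t position_index(int wp1, int wp2, int wp3, int bp1, int bp2, int bp3)
{
    uint64_t chunk  = (uint64_t)wp1 * 64 + wp2;
    uint64_t inside = (((uint64_t)wp3 * 64 + bp1) * 64 + bp2) * 64 + bp3;
    return chunk * CHUNK_POSITIONS + inside;
}

int main(void)
{
    /* back-of-the-envelope numbers from the quoted post */
    double transfer_s = 1024.0 / 50.0;   /* 1GB row/column at 50MB/s */
    double seek_s     = 64 * 0.010;      /* 64 seeks of ~10 msec     */
    printf("example index: %llu\n",
           (unsigned long long)position_index(0, 9, 17, 3, 4, 5));
    printf("transfer %.0f s, total seek time %.2f s\n", transfer_s, seek_s);
    return 0;
}
```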
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Re: EGTB sharing - new plan

Post by jkominek »

h.g.muller wrote:OK, thanks. It seems that a fully unsymmetric one (w.r.t. piece exchange) like KRBNKQ takes about 3GB after compression. That is only about a factor 3 compared to the 8GB an uncompressed 6-men would need. So your 20MB/sec is a bit optimistic at any rate.
This is not a big issue, but KRBNKQ is 12.37 GB uncompressed. (Max DTM is 121, so we don't have to double to 24.74.) A compression ratio of 3 seemed too small, which is why I checked; 4.5 is within the normal range.
h.g.muller wrote:But, more importantly, there is no reason why the same compression could not be used during generation. In fact unfinished EGTBs should compress better, because more leading bits of their DTMs are zero, and more DTMs are zero. So I really think this argument plays no role, or even works in favor of generation.
During-generation compression is the reason why FEG is CPU bound, I believe. Without knowledge of the internals it's impossible to say how much computation is being applied. Presumably the setting is "fast but bigger" during generation, and "slow but smallest" in the final stage of file writing.

Incorporating compression while still keeping your algorithm IO-bound would be an unprecedented accomplishment. Even more so on my aging 1 GHz PIII, as you suggest is possible.

Huffman coding is probably the best choice. Arithmetic coding is the theoretical best, but produces a non-integer number of bits per symbol. In our application we don't need that headache. Plus, Huffman is much faster.

john
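
As a side note, a small C sketch of the entropy argument behind picking Huffman coding for the DTM bytes; the function is illustrative only and not taken from any of the generators discussed here.

```c
#include <math.h>
#include <stdint.h>
#include <stddef.h>

/* Estimate the entropy (bits per symbol) of an uncompressed DTM array.
   Huffman coding gets within one bit of this, and a heavily skewed
   distribution (mostly zeros) is exactly where it does well and stays fast. */
double dtm_entropy_bits(const uint8_t *dtm, size_t n)
{
    uint64_t count[256] = {0};
    for (size_t i = 0; i < n; i++)
        count[dtm[i]]++;

    double h = 0.0;
    for (int s = 0; s < 256; s++) {
        if (count[s] == 0)
            continue;
        double p = (double)count[s] / (double)n;
        h -= p * log2(p);
    }
    return h;
}
```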
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Re: EGTB sharing - new plan

Post by jkominek »

Ray Banks wrote:Yes, I think it is perfectly fine. I'm not sure what, if anything, I will actually download. I have here the ChessBase "DVD Endgame Turbo 2" which has some 6-men tablebases on it. Presumably they are the most important ones - although this product is quite old now.
Oh good, I've been wondering if someone has ChessMaster with EGTBs. Could you provide a list of your tables? At some later point I'd like to ask you to cross-check a small set of positions.
I have nowhere near enough space for a full set, but a better partial set may be possible over time.
Not that I presume to dictate your expenditures, but it's time to drop a few quid on a couple of new hard drives, Ray, it seems to me!

john
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

Codeman wrote:Good idea, and this should solve the problem for a while.
One always has to keep all black positions accessible as fast as possible (in RAM). So if you ignore the index optimizations you could solve endgames with up to 5 black pieces (64^5 bytes = 1GB RAM).

Then, are you using any index optimizations (mirroring, rotating, etc.)? If you are saying that the hard-disk speed will be the decisive factor for the generation, more complex optimizations could probably be done as well.
Well, this is a bit too optimistic: I need to have at least one white piece to move as well, as my algorithm advances in two-ply steps. So with 1GB RAM of working space I could only handle end-games with up to 4 black pieces. But Pawns do not count! Pawnless 7-men endings of the type 2+5 thus might be a problem. 7-men 3+4 Pawnless would require two passes through the EGTB per cycle (one per white piece, but the last pass can be combined with the first of the next cycle).

I am planning to use symmetry at the level of the white pieces. E.g. in a Pawnless 2+4 end-game, say KQKBNN, the 64x64 matrix of 16MB chunks would have the dimension representing the white King incomplete, and only include those that would have the wK in the canonical triangle. So it would really be a 10x64 matrix. In the step requiring all white King positions (with fixed Queen) to be in memory, you would still need to read 64 chunks. But you would 'transpose' them on the fly for the black pieces before storing them into RAM, and you would fetch them from the disk chunk corresponding to that with the Queen in the mirrored position.

Note that this is only an issue for Pawnless end-games. A P-slice will have no symmetry to start with.
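
For illustration, a small C helper showing one way the wK could be mapped onto the canonical a1-d1-d4 triangle to obtain the row index of such a 10x64 matrix; the function name and square numbering (0 = a1, 63 = h8) are assumptions, not h.g.muller's code.

```c
/* Returns 0..9 for a white King inside the canonical a1-d1-d4 triangle,
   or -1 if it lies outside (meaning the position must first be mirrored). */
static int wk_triangle_index(int wk_sq)
{
    int f = wk_sq & 7, r = wk_sq >> 3;
    if (f > 3 || r > 3 || r > f)   /* files a-d, ranks 1-4, on or below the a1-h8 diagonal */
        return -1;
    /* a1,b1,c1,d1,b2,c2,d2,c3,d3,d4 -> 0..9 */
    static const int map[4][4] = {
        {0, -1, -1, -1},
        {1,  4, -1, -1},
        {2,  5,  7, -1},
        {3,  6,  8,  9},
    };
    return map[f][r];
}
```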
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Re: EGTB sharing - new plan

Post by jkominek »

Codeman wrote:One always has to keep all black positions accessible as fast as possible (in RAM). So if you ignore the index optimizations you could solve endgames with up to 5 black pieces (64^5 bytes = 1GB RAM).
You don't have to keep all of btm in memory when working on wtm. In a (pawnless) EGTB it is common practice to step through the 462 king configurations in sequence. For a given wtm king configuration you only need to hold in memory the array slices containing all possible locations that the black king can escape to. If the black king is not on the board's edge there are 9 positions. Hence 9 slices, for a 462/9 = 51.3x saving in memory requirements. If it is on the edge the situation is even more favorable.

For this to work your indexing formula must be kings-first (which is normal, i.e. the way it is in Nalimov). The trade-off is that you have 462x the IO going on. Which is best depends on what empirical measurements reveal - this may be a possibility for you and H.G. to investigate.

john
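
In case the 462 figure looks magic, here is a tiny brute-force C check (mine, purely illustrative): white King restricted to the canonical a1-d4 triangle, black King restricted to the a1-h8 lower half whenever the white King stands on the diagonal, adjacent and coincident kings rejected.

```c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int count = 0;
    for (int wk = 0; wk < 64; wk++) {
        int wf = wk & 7, wr = wk >> 3;
        if (wf > 3 || wr > 3 || wr > wf)                 /* wK outside canonical triangle */
            continue;
        for (int bk = 0; bk < 64; bk++) {
            int bf = bk & 7, br = bk >> 3;
            if (wf == wr && br > bf)                     /* wK on diagonal: keep bK on or below it */
                continue;
            if (abs(bf - wf) <= 1 && abs(br - wr) <= 1)  /* kings equal or adjacent */
                continue;
            count++;
        }
    }
    printf("%d\n", count);                               /* prints 462 */
    return 0;
}
```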
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

In my algorithm a cycle advances the state by a full move, calculating the btm lost-in-(N+1) positions from the btm lost-in-N positions. The wtm intermediate positions are only needed as a bitbase, to prevent duplication of work. The building process thus generates 3-ply trees: one white retrograde move from a btm lost-in-N position to reach the wtm position, and from there (if it was not already marked as won in the bitbase) all retrograde black moves to the btm positions (which are now potentially lost-in-(N+1), if they were not already lost-in-N-or-less). From each potentially lost btm position we then have to make a forward black ply, and can reset it to undecided if a forward move is found to a wtm position that is not marked as won in the bitbase.

The last two moves of the tree thus move black pieces, and by having all black constellations for a fixed constellation of the white pieces in memory (storing them as a contiguous 'chunk'), these two moves would remain in the memory-loaded data set. By splitting the tree at the root into sub-trees that start with moves of different white pieces, and treating these sub-trees in different passes, the 3-ply sub-trees all stay within a data set that contains all chunks that keep all white pieces fixed, except the one that we are moving. So once the data set of all possibilities for all black pieces plus (at least) one white piece is loaded, all 3-ply trees from all these positions can be walked without further disk access.

So it is not that all wtm or all btm positions have to be present at all times. You need all wtm and btm positions that keep all white pieces except one fixed in the same way. Then you can find all btm lost-in-(N+1) positions that are lost through moves of the non-fixed white piece in that entire set.

The minimum requirement for this algorithm is that the memory can hold all constellations of black pieces plus one white piece. You would then in principle need as many passes as there are white pieces. But if your memory can hold more than the set with just a single white piece (e.g. if you are doing a 4+3 pawnless, you can hold 2+3 in 1GB), you could combine the passes for the white pieces that are not fixed, and treat all sub-trees starting with moves of any of those pieces. And it is always possible to combine the last pass of one cycle with the first one of the next, continuing to construct btm lost-in-(N+2) positions in the same data set in which you just completed construction of the btm lost-in-(N+1) positions. So in practice 4+3 with 1GB of memory can be done with one pass per cycle, alternately advancing two full moves through moving the first two white pieces, and then through moving the last two white pieces, in leap-frog fashion.
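
My reading of this pass/cycle structure, as a very rough C skeleton. Every helper here (load_slice, walk_trees_for_piece, store_slice) is a hypothetical stub, not h.g.muller's code, and the single 0-63 constellation loop stands in for whatever set of fixed-white-piece constellations a real build would iterate over.

```c
/* hypothetical stubs so the sketch compiles */
static void load_slice(int piece, int constellation)  { (void)piece; (void)constellation; }
static void store_slice(int piece, int constellation) { (void)piece; (void)constellation; }
static void walk_trees_for_piece(int piece, int n)    { (void)piece; (void)n; }

/* one cycle = btm lost-in-N -> btm lost-in-(N+1); one pass per non-fixed white piece */
void build(int n_white_pieces, int max_dtx)
{
    for (int n = 0; n < max_dtx; n++) {
        for (int p = 0; p < n_white_pieces; p++) {
            for (int c = 0; c < 64; c++) {      /* constellations of the fixed white pieces */
                load_slice(p, c);               /* all black pieces + white piece p dynamic */
                walk_trees_for_piece(p, n);     /* 3-ply trees: white retro, black retro, verify */
                store_slice(p, c);
            }
        }
        /* in practice the last pass of cycle N is merged with the first
           pass of cycle N+1 (the leap-frog described above)            */
    }
}
```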
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

Another thought on storing the EGTB in RAM while building:

We could split the DTx field into a 4-bit frequently accessed part ('quickDTx'), plus an 'overflow' byte stored elsewhere. The 4-bit field would encode 'undecided' positions as 0, positions that have DTx specified in the overflow byte as 15 (taking care of DTx 1-256), and positions that have DTx-256 or DTx-512 in the overflow byte as 14 and 13, respectively.

Codes 1-12 could then be used to indicate the 11 most recently assigned DTx, plus the one we are currently assigning, in a rotating fashion. In the first 12 cycles we would use them to encode DTx = 1-12, never touching the overflow bytes (they all stay at 0). During the 12th pass, we would do a 'collapse' of the quickDTx: the values 1-11 would all be set to 15, the original number being copied to the overflow byte, while 12 would be changed into 1. We can then do 11 more cycles, assigning quickDTx = 2-12 for true DTx = 13-23. After that, there follows another collapse to codes 15 (with overflow now set to quickDTx+11) and 1, etcetera.

The overflow bytes would only have to be accessed once every 11 cycles (and would not have to exist at all when building bitbases), and would remain safely tucked away on the hard disk during the other cycles. So you would kill two birds with one stone: most of the data would not have to be accessed most of the time, while the part that has to be accessed frequently can be compressed much better. The latter because it will be dominated by one or two codes (0 and 15) in any stage of the building process, as in the beginning most positions are undecided (code 0), while at the end most have been assigned a permanent DTx (code 15).

Of course the overflow bytes themselves will be stored on HD in compressed form as well. They don't make a significant demand on RAM in their uncompressed form, as the only thing we really do with them is merge the old set with the newly generated DTx of the last 11 cycles as specified by the quickDTx RAM array (before that array is compressed and saved itself). This can happen in a small buffer, in streaming fashion.
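
A sketch of how that 4-bit quickDTx array plus overflow byte might look in C; the nibble packing, the names and the collapse interface are my own assumptions about the scheme described above.

```c
#include <stdint.h>
#include <stddef.h>

/* quickDTx codes: 0 = undecided, 15 = DTx is in the overflow byte,
   1..12 = the twelve most recently assigned cycles (rotating). */

static uint8_t get_nibble(const uint8_t *a, size_t i)
{
    return (i & 1) ? (uint8_t)(a[i >> 1] >> 4) : (uint8_t)(a[i >> 1] & 0x0F);
}

static void set_nibble(uint8_t *a, size_t i, uint8_t v)
{
    if (i & 1) a[i >> 1] = (uint8_t)((a[i >> 1] & 0x0F) | (v << 4));
    else       a[i >> 1] = (uint8_t)((a[i >> 1] & 0xF0) | (v & 0x0F));
}

/* Collapse at the end of a block of cycles: codes 1..11 are folded into the
   overflow byte (base_dtx is the true DTx that code 1 stood for), code 12
   becomes the new code 1; undecided (0) and already-collapsed (15) stay. */
static void collapse(uint8_t *quick, uint8_t *overflow, size_t npos, int base_dtx)
{
    for (size_t i = 0; i < npos; i++) {
        uint8_t q = get_nibble(quick, i);
        if (q >= 1 && q <= 11) {
            overflow[i] = (uint8_t)(base_dtx + q - 1);
            set_nibble(quick, i, 15);
        } else if (q == 12) {
            set_nibble(quick, i, 1);
        }
    }
}
```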
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB sharing - new plan

Post by Codeman »

h.g.muller wrote:Another thought on storing the EGTB in RAM while building:

We could split the DTx field into a 4-bit frequently accessed part ('quickDTx'), plus an 'overflow' byte stored elsewhere. The 4-bit field would encode 'undecided' positions as 0, positions that have DTx specified in the overflow byte as 15 (taking care of DTx 1-256), and positions that have DTx-256 or DTx-512 in the overflow byte as 14 and 13, respectively.

Codes 1-12 could then be used to indicate the 11 most recently assigned DTx, plus the one we are currently assigning, in a rotating fashion. In the first 12 cycles we would use them to encode DTx = 1-12, never touching the overflow bytes (they all stay at 0). During the 12th pass, we would do a 'collapse' of the quickDTx: the values 1-11 would all be set to 15, the original number being copied to the overflow byte, while 12 would be changed into 1. We can then do 11 more cycles, assigning quickDTx = 2-12 for true DTx = 13-23. After that, there follows another collapse to codes 15 (with overflow now set to quickDTx+11) and 1, etcetera.

The overflow bytes would only have to be accessed once every 11 cycles (and would not have to exist at all when building bitbases), and would remain safely tucked away on the hard disk during the other cycles. So you would kill two birds with one stone: most of the data would not have to be accessed most of the time, while the part that has to be accessed frequently can be compressed much better. The latter because it will be dominated by one or two codes (0 and 15) in any stage of the building process, as in the beginning most positions are undecided (code 0), while at the end most have been assigned a permanent DTx (code 15).

Of course the overflow bytes themselves will be stored on HD in compressed form as well. They don't make a significant demand on RAM in their uncompressed form, as the only thing we really do with them is merge the old set with the newly generated DTx of the last 11 cycles as specified by the quickDTx RAM array (before that array is compressed and saved itself). This can happen in a small buffer, in streaming fashion.
Doing this, you would reduce the required RAM space by 50% (using 4 bits instead of 8 bits), so you still wouldn't get around streaming the data from the HD as in your described algorithm. But additionally, from cycle 12 on, every cycle you would have to go through all the overflow bytes and update changes. So this would require reading/writing a file probably as big as the final EGTB.
So in total a HD traffic increase by a factor of 1.5 (cycle > 12).

That's the case for endgames with a maximum DTC < 256. For larger ones the advantage gained would be larger.

You also mentioned the compression. You are right on that; I think the files would compress pretty well.
We would have to check first whether this option would in the end work out better than not using the overflow bytes.
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

Codeman wrote:Doing this, you would reduce the required RAM space by 50% (using 4 bits instead of 8 bits), so you still wouldn't get around streaming the data from the HD as in your described algorithm.
I am not sure why you say that, or why you link it to the algorithm. I think this way of representing the data is pretty much independent of what algorithm is used. It blows up the data by a factor of 1.5, because you now use 4+8 bits. (But in general 8 bits would be too small for the DTx anyway, so you would have to do something, in particular if you would like to count DTx in ply.) But the trick is that you divide the field into a small frequently accessed part and a larger rarely accessed part.
But additionally, from cycle 12 on, every cycle you would have to go through all the overflow bytes and update changes. So this would require reading/writing a file probably as big as the final EGTB.
So in total a HD traffic increase by a factor of 1.5 (cycle > 12).
No, you did not fully get it. You would go through all the overflow bytes after cycle 12 (before writing away every chunk of data for that cycle), but you would not touch the overflow bytes on cycles 13-22. Cycle 13 would start with all the quickDTx 0, 1 or 15, where 1 means DTx=12 and 15 means DTx<12, and you could continue to add cycles until you run out of quickDTx codes (i.e. until cycle 23, which sets quickDTx=12 for true DTx=23). In the early stages (i.e. until your overflow byte for quickDTx=15 reaches 255) you could even use the quickDTx codes 13 and 14 as well, for cycles 24 and 25. Only at the end of cycle 25 would you need to access the overflow bytes again, to add all the won positions from cycles 12-24.

So if you did not compress any data you would (until cycle 255) access one byte per position once every 13 cycles (i.e. 0.08 bytes/pos), plus half a byte for every position every cycle. That is only 0.58 bytes per position per cycle: a reduction of 42%, rather than an increase of 50%.
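
Just to make the arithmetic explicit (my own trivial check, nothing more):

```c
#include <stdio.h>

int main(void)
{
    /* half a byte of quickDTx per position every cycle,
       plus one overflow byte roughly every 13 cycles */
    double per_cycle = 0.5 + 1.0 / 13.0;
    printf("%.2f bytes/pos/cycle, %.0f%% less than a plain 1-byte table\n",
           per_cycle, (1.0 - per_cycle) * 100.0);   /* 0.58 and 42% */
    return 0;
}
```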

And the overflow bytes would compress quite well, because in the quickDTx codes you can already see which of them are zero (namely all those with quickDTx != 15). So you can simply skip those (i.e. give them a zero-bit Huffman code).
That's the case for endgames with a maximum DTC < 256. For larger ones the advantage gained would be larger.

You also mentioned the compression. You are right on that; I think the files would compress pretty well.
We would have to check first whether this option would in the end work out better than not using the overflow bytes.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB sharing - new plan

Post by Codeman »

h.g.muller wrote:...
The last two moves of the tree thus move black pieces, and by having all black constellations for a fixed constellation of the white pieces in memory (storing them as a contiguous 'chunk'), these two moves would remain in the memory-loaded data set. By splitting the tree at the root into sub-trees that start with moves of different white pieces, and treating these sub-trees in different passes, the 3-ply sub-trees all stay within a data set that contains all chunks that keep all white pieces fixed, except the one that we are moving. So once the data set of all possibilities for all black pieces plus (at least) one white piece is loaded, all 3-ply trees from all these positions can be walked without further disk access.
...
I understand how you do the first pass, but how are you going to do the passes for the other white pieces? How are you going to do the moves with these, if you only have tables where they are fixed to certain squares?

Another idea:
Would it be possible to keep all white pieces fixed and only the black pieces dynamic? But whenever loading a sub-endgame with certain fixed white pieces, also load the sub-endgames the white pieces can come from. (In total this would in most cases take less space than your option of loading all black pieces + 1 white piece.)
The advantage would be that we wouldn't have to do several passes, and when moving a certain white piece after having finished a subset, most old subsets can still be kept.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB sharing - new plan

Post by Codeman »

h.g.muller wrote:No, you did not fully get it. You would go through all the overflow bytes after cycle 12 (before writing away every chunk of data for that cycle), but you would not touch the overflow bytes on cycles 13-22. Cycle 13 would start with all the quickDTx 0, 1 or 15, where 1 means DTx=12 and 15 means DTx<12, and you could continue to add cycles until you run out of quickDTx codes (i.e. until cycle 23, which sets quickDTx=12 for true DTx=23). In the early stages (i.e. until your overflow byte for quickDTx=15 reaches 255) you could even use the quickDTx codes 13 and 14 as well, for cycles 24 and 25. Only at the end of cycle 25 would you need to access the overflow bytes again, to add all the won positions from cycles 12-24.

So if you did not compress any data you would (until cycle 255) access one byte per position once every 13 cycles (i.e. 0.08 bytes/pos), plus half a byte for every position every cycle. That is only 0.58 bytes per position per cycle: a reduction of 42%, rather than an increase of 50%.

And the overflow bytes would compress quite well, because in the quickDTx codes you can already see which of them are zero (namely all those with quickDTx != 15). So you can simply skip those (i.e. give them a zero-bit Huffman code).
Now I get you; only accessing the overflow bytes every 12th cycle changes everything and gives a nice reduction.

Furthermore, you said "overflow bytes ... (and would not have to exist at all when building bitbases)"; do you see a way to apply this idea to bitbases?
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

Codeman wrote:I understand how you do the first pass, but how are you going to do the passes for the other white pieces? How are you going to do the moves with these, if you only have tables where they are fixed to certain squares?
You save the other white pieces for another pass through the TB. Say you are doing KQKR. First you go through it in 256K slices that each have the white King fixed, and all other pieces dynamic. Within every such slice you use Queen moves to find predecessors of DTx=N btm positions. These are then won wtm positions, but they might already have been marked as such from a previous cycle. If so, we ignore them and go to the next white Q move. If they were not marked as won wtm, we mark them as such, and then do retrograde black K and R moves. For every btm position we reach that is still undecided we then do forward black moves to see if black can reach a wtm state that is not won (for white). I will call this verification of a potentially-lost position. If it can, the position remains undecided. If not, it gets DTx=N+1.

There is no harm in searching the potentially lost positions and verifying them before all wtm states of this cycle have been found: in the worst case a position that is lost-in-(N+1) will remain undecided because black seems to have an 'escape' to a wtm position that seems not won for white, but only because we did not get to it yet in this cycle. But then we will get to it later (because eventually we will get to all predecessors of DTx=N btm positions). And when we do, we will visit the potentially-lost-in-(N+1) btm position again, and verify it again, and now the escape has been plugged.

So there is no reason to do the marking of won wtm positions in any particular order. We only have to make sure that, as soon as we mark one, we visit all its btm predecessors for verification. No btm position can turn into a lost one without one of its wtm successors changing from undecided to won. And if several of its successors would make that transition in one cycle, we will get it when the last one does (if the last one this cycle happened to be also the last one in total).

As we can generate the won wtm positions in any order, we might as well generate all those reached by retrograde white King moves first, and those by the Queen moves later. And while we are doing the pass with the Queen moves, as soon as we are done with a slice with fixed wK position, and thus have completed the N->(N+1) cycle for that slice, we might as well continue immediately (while the slice is in RAM) with the (N+1)->(N+2) cycle (Queen moves only) before we commit it to disk.

I am not sure I completely grasp the idea of the subset EGTBs. Currently I generate these simply together with the main one (as they are so much smaller this hardly involves any extra work), hiding them in the broken positions. But this might not be the smartest thing to do when generating on disk.

For bitbases you only need to distinguish the latest from all earlier cycles. So for btm you would have undecided, lost in the current cycle, lost in the last cycle, and lost earlier, which you could indicate in two bits. For the wtm positions there is no reason in my algorithm to distinguish won from newly won in the table, as you will operate on them immediately when you change them. So they require only a single bit. The 2-bit field basically has the same role as quickDTx: you add the lost positions of the current cycle, and when the cycle is completed it becomes the last cycle, and the former last cycle becomes a previous cycle, so you would have to relabel all those positions accordingly.
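
A possible shape of that 2-bit btm field in C, with the end-of-cycle relabel step; the packing and names are illustrative guesses, not actual generator code.

```c
#include <stdint.h>
#include <stddef.h>

enum { UNDECIDED = 0, LOST_THIS_CYCLE = 1, LOST_LAST_CYCLE = 2, LOST_EARLIER = 3 };

static int get2(const uint8_t *a, size_t i)
{
    return (a[i >> 2] >> ((i & 3) * 2)) & 3;
}

static void set2(uint8_t *a, size_t i, int v)
{
    int sh = (i & 3) * 2;
    a[i >> 2] = (uint8_t)((a[i >> 2] & ~(3 << sh)) | (v << sh));
}

/* When a cycle completes, this cycle's losses become "last cycle"
   and the old "last cycle" becomes "earlier". */
static void end_of_cycle_relabel(uint8_t *btm, size_t npos)
{
    for (size_t i = 0; i < npos; i++) {
        int s = get2(btm, i);
        if (s == LOST_THIS_CYCLE)      set2(btm, i, LOST_LAST_CYCLE);
        else if (s == LOST_LAST_CYCLE) set2(btm, i, LOST_EARLIER);
    }
}
```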
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB sharing - new plan

Post by Codeman »

h.g.muller wrote:I am not sure I completely grasp the idea of the subset EGTBs. Currently I generate these simply together with the main one (as they are so much smaller this hardly involves any extra work), hiding them in the broken positions. But this might not be the smartest thing to do when generating on disk.
Sorry for the confusion. By subsets I meant slices of the matrix; in my case the matrix of all white pieces.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB sharing - new plan

Post by Codeman »

Codeman wrote:I understand how you do the first pass, but how are you going to do the passes for the other white pieces? How are you going to do the moves with these, if you only have tables where they are fixed to certain squares?
h.g.muller wrote:As we can generate the won wtm positions in any order, we might as well generate all those reached by retrograde white King moves first, and those by the Queen moves later. And while we are doing the pass with the Queen moves, as soon as we are done with a slice with fixed wK position, and thus have completed the N->(N+1) cycle for that slice, we might as well continue immediately (while the slice is in RAM) with the (N+1)->(N+2) cycle (Queen moves only) before we commit it to disk.
That's exactly the part I don't quite get: how do you calculate the retrograde white King moves if they all come from different parts of the matrix?
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

OK, I think I see what you mean now.

The problem is that if you expand the set to also include all positions a formerly fixed piece can come from, these pieces can also reach other positions from these extra positions. So you would start loading things in duplicate.

The only exceptions I can think of are Rooks and Bishops: e.g. for a Rook you could make a slice that also contains all moves along ranks or files. So you would not fix the Rook, but only one of the coordinates of the Rook. (In other words, a Rook is a two-dimensional piece, which can be factorized into two independent 1-dimensional pieces.)

But there is little to gain, unless such factorization of pieces is essential to make the working set fit into fast storage. E.g. when doing a (3+2)-men P-slice, you cannot fit 2+2 (16M positions) in L2, and you would need to load L2 3 times per cycle with a 1+3 slice (once for each white piece). But if white has a Rook, you could see it as two half-pieces, and do twice 1.5+2 (requiring 8x64x64x64 = 2MB).

With Kings you could make a partial factorization if you can fit two ranks/files of King positions in fast storage: you would start loading everything with the King in the a- and b-files, then only treat all moves that step within that set. Then you are done with the a-file, and write it to slow storage, to make space for loading the c-file. Then you can treat all white King moves between b and c, and within c, after which the b-file is done. Advancing two cycles in one pass would be more difficult. (You would need 3 board-files worth of data, as a King in the c-file can still cause lost positions two moves earlier with a King in the a-file. So you would need to store 10-board-squares worth of data.)
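
A tiny illustration (mine) of what 'factorizing' the Rook into two 1-dimensional half-pieces could look like in an index formula: a pass that fixes rook_file keeps all of the Rook's along-the-file moves inside the loaded slice, and a pass that fixes rook_rank keeps the along-the-rank moves inside it. The layout and names are assumptions only.

```c
#include <stdint.h>

/* rest_index / rest_size stand for however the remaining pieces are indexed */
static uint64_t index_with_split_rook(int rook_sq, uint64_t rest_index, uint64_t rest_size)
{
    int rook_file = rook_sq & 7;   /* half-piece 1: fixed in the pass treating moves along the file */
    int rook_rank = rook_sq >> 3;  /* half-piece 2: fixed in the pass treating moves along the rank */
    return (((uint64_t)rook_file * 8) + rook_rank) * rest_size + rest_index;
}
```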
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

Codeman wrote:That's exactly the part I don't quite get: how do you calculate the retrograde white King moves if they all come from different parts of the matrix?
You slice the matrix by row instead of by column. Then they are from the same part of the matrix again.

So when transferring a slice between disk and RAM, it is not a completely sequential access, but a gather/scatter operation. But a disk is a random-access medium, provided the chunks (which are all accessed sequentially) are big enough. For a (3+3)-men build you would have (1+3)-men slices as chunks (with the 3rd white piece dynamic, so 16M positions), and you would load selected sets of 64 such chunks to gather the (2+3)-men slices that make either the first or second white piece dynamic. And then you would treat all the moves of that white piece, and defer the moves of the other white piece to the pass where you gathered the chunks in a different way. (The third white piece, which is already dynamic within a chunk, can be moved in either pass.)
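
A sketch of that gather step using POSIX pread (my illustration; the row-major file layout and the names are assumptions): assembling one (2+3)-men slice from 64 chunks that are strided through the table file.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK_BYTES ((off_t)64 * 64 * 64 * 64)   /* 16M positions, 1 byte each (uncompressed) */

/* Gather the column with the second white piece on square wp2: the chunks
   (wp1 = 0..63, wp2) lie 64*CHUNK_BYTES apart in a row-major file, so this
   is the 64-seek case discussed earlier in the thread. */
static int gather_column(int fd, int wp2, uint8_t *ram)
{
    for (int wp1 = 0; wp1 < 64; wp1++) {
        off_t where = ((off_t)wp1 * 64 + wp2) * CHUNK_BYTES;
        if (pread(fd, ram + (size_t)wp1 * CHUNK_BYTES, (size_t)CHUNK_BYTES, where) != CHUNK_BYTES)
            return -1;   /* short read or I/O error */
    }
    return 0;
}
```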
Shaun
Posts: 6889
Joined: Sat May 13, 2006 3:24 pm
Sign-up code: 10159
Location: Brighton. UK

Re: EGTB sharing - new plan

Post by Shaun »

Kirill Kryukov wrote:Here is the link to my new site: http://kirill-kryukov.com/chess/egtb/files.html
Hi Kirill,

There seems to be a good connection from your provider - I can get 2.5MB/sec from work - although this was a one-off test on a 200MB file just after 6pm UK time...

1.2TB might get me looking for a new job....

Shaun
Shaun
Posts: 6889
Joined: Sat May 13, 2006 3:24 pm
Sign-up code: 10159
Location: Brighton. UK

Re: EGTB sharing - new plan

Post by Shaun »

PS - I am not going to try this at home - you will laugh and I will cry. I am also confident there will be several suggestions to look at generation ;)
Dhanish
Posts: 47
Joined: Fri Sep 14, 2007 5:25 am
Sign-up code: 0

Re: EGTB sharing - new plan

Post by Dhanish »

How about using the Bit Torrent protocol?

I find many large files shared on the net by this method. The load is shared by all.
See: http://en.wikipedia.org/wiki/BitTorrent_%28protocol%29.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: EGTB sharing - new plan

Post by Codeman »

h.g.muller wrote:...
The only exceptions I can think of are Rooks and Bishops: e.g. for a Rook you could make a slice that also contains all moves along ranks or files. So you would not fix the Rook, but only one of the coordinates of the Rook. (In other words, a Rook is a two-dimensional piece, which can be factorized into two independent 1-dimensional pieces.)

But there is little to gain, unless such factorization of pieces is essential to make the working set fit into fast storage. E.g. when doing a (3+2)-men P-slice, you cannot fit 2+2 (16M positions) in L2, and you would need to load L2 3 times per cycle with a 1+3 slice (once for each white piece). But if white has a Rook, you could see it as two half-pieces, and do twice 1.5+2 (requiring 8x64x64x64 = 2MB).

With Kings you could make a partial factorization if you can fit two ranks/files of King positions in fast storage: you would start loading everything with the King in the a- and b-files, then only treat all moves that step within that set. Then you are done with the a-file, and write it to slow storage, to make space for loading the c-file. Then you can treat all white King moves between b and c, and within c, after which the b-file is done. Advancing two cycles in one pass would be more difficult. (You would need 3 board-files worth of data, as a King in the c-file can still cause lost positions two moves earlier with a King in the a-file. So you would need to store 10-board-squares worth of data.)
Thanks for your comments!

Am I right that if you cut the Rook into two 1-dimensional pieces you would have to do 2 passes for this one piece? And if so, would that additional time be compensated by the cache space saved?

Now to the implementation for the Kings: tell me please, would there be a way of doing calculations at the same time as the data is being loaded from the HD? This would be perfect, as we then wouldn't have to wait for all 64 squares of the white King to finish loading, but could start calculating in the meantime as soon as the first 4 squares are ready.

Secondly, thanks for your patience; I think at last I understand the principle of your algorithm.

Then, what ways are there to treat sub-endgames, especially those where black can capture a white piece or promote a black pawn? I know of your current solution of having all sub-endgames in the same file, but have you got any alternative ideas? The ones where white captures or promotes are easier to handle anyway.

And finally, would it be all right with you if I tried to implement your way of using the HD/RAM combination for generating TBs in my bitbase generator? Till now everything worked well when generating files in RAM, but now I have to find a solution to expand the capabilities to men >= 6.
h.g.muller
Posts: 223
Joined: Mon Feb 19, 2007 8:24 am
Sign-up code: 0
Location: Amsterdam

Re: EGTB sharing - new plan

Post by h.g.muller »

To start with your last question: of course you can use anything that is said here as you see fit. This is the reason I post it in public.

If you treat the Rook as two pieces, you would have to do separate passes to treat it. This seems bad. But you can treat multiple white pieces in one pass, only limited by the amount of fast storage you have available. This trick only makes sense if you had storage available for a non-integer number of white pieces, say 1.5. Then, if you had 3 white pieces in total, you would need 3 passes if you do them one at a time. But if you could split a Rook you could do half the Rook together with one piece, and the other half with the other piece. So instead of needing a separate pass for the Rook, by splitting it you need none, as there was room to accommodate the half-Rooks in the existing passes.

If you have enough memory available, it should indeed be possible to hide the calculation time under the disk access time. You would then need twice the amount of RAM: one buffer where your CPU is doing the cycle (N -> N+1), and one from which you are first writing out the N+1 data of the previously treated constellation of static white pieces and into which, after this is done, you read the N data for the constellation of white pieces to be treated next. By the time you are done with that, the two buffers swap roles.
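
A rough C/pthreads sketch of that double-buffering; the thread layout and every helper (read_slice, write_slice, process_slice) are my own placeholders, not anything from the thread.

```c
#include <pthread.h>
#include <stdint.h>
#include <stddef.h>

/* hypothetical stubs standing in for the real disk and cycle code */
static void read_slice(int slice, uint8_t *buf)        { (void)slice; (void)buf; }
static void write_slice(int slice, const uint8_t *buf) { (void)slice; (void)buf; }
static void process_slice(uint8_t *buf)                { (void)buf; }

struct io_job { uint8_t *buf; int out_slice; int in_slice; };

static void *io_thread(void *arg)
{
    struct io_job *job = arg;
    if (job->out_slice >= 0)
        write_slice(job->out_slice, job->buf);   /* flush the previously processed slice */
    read_slice(job->in_slice, job->buf);         /* prefetch the next slice to work on   */
    return NULL;
}

/* While the CPU runs the cycle on 'work', the io thread drains and refills 'io';
   then the two buffers swap roles. */
void build_with_double_buffer(uint8_t *buf_a, uint8_t *buf_b, int n_slices)
{
    uint8_t *work = buf_a, *io = buf_b;
    read_slice(0, work);                          /* prime the first working buffer */
    for (int s = 0; s < n_slices; s++) {
        pthread_t t;
        struct io_job job = { io, s - 1, (s + 1) % n_slices };
        pthread_create(&t, NULL, io_thread, &job);
        process_slice(work);                      /* CPU work overlaps the disk work */
        pthread_join(t, NULL);
        uint8_t *tmp = work; work = io; io = tmp;
    }
    write_slice(n_slices - 1, io);                /* flush the last processed slice */
}
```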

For treating a white King, instead of loading the full slice (all 64 King positions) you only have to hold a partial slice: after you have loaded a1-a8, b1, b2, and treated all sub-trees from those positions starting with white King moves between them, you will no longer need the positions with Ka1: all sub-trees that started there and all trees that ended there have been walked. This is only true, however, if you would do a single cycle with this pass (which is what you have to do if it is neither the first nor the last pass you do for that cycle). If you want to advance two cycles in a single pass (to combine the last pass of one cycle with the first pass of the next), you need to keep the positions with Ka1 in memory until after you have loaded and processed a1-a8, b1-b8, c1, c2, c3. After Kc3 is in memory you can finish the N->N+1 cycle for a1-a8, b1, b2 (but not for b3, as it is reachable from c4, which you don't have yet). Then, the trees you build from a1, a2, b1, b2 for the N+1->N+2 cycle would cover all of a1, and after you have walked them, a1 is ready (as far as King moves are concerned) and can be written to disk.

So the scheme would be: if a1=0, a2=1, ..., b1=8, ..., when you read in the (sub-)slice with the white King on square S, generate all trees with King moves from DTx=N positions on this square to squares S, S-1, S-7, S-8, S-9, and the reverse moves (from S-1 to S from DTx=N positions with wK on S-1, etc.). After that, generate all trees from DTx=N+1 positions with wK on S-9 to S-9, S-10, S-16, S-17, S-18, and their reverse moves. Then you are done with S-18, write it out, read in S+1, and repeat the operation from there. All this insofar as the mentioned squares exist and the moves wouldn't cross the board edge.

Sub-end-games with a captured piece are much smaller, and you would need them to 'seed' won positions with wtm, and non-lost positions with btm. If you are building DTC or DTZ, all this seeding takes place at the first cycle (and you would only need a bitbase of the sub-end-game). If you are building DTM you would have to do it every cycle. For capture conversions the successor TBs are 64x smaller, so this is not a significant overhead. If the 'conversions' are Pawn moves to a successor P-slice, the successor slice is as big as the current one. But the P-slices are comparatively small, and you can hold the successor (as well as the slice itself) in RAM entirely. If the conversion is a promotion, you have the biggest problem, as the successor end-game is now far bigger than the P-slice (64 times, if there are still other Pawns on the board). But you only need the positions with the Q on the 8th rank from this successor. It would probably pay to make separate tablebases of such promotion positions (also listing where under-promotions are needed, and which ones). This is the hardest problem of the entire enterprise. With DTZ you fortunately only need the info as a bitbase.