Mapping the 7-men computation

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Mapping the 7-men computation

Post by astokes »

Hello everyone!

This is my first post on CCRL after exchanging a couple of private emails with Kirill, who encouraged me to post here.

Long ago I came up with an unorthodox approach to data compression. My approach--which I have yet to prove viable for objects bigger than 1kB--requires extraordinarily high computational resources at compression time, but yields performance in time/space under decompression that no other method achieves. Proving the validity for objects of useful size is a fair ways off. I'm not saying much about it at this time. It's come back onto my radar screen recently because advances in GPU computing are tilting the landscape toward viability some day soon.

In the meantime, I'm building a foundation for the rest of the project. One aspect is to have a range of data sets for testing. The data set needs to scale over a wide range of sizes so I can start small before gearing up to confront the wheelhouse; the individual data elements need to be smallish, on the order of bits (or fractions of bits) per query; the retrieval performance demands need to be intense in both latency and bandwidth to best demonstrate my advantages; and, finally, the data set needs to be immutable for the long haul, because the up-front cost of compression is not something you'd want to repeat very often. It strikes me that the EGTB space offers all of these attributes, as well as a community of smart people who aren't easily deterred by a long-running loop.

My own skillset runs primarily toward R and C++ and, someday soon, OpenCL. I mostly use R as the base application, calling out to C++ for high performance fragments using the R packages Rcpp/inline/sugar, which play very nicely with STL and template libraries accessing R data structures directly. I'm tremendously grateful to Dirk Eddelbuettel and Romain Francois for their recent labours to make this possible. My favorite glue package ever!
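For anyone unfamiliar with that glue, a minimal sketch of the calling pattern follows; the function is an invented toy (it assumes the inline and Rcpp packages and a GCC/Clang toolchain for the builtin):

library(inline)    # provides cxxfunction(); Rcpp supplies the glue headers

## Toy example: a C++ fragment operating directly on an R integer vector.
## 'popcount_sketch' is an invented name, purely to show the calling pattern.
popcount_sketch <- cxxfunction(signature(x = "integer"), plugin = "Rcpp", body = '
  Rcpp::IntegerVector v(x);           // proxy onto the R data, no copy
  int bits = 0;
  for (int i = 0; i < v.size(); ++i)
    bits += __builtin_popcount(static_cast<unsigned int>(v[i]));  // GCC/Clang builtin
  return Rcpp::wrap(bits);
')

popcount_sketch(c(1L, 3L, 7L))        # 1 + 2 + 3 = 6 set bits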

I don't do nearly as much C++ as I once did, but I can navigate my way around a template library without too much difficulty, modulo decent error messages. Those too are coming to a theatre near you, courtesy of Clang/LLVM from Chris Lattner and many others. Just this week I installed a Clang/LLVM plug-in for Eclipse Indigo; on my system it wasn't quite ready for prime time, but it's an "any day now" checklist item.

What I expressed to Kirill, and which he seems to endorse, is that I would like to begin by mapping the 7-men computational space, under various assumptions about the hardware fabric available. We would both like to see the community pulling together. Kirill expressed the view that a heavily resourced PC (large memory, with several SSD storage devices) will soon be viable. I've been leaning more toward commodity cloud capacity, e.g. 64 GPU nodes leased from Amazon for a few marathon hours. We both seem to agree that the demands of the intermediate storage are one of the main impediments. It doesn't much matter that you can perform 64 trillion operations per second if you must first upload 30 terabytes of seed data. I'm just beginning to think through these complications. (Yes, I've read a few old papers on pathological data locality in the EGTB problem space, but I wasn't convinced they decided matters. One of my goals is to model data flow requirements under different working-set partition assumptions. I don't know how far this would take me into graph theory, but I suspect The Boost Graph Library is up to the task, if need demands.)

Actually, undigressing, I had a bit of trouble viewing EGTB as having pathological locality. In the seven-man problem, you only change the coordinate of one piece out of seven per unmove (modulo hopscotch) and you have complete control over your visitation path. I implemented Knuth's Chase algorithm not long ago to crack my rusty knuckles in combinatorics. This is a way to visit every node in a combinatorial space changing only one index per visit. After my warm-up exercise I found FXT: a library of algorithms and Matters Computational from Jörg Arndt, which strike me as an excellent resource for die-hard bit-bangers. The FXT library also implements the Chase algorithm, but I think he does it with bitboards, whereas I used mailboxes. Surely, at first glance, finding an optimal parallel partition is a big challenge, but not an insurmountable one. Perhaps I'll be humbled soon.
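If a concrete sketch helps, the few lines below enumerate k-subsets in the simple recursive revolving-door order, which shares the one-change property with Chase's sequence (it is not Chase's sequence itself, and the R is mine, not FXT's):

## Revolving-door enumeration of k-subsets of 1..n: consecutive subsets differ
## by exactly one element exchange, the property that lets a visitation path
## move only one piece's coordinate at a time.
revolving_door <- function(n, k) {
  if (k == 0) return(list(integer(0)))
  if (k == n) return(list(seq_len(n)))
  c(revolving_door(n - 1, k),
    lapply(rev(revolving_door(n - 1, k - 1)), function(s) c(s, n)))
}

combs <- revolving_door(5, 3)                  # choose(5, 3) = 10 subsets
stopifnot(length(combs) == choose(5, 3))
deltas <- mapply(function(a, b) length(setdiff(a, b)),
                 combs[-length(combs)], combs[-1])
stopifnot(all(deltas == 1))                    # each step exchanges exactly one element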

As far as this thread is concerned, my objective is to develop some simple stuff to map out the computation and make my work available as an R package. I've already interfaced from R to Gaviota's tbtools probing library. For forward move generation, kcchess seems to be fast enough and easy to work with. Kirill pointed me also toward Brightchess and Scorpio.

My first visual exploration of the computation began with some 6-men statistics I first obtained from a link on this forum to 6Ftypes. Later I found more complete statistics at Kirill's site, but I haven't subsequently updated my graphic. It's not intended to be the best possible chess graphic, and I was constrained by ggplot2 in an unexpected way (geom_size was applying to multiple layers in parallel, so I changed one layer to the vertical white line). It was a start and I found it interesting. If the graphic perplexes people, it's probably simpler to post my code than write a lot of text. Many variations are possible. Any suggestions?

My approach is less about driving directly to the goal by the most immediate viable path, and more about taking in the scenery before making the big commitment. Collaboration will be welcome once I'm able to provide a viable starting point.

Looking forward to many future discussions.

Allan
Attachments
6F-bluebeans.pdf
Shannon entropy statistics, 6-men positions.
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

By Gaviota engine I meant tbtools for EGTB probing. UPDATE: Found the "edit" button. I'm not used to that. Previous post amended. I spend a fair amount of time on sites where the rules of engagement are fire and forget.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Re: Mapping the 7-men computation

Post by guyhaw »

Plenty of sub-7-man statistics at http://ticc.uvt.nl/icga/games/chess/endgames.php Your chart is quite amusing but I'd be tempted to go to a log-scale on both axes.

There are clever ways to index permutations to meet certain requirements - but these do not involve a 'move' relationship between the first and second permutations.

There have been various approaches to indexing chess endgame spaces. Nalimov used chess-specific considerations to make the space smaller than anyone else's. Johan de Koning used something clever in FEG (interlaced parts of the piece-indexes) which I don't understand.

A more recent idea is to use one index to generate the EGT and another to store it. Even the latter should be done with a consideration of what the EGT will be used for. I suspect this has been done in the case of the Konoval/Bourzutschky 7-man EGTs but MB is much better qualified than me to say quite what was done.

If multiple nodes are to be used, within a machine or across 'the net', considerable cleverness has to be exercised to make sure that communication is not the bottleneck. It is far better to parallelise by generating DTZ EGTs for positions with fixed-Pawns than by trying to parallelise 'on the net' when generating an EGT for a fixed force.

I'm sure it is also better to generate WDL EGTs and use this information before creating any kind of DTx EGT. I'm sure I've said this more than once and no-one has corrected me yet.

g
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Guy,

That was a quick response. I've seen many of your previous posts here in passing. It's good to hear from you.

I presume that generating WDL tables is quicker than generating full DTx tables because of the smaller memory footprint. I'm not sure it buys you much in time at the memory manipulation layer, since most architectures these days shuffle full cache lines regardless of how little you update.

Does the working boundary set have a proper name? I'm thinking of the set of positions not yet resolved which reach the resolved set in a single move. If I'm right in my thinking, this constitutes the set one must thoroughly traverse with each retrograde iteration to achieve correctness, at least on the "you can't make me" side of the accordion graph. My understanding is that the resolved set consists of won states, and that draws are the dregs left over when the algorithm terminates.
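In case a toy example makes my question clearer, here is the picture I have in mind; the graph and names are invented, and it deliberately ignores the win/loss alternation, showing only which positions a retrograde pass must touch:

## succ() stands in for the forward move generator; p4 is the only terminal.
succ <- list(p1 = c("p2", "p3"), p2 = "p4", p3 = c("p4", "p5"),
             p4 = character(0), p5 = "p4")
positions <- names(succ)
resolved  <- c(p4 = 0)                          # basis step
repeat {
  ## boundary set: unresolved positions with at least one resolved successor
  boundary <- Filter(function(p) !(p %in% names(resolved)) &&
                       any(succ[[p]] %in% names(resolved)), positions)
  if (length(boundary) == 0) break              # leftovers would be the draws
  depth    <- max(resolved) + 1
  resolved <- c(resolved, setNames(rep(depth, length(boundary)), boundary))
}
resolved                                        # p4=0, p2=p3=p5=1, p1=2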

Also, I think I read that in Thompson's tables, only W for white was marked; but I guess the more recent practice is to mark W for both white and black at the basis step, if their piece compositions aren't interchangeable (or if they are, and you don't care about precise counting).

Once you have WDL tables, in the iterative step of full DTx generation, you would be able to filter out known draws. This would spare you some move generation (not too important with GPU resources available) as well as a fair amount of useless forward probing (which could be very important in a parallel implementation).

In the parallel implementation, each node needs a copy of the WDL seed table for the portion it is responsible for enumerating (not necessarily a subset, if you view computation as frivolously cheap compared to data interchange), and there is nothing extra that needs to be exchanged on the fly. Nice.

If I'm on the right track, I can't see any grounds to argue with you. Even if you adopt an eccentric enumeration order, the WDL table could be presorted for mostly linear access so that it doesn't compete with big-ticket memory thrashing.

Perhaps one small carp: some piece combinations are light on draws.

Concerning my plot, it is effectively a log scale through the use of the Shannon metric for DTx. I was looking at the information content from an arithmetic coding perspective. On my plot, the x and y axes are both bits-per-position information measures for whichever DTx those files provide (not documented) over the distance metric.

KRB-KNN in the bottom right corner has DTx values up to 237 IIRC. The average distance ("perplexity" in the speech recognition sense: 2 raised to the Shannon entropy, -sum p log2 p) is roughly 128, leading to an x axis value of log2(128) = 7 bits.

I plotted "winnable" using the logit transform, which better illustrates extremes, since for me this was intended as a map of logical and pathological prospective test cases for my compression work. Logit is one of many transforms built into ggplot2. It's fundamentally logarithmic, but double-ended. Win ratio is marked "(decisive)" since it ignores draws.


Only the white line is non-logarithmic, but I found a cube root conveyed the most discernment.

Allan

ADDENDUM_1
Correct me if I'm wrong, but I have a terrible fear that the boundary set for many piece combinations quickly grows to encompass nearly the entirety of the unsolved positions. That is effectively what I came back to ask before seeing your response.

ADDENDUM_2
These things are tricky. You have to do the forward probing in the WDL generation too, but with a smaller memory footprint you can partition the problem less aggressively, and perhaps that's a win on interchange latency. If your rate-limiting step is memory read/write cycles regardless of footprint, it's not buying you much else, or is it?
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Guy,

I downloaded the spreadsheet set you referenced. The definitions on the first sheet will be useful to me starting out.

There's a lot of data there, but it doesn't look like an easy data set to visualize. On my own map, it could serve as an annotation that certain kinds of specialized test cases are available against a piece combination (it seems you use the word "force"). Are there any geometric relationships to call out in an overview map or is the space already well served by textual or other directed search?

Allan
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

I'd like to comment on using WDL for computing the DTx tables. There can be two such uses:

1. When computing a DTx table, WDL data for the same table can save time, as it allows draws to be marked so they can be skipped (a tiny sketch of this follows below). Also, when looking for won positions, those that are known to be lost can be skipped (and vice versa). I guess this is what Guy was mentioning, although I can't recall seeing any description. I probably missed it, Guy, could you please share a link if you talked about it somewhere in detail?

2. When building a DTx table (other than DTM, which leaves DTX, DTC and DTZ), we can use WDL sub-endgame tables for some sub-endgames. Since WDL is much more compact than DTx, this results in much smaller traffic between the nodes. Also, each node will be able to store a larger number of WDL tables (compared to DTx tables), so requests for a missing table will be less frequent. Another advantage of this method is that some endgames (like pawnless ones) can be skipped completely: when complete 7-man WDL is available, we can start building KPPPKPP in DTZ (or DTC) without any other DTZ tables. Finally, one more beneficial side-effect of this is localization of errors. With DTM each table has to depend on DTM sub-endgame tables, so any error in a sub-endgame table will propagate. With WDL sub-endgame tables, however, any errors in DTx will remain there and not affect the other tables. This should allow the verification requirements to be relaxed somewhat (e.g., verification could be postponed, because any error can be corrected locally without having to recompute a large number of dependent tables). This idea, as far as I understand, was not a part of Guy's proposal. Although the idea is trivial, I never saw it mentioned by others until I formulated it for the 7-piece EGTB Bounty. (I'm sure there are earlier references that I'm missing.)
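A tiny sketch of point 1, with invented toy vectors, just to show where the saving comes from:

## With a verified WDL table for the same endgame, a DTx pass only has to visit
## decisive positions; draws (and, when searching for wins, known losses) are skipped.
wdl <- c("W", "D", "L", "W", "D", "L")    # values from the compact WDL table
dtx <- rep(NA_integer_, length(wdl))      # DTx values still to be computed
to_visit <- which(wdl != "D")             # draws never enter the DTx iteration at all
length(to_visit) / length(wdl)            # fraction of the table still in play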
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Kirill, your points on independence and accuracy are good ones, though I would have guessed DTC were similarly independent. Perhaps I don't fully grasp the metrics yet. The chess glossary defines DTC as "any conversion of material" which may be "either a promotion or capture". Doesn't this give you a constant-value boundary (zero) before leaping into a subordinate table?

I guess DTC is an odd duck, because it doesn't have an advantageous size, nor does it express jugular efficiency (though it will steer you there eventually). DTC would be the algorithmic view of things, DTM the chess purist perspective, and DTZ the "blessed by committee" perspective. The WDL tables we are contemplating are actually WDL-unbounded and are not the same as WDL derived from DTZ-50, so these don't please everyone either, despite being seemingly semantics-free.

Still, if memory update cycles are the rate-limiting algorithmic step informing your partition strategy, it could shake out that you have enough memory to do DTC as ideally partitioned rather than sufficing with WDL; if your node has enough RAM, it won't be short on disk either, unless it's terribly eccentric. Interchange data flow could end up the same if you end up exchanging bit-delta sets (every node knows the current pass number). DTC tables have the advantage of elucidating some line of respectable play, and WDL can be spun off as a subcase. That said, I think the odds are against having so much extra memory that you would skip WDL as the first pass, even in a commodity cloud, unless partitioning were practically free, which no-one expects in the general case.

Returning to independence, wouldn't DTZ have even more constant boundary states, also including any change in pawn structure? DTZ for piece combinations (forces) heavy in pawns might actually have a high partition potential. Whatever the boundary, you need WDL when you get there, or some superset, so you could be back to pure WDL tables at the boundaries even on a special case where you commence with DTC.

Of the metrics I'm familiar with, it seems that only DTM has strong carry-over and mad (idiomatic for "rampant") error percolation.

I guess WDL-unbounded would grant some additional computational freedom in treating the boundary set as dynamic rather than iterative, since you aren't labeling anything by computational pass. I don't think this helps much with partitioning, since it's unlikely you can find semi-closed regions to partition over (partitions where WDL can iterate independently and productively until the nodes agree that an interchange would pay off). I speculate that progress would come to a halt so quickly it wouldn't be worth bothering to try a second independent iteration prior to interchange. For WDL you can be fairly sloppy about the completeness of intermediate iterations so long as the final termination state is accurately detected, but exploiting that observation could devolve into a fool's errand.

In my mental algorithm, my view was that the subtables are required only during the basis step. Are there other approaches where this assumption is untrue?

Sorry if I'm thinking out loud, but this is a lot to get my mind around in the initial uptake.

I think I've detected the unstated assumption: the KPPP-KPP style piece combinations can't be generated with full chess accuracy without first generating myriads of promotion tables, most of which won't permit this kind of dataflow simplification, and if you've done the hard part already, why waste time coding up a quick conclusion for just one piece set? (My dirty secret, which I mentioned to Kirill in personal mail, is that I'm willing to tread somewhat lightly on full chess accuracy, so I never assume I'll bother with all those promotion tables before computing a piece set of interest.)

From my ramblings I've arrived at the view--in agreement with Guy and Kirill as I've understood them--that if you're committed to full chess accuracy, WDL tables are almost certainly the right place to commence.

For a slightly tainted KPPP-KPP analysis, there might perhaps be other viable starting points. This line of attack was mentioned in our private exchange. Kirill suggested that maybe some taint-testing could be performed on reduced size chess boards (comparing full accuracy to tainted accuracy). You wouldn't get formal bounds for the full-sized chess board, but it could prove suggestive of whether the taint is bearable. I have a notion of how to dampen (but not eliminate) the taint for KPPP-KPP, but that will have to wait for another time.

Allan

Forum nit: "Log me in automatically" is not having much effect on my Firefox of many paranoid extensions. NoScript is the usual culprit, but k-k.com shows as completely green listed. It's disorienting, since after I log in, if I use the back button to return to the original resource (as is my habit), my login does not take effect (I tried a page refresh as well with no success).
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

astokes wrote:Kirill, your points on independence and accuracy are good ones, though I would have guessed DTC were similarly independent. Perhaps I don't fully grasp the metrics yet. The chess glossary defines DTC as "any conversion of material" which may be "either a promotion or capture". Doesn't this give you a constant-value boundary (zero) before leaping into a subordinate table?
Yes, DTC and DTX (distance to capture) can also utilize WDL sub-tables. In fact this is exactly what I do in 4x4 chess - use WDL subtables whenever possible.
astokes wrote:I guess DTC is an odd duck, because it doesn't have an advantageous size, nor does it express jugular efficiency (though it will steer you there eventually). DTC would be the algorithmic view of things, DTM the chess purist perspective, and DTZ the "blessed by committee" perspective. The WDL tables we are contemplating are actually WDL-unbounded and are not the same as WDL derived from DTZ-50, so these don't please everyone either, despite being seemingly semantics-free.
Yeah, WDL and WDL50 are unfortunately different and have to be computed independently. It seems there should be two sub-projects - one which takes the 50-move rule into account, and one which does not. In 4x4 chess I'm not using any "draw after N moves" rule currently.
astokes wrote:I think I've detected the unstated assumption: the KPPP-KPP style piece combinations can't be generated with full chess accuracy without first generating myriads of promotion tables, most of which won't permit this kind of dataflow simplification, and if you've done the hard part already, why waste time coding up a quick conclusion for just one piece set? (My dirty secret, which I mentioned to Kirill in personal mail, is that I'm willing to tread somewhat lightly on full chess accuracy, so I never assume I'll bother with all those promotion tables before computing a piece set of interest.)
It's possible to have chess accuracy and still be able to skip pawnless (or any other uninteresting) tables. Here is one such idea described in detail (applied to checkers):

Yngvi Björnsson, Jonathan Schaeffer and Nathan Sturtevant (2005) "Imperfect Information Endgame Databases", Advances in Computer Games 11, Lecture Notes in Computer Science #4250, pp. 11-22. (PDF available here)
astokes wrote:Forum nit: "Log me in automatically" is not having much effect on my Firefox of many paranoid extensions. NoScript is the usual culprit, but k-k.com shows as completely green listed. It's disorienting, since after I log in, if I use the back button to return to the original resource (as is my habit), my login does not take effect (I tried a page refresh as well with no success).
You can let Firefox remember the password for you, then logging in becomes just one click with no typing. I never use "Log me in automatically", so not sure what could be the problem.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Inheriting from WDL EGTs - correction

Post by guyhaw »

Kirill Kryukov wrote: With WDL sub-endgame tables, however, any errors in DTx will remain there and not affect the other tables. This should allow the verification requirements to be relaxed somewhat (e.g., verification could be postponed, because any error can be corrected locally without having to recompute a large number of dependent tables). This idea, as far as I understand, was not a part of Guy's proposal. Although the idea is trivial, I never saw it mentioned by others until I formulated it for the 7-piece EGTB Bounty. (I'm sure there are earlier references that I'm missing.)
Ouch! This was certainly not one of my ideas, because it is wrong, and there's a painful precedent for this. JdK's coding for FEG incorrectly did not pick up wins in KNNKP which are 'captures in 1 to a KNNK mate'. I think I have that right. It is as if he had written off the whole of the KNNK endgame as a draw, which it is not. This 'WDL error' - a 'value error', not of course a 'depth error' - polluted the FEG KNNKP EGT and all points 'North'. I remember MB seeming less than happy about it at the time, as it caused him to calculate several faulty 6-man FEG EGTs.

Otherwise, KK makes some good points about further uses of WDL EGTs.

g
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Inheriting from WDL EGTs - correction

Post by Kirill Kryukov »

What you call "wrong" is not what I said. My use case for WDL is: You start with correct WDL tables. Then you compute a DTZ (for instance) table. This DTZ table won't be used when building any other DTZ tables, so any errors in this DTZ tables won't pollute any other tables. This was my point. It works out very similar with DTC too. Of course a WDL error would pollute the other tables, both WDL and DTx. That's why WDL tables have to be verified before they are used to build DTx tables.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

References to WDL EGTs etc

Post by guyhaw »

I've used 'WDL' and 'WDL EGT' consistently in this forum so if you can text-search the whole thing on those tags, you'll find what I've previously written about the utility of WDL. I think the SHREDDER team have written about the utility of their SHREDDER bitbases elsewhere.

While you can compute KPPPKPP by WDL'ing everything else first, I'd be nervous about that. Any errors in the WDL phase could well be totally invisible, and certainly less visible than they would be if you had DTx EGTs for the prior endgames.


I haven't got enough time to work out entirely what 'astokes' is getting at: there are a lot of ideas in play and they're often a bit tangled up. However, I would discourage 'labelling' and being judgemental about the various metrics DTC, DTM, DTZ and DTZ50. All assume that White and Black are minimaxing to the same DTx metric, and all have their uses. Ken Thompson computed to DTC, occasionally to DTM if interested, and sometimes in the early days to DTZ (for KQPKQ for example).

DTC EGTs are more compact after compression than DTM EGTs, and DTZ EGTs more compact than DTC EGTs. This is simply because the range of depths involved is less. The figures are in a '6-man Strategy' paper of some time ago.

Courtesy of UoR and ICGA, my past 'ICGA' writings are available at http://centaur.reading.ac.uk/view/creat ... fault.html .


Where there are Pawns on the board, the EGT may be computed fixed-pawn slice by fixed-pawn slice, as long as the slices are computed in the right order ... K(a7)K before K(a6)K etc. This also allows for parallelism: slices that do not feed into one another, K(a7)K and K(b7)K for example, can be done in parallel. There's some fancy combinatorial work around enumerating the positions of sets of like men (in this case Pawns) which I cannot hold in my head but have written up in the Nalimov/Haworth/Heinz papers about Nalimov's EGTs.

However, for someone starting out on creating EGTs (and that includes me), MB's advice to 'keep it simple' has to be good. Certainly, creating correct, verified EGTs is much harder than it looks.

g
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Endgame boundaries

Post by guyhaw »

While we are talking about boundaries ...

Most people assume that the winner does the conversion into the next endgame. This is not always true: the loser can convert a Pawn, or be forced to capture. N Wirth, back in the 1990s, inherited a piece of code that assumed 'winner does all the conversions' and as a result his otherwise more accurate 'depths in plies' were unpredictably less accurate at times.

J Schaeffer wrote a piece about managing bugs in the process of creating Checkers (WDL) EGTs ... ICGA conference I think. One of them was 'just verifying the internal consistency of an endgame EGT, as opposed to checking its consistency with successor EGTs as well'.

g
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Use of WDLs: footnote to my 'correction'

Post by guyhaw »

Ok, KK, I see what you are saying ... now.

But it's a subtle point and I misread what you were saying, and maybe others will too.

So maybe my clarification is not wasted. As you say, if your WDLs are verified as correct (as they should be), you can use them as a launchpad to create whatever DTC or DTZ (but not DTM) EGT you wish.

g
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Endgame boundaries

Post by Kirill Kryukov »

guyhaw wrote:While we are talking about boundaries ...

Most people assume that the winner does the conversion into the next endgame. This is not always true: the loser can convert a Pawn, or be forced to capture. N Wirth, back in the 1990s, inherited a piece of code that assumed 'winner does all the conversions' and as a result his otherwise more accurate 'depths in plies' were unpredictably less accurate at times.
Yes, a conversion can be forced. The problem when transitioning from DTM to DTC is that in DTM a win is always an odd number of plies, while a loss is always an even number. In DTC this is no longer the case, so by just looking at the number of plies it's impossible to say whether the position is won or lost. You'd either have to supplement the DTC (in plies) with a win/loss bit, or give up storing the actual number of plies so that you can utilize the odd/even bit again. I do the latter, so in my tables the DTC is not necessarily accurate, but can be off by 1 ply.
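A toy illustration of that trade-off (the numbers are invented):

## Storing DTC in winner's moves keeps the value range small, but plies 1 and 2
## collapse onto the same stored value, so the stored depth can be off by one ply.
plies <- 1:8
moves <- ceiling(plies / 2)               # 1 1 2 2 3 3 4 4
rbind(plies, moves)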
guyhaw wrote:J Schaeffer wrote a piece about managing bugs in the process of creating Checkers (WDL) EGTs ... ICGA conference I think. One of them was 'just verifying the internal consistency of an endgame EGT, as opposed to checking its consistency with successor EGTs as well'.
Why would you want to skip checking the consistency with successor endgame tables? Is it so prohibitively time consuming? If you skip the boundary verification, then such tables should not be called "verified" but "partially verified". I always do complete verification, otherwise why bother at all? Although I have to say I haven't seen even one hardware-induced error so far.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Testing EGT integrity

Post by guyhaw »

The encoding of depth need not be aligned with the sign-bit of a notional integer of say 8 bits.

It needs to cover 'draw', 'index unused' (aka 'broken' in Nalimov terminology), lost in 0-n1 and won in 1-n2. If (n1 + n2 + 3) > 64, Nalimov used a 16-bit integer. The choice of 8 bits or 16 was made by Nalimov before the EGT was created.

This occasionally caused some rework when 8 bits had been chosen but 16 were needed, and even when 16 had been chosen and 8 would have sufficed. Of course, in the DTC and DTZ metrics, the n1 and n2 were smaller which helps with the compression of the EGT file.


Jonathan Schaeffer's article on bugs and testing in the generation of EGTs was:
- - - - - Schaeffer, J., Björnsson, Y., Burch, N., Lake, R., Lu, P., Sutphen, S.: Building the Checkers 10-piece Endgame Databases.
- - - - - In: Advances in Computer Games 10, pp. 193-210 (2003)

I have no idea why they went through a round of internal testing without boundary-testing on EGTs. Maybe JS is making the simple point that internal testing does not pick up the endgame-boundary problems ... and there were problems to be picked up. Certainly, JdK with FEG initially ignored the possibility of capturing directly into a KNNK mate.

My article on 'Data Assurance in Opaque Computations' with Joe Hurd is at http://centaur.reading.ac.uk/4515/ .

g
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

KPPP-KPP pawn slices

I wish I had more time for this discussion. I found an hour over the weekend to compute the pawn slices of KPPP-KPP. R has a "partitions" package to compute the seven partitions of 5, so I don't have to (five pawns stacked in one file, down to all five pawns in distinct files).

Partitions of 5
7 0 0 0 0 1
6 1 0 0 1 0
6 0 1 1 0 0
5 2 0 1 0 0
5 1 2 0 0 0
4 3 1 0 0 0
3 5 0 0 0 0


Rows sum to eight (the number of files). Rows multiplied by the column index (starting with 0) sum to five (the number of pawns), i.e. the first column is the number of files with 0 pawns, then the number of files with a single pawn, etc. Then I computed the number of arrangements of files thus populated.
8 56 56 168 168 280 56

Multiplying by choose(5,3) colourings:
80 560 560 1680 1680 2800 560

This sums to 7920 distinct pawn slices. Now for verification, multiplying by number of possible positions within each file:
480 50400 168000 1209600 2268000 9072000 4354560

This last vector sums to choose(48,3)*choose(45,2)=17123040 so my code appears to work for at least this case.
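For the record, the arrangement and colouring counts fall out of a few lines of base R (I've simply typed the seven partitions in here rather than loading the partitions package):

parts <- list(c(5), c(4,1), c(3,2), c(3,1,1), c(2,2,1), c(2,1,1,1), c(1,1,1,1,1))

arrangements <- sapply(parts, function(p) {
  per_file <- c(p, rep(0, 8 - length(p)))            # pawns in each of the 8 files
  factorial(8) / prod(factorial(table(per_file)))    # multinomial over repeated counts
})
arrangements                    #   8  56  56 168 168 280  56
coloured <- arrangements * choose(5, 3)              # which 3 of the 5 pawns are white
coloured                        #  80 560 560 1680 1680 2800 560
sum(coloured)                   # 7920 distinct pawn slices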

These numbers need to be multiplied by the circa 1800 KK permutations for endgames with pawns to determine the storage requirement per slice.

Number of legal moves for white varies from 1 to 20 (three pawns with four promotion options each, plus eight king moves). A close second is 19 legal moves: the same king moves plus three pawns with single or double forward moves as well as three captures (all pawn captures, since white to capture black king is a no-no here). Typical is probably around 10 or 11.

Multiplying 17 million by 1800 yields 30 billion total positions over roughly 8000 slices. Coding dominant move instead of DTx (as I am leaning toward with my compression method), probably 2.5 bits per table entry (on a fairly simple move order heuristic), times two sides to move. In the 20GB range. I think I calculated that the larger slices would occupy about 16MB, making them quite acceptable to page in on the fly from SSD. If the GPU handles this, the in-memory cache size would be in the 2-4GB range for reachable slices (which could include other pawn-rich endgames, too, but the slices get much bigger).
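Or, as a back-of-envelope check in R:

positions <- 17123040 * 1800              # ~30.8e9 positions across ~7920 slices
bytes <- positions * 2.5 * 2 / 8          # 2.5 bits per entry, two sides to move
bytes / 2^30                              # ~17.9 GiB, i.e. "in the 20GB range"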

For what I'm setting out to achieve with the other half of my project, these are all practical numbers. Except, that is, for the 60TB of chess accuracy which precedes this analysis, should that give me pause.

HDF5

Finally, I took a very quick look at HDF5, which I evaluated a couple of years ago for a project with large storage demands. It was very fast in the tests I conducted, even from a rather minimal R binding. Fast, portable, infinitely scalable, accommodates metadata, has a rich tool ecosystem, and bindings for everything. It's aimed at petabyte data sets, and well financed for future maintenance. The downside is an extremely rich API, so the tiny part of value to this project is a needle in a giant documentation haystack--if one takes the view that too much documentation is a bad thing.

HDF-EOS Project

It even seems to support rudimentary packed bit fields. I'll do some performance tests on that soon.

Anyone else have experience with this? I prefer to work with standard file types. Functionally I don't think it offers much over roll-your-own for EGTB tables. However, I would like to bias further collaboration toward embedding comprehensive combinatorial metadata (position counts) in the giant files (for validation and comparison), and a standard format might facilitate this.
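If anyone wants to kick the tyres, the Bioconductor rhdf5 binding looks roughly like this; the file name, group layout and numbers are invented, and I haven't yet tested its packed bit fields:

library(rhdf5)                                       # Bioconductor binding for HDF5

h5createFile("kppp_kpp_demo.h5")                     # invented file name
h5createGroup("kppp_kpp_demo.h5", "slices")
h5createGroup("kppp_kpp_demo.h5", "metadata")

slice <- matrix(sample(0:3, 64, replace = TRUE), 8, 8)       # stand-in for one pawn slice
h5write(slice, "kppp_kpp_demo.h5", "slices/demo_slice")
h5write(c(17123040, 7920), "kppp_kpp_demo.h5", "metadata/counts")  # position and slice counts
h5ls("kppp_kpp_demo.h5")                             # browse the file's tree
back <- h5read("kppp_kpp_demo.h5", "slices/demo_slice")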

Back soon

I'm looking forward to responding to the other posts as time permits.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Partitions of 5 ..

Post by guyhaw »

I don't understand your 'Partitions of 5' table at all ... and I'm not sure why you are doing it.

A 'Pawn slice' is a subset of an endgame with all the Pawns on fixed squares. Historically, the word 'tablebase' which most (but not me) use instead of 'table' (i.e., 'EGTB' rather than 'EGT') was coined to refer to what I'm calling a 'Pawn slice' here.

To enumerate these slices, enumerate triples from 64 squares, and pairs from 64 squares. A P-slice pairs off with its reflection in the d/e axis, so a position should be flipped left-right if that results in a smaller P-slice index. (That requires some subtlety if you are planning to include positions with castling rights).

So in the case of KPPPKPP, each slice contains just the 1800+ positions for the Kings (in fact, a lot fewer of course, as the Kings do not have first choice of the squares).


The DTM EGTs seem to compress by a factor of ~4, and the DTZ EGTs by a factor of ~8.

g
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Testing EGT integrity

Post by Kirill Kryukov »

guyhaw wrote:The encoding of depth need not be aligned with the sign-bit of a notional integer of say 8 bits.

It needs to cover 'draw', 'index unused' (aka 'broken' in Nalimov terminology), lost in 0-n1 and won in 1-n2. If (n1 + n2 + 3) > 64, Nalimov used a 16-bit integer. The choice of 8 bits or 16 was made by Nalimov before the EGT was created.

This occasionally caused some rework when 8 bits had been chosen but 16 were needed, and even when 16 had been chosen and 8 would have sufficed.
If your n1 and n2 are in moves, then that's what we normally use with DTM. With DTX/DTC/DTZ it's no longer accurate, so you'd have to express n1 and n2 in plies, which doubles the space of used values and makes your data less compressible. So even in DTX/DTC/DTZ I use n1 and n2 in moves. This of course makes the values somewhat ambiguous, as "loss or conversion in 1 move" DTC may mean "loss or conversion in 1 ply" as well as "loss or conversion in 2 plies". I don't separate these two cases, but store them with the same value. I believe it's an acceptable and practically useful simplification, although I did not measure the resulting gain in compactness of the compressed data.
guyhaw wrote:Of course, in the DTC and DTZ metrics, the n1 and n2 were smaller which helps with the compression of the EGT file.
Of course.
guyhaw wrote:Jonathan Schaeffer's article on bugs and testing in the generation of EGTs was:
- - - - - Schaeffer, J., Björnsson, Y., Burch, N., Lake, R., Lu, P., Sutphen, S.: Building the Checkers 10-piece Endgame Databases.
- - - - - In: Advances in Computer Games 10, pp. 193-210 (2003)

I have no idea why they went through a round of internal testing without boundary-testing on EGTs. Maybe JS is making the simple point that internal testing does not pick up the endgame-boundary problems ... and there were problems to be picked up. Certainly, JdK with FEG initially ignored the possibility of capturing directly into a KNNK mate.

My article on 'Data Assurance in Opaque Computations' with Joe Hurd is at http://centaur.reading.ac.uk/4515/ .

g
Thanks for the links, although I was rather hoping that you'd mention the relevant point yourself, since you started this topic. I mean, clearly verification is necessary (complete verification, both internal and boundary, and I believe this should be obvious to everyone), and other than that, what are you trying to say?
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

astokes wrote:I found an hour over the weekend to compute the pawn slices of KPPP-KPP.
Looks good to me. In my solver I allow pawns on the first rank (white on first, black on last), so the combinatorics becomes slightly more complicated.

One question is what to do with left-right symmetry. You could omit one from each pair of symmetrical pawn slices, and allow kings everywhere. Or you could include all pawn slices and omit positions where the white king is on the right side of the board (for instance). The latter is what I am doing.
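In R terms, the second convention amounts to something like this (the square numbering and helper names are invented for illustration):

## Keep every pawn slice, but mirror any position whose white king stands on
## files e-h.  Squares are 0..63 with a1 = 0, b1 = 1, ..., h8 = 63.
mirror_file <- function(sq) 8 * (sq %/% 8) + (7 - sq %% 8)
normalize <- function(squares, wk) {      # 'squares' holds every piece, wk included
  if (wk %% 8 >= 4) sapply(squares, mirror_file) else squares
}
normalize(c(4, 12, 60), wk = 4)           # wKe1, Pe2, bKe8 -> mirrored onto the d-file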
astokes wrote:Coding dominant move instead of DTx (as I am leaning toward with my compression method), probably 2.5 bits per table entry (on a fairly simple move order heuristic), times two sides to move.
I am very curious to see the actual compression ratios you'll get with storing dominant moves.
astokes wrote:HDF5

Finally, I took a very quick look at HDF5, which I evaluated a couple of years ago for a project with large storage demands. It was very fast in the tests I conducted, even from a rather minimal R binding. Fast, portable, infinitely scalable, accommodates metadata, has a rich tool ecosystem, and bindings for everything. It's aimed at petabyte data sets, and well financed for future maintenance. The downside is an extremely rich API, so the tiny part of value to this project is a needle in a giant documentation haystack--if one takes the view that too much documentation is a bad thing.

HDF-EOS Project

It even seems to support rudimentary packed bit fields. I'll do some performance tests on that soon.

Anyone else have experience with this? I prefer to work with standard file types. Functionally I don't think it offers much over roll-your-own for EGTB tables. However, I would like to bias further collaboration toward embedding comprehensive combinatorial metadata (position counts) in the giant files (for validation and comparison), and a standard format might facilitate this.
Looks very interesting. Currently I work with hand-crafted file formats, which have no metadata. I should check this in more detail.

What I am more interested in now is some sort of distributed data storage - so that you could spread the multi-terabyte dataset over a number of nodes, and each node could see each file as if it was local. (With necessary downloading, checksumming, backing up, caching, etc, happening transparently and automatically). I'll probably end up coding this by myself.
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Partitions of 5 ..

Post by Kirill Kryukov »

guyhaw wrote:The DTM EGTs seem to compress by a factor of ~4, and the DTZ EGTs by a factor of ~8.
This is interesting. Nalimov's 6-piece DTM is closer to a factor of 3, as far as I understand. Can you be more specific: whose numbers are these, on what set of endgames, etc? I'm especially curious about the DTZ claim.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Checking on the endgame boundary ... and compressibility

Post by guyhaw »

The Schaeffer paper I cited is available at http://webdocs.cs.ualberta.ca/~jonathan ... ases10.pdf

JS (p8, item 5 on 'Verification') says that a quick scan of an EGT can check for 'internal' self-consistency but that this does not catch all errors. All positions need to be checked for consistency both within the EGT and with respect to previously computed data [for subsequent endgames]. The latter part dramatically increases verification cost but must be done.

So we're obviously all in agreement about that.

A more general point is that the boundary of any set of values for a parameter in a computation is a 'dangerous place'. It's not naturally in the centre of the mindset of the algorithm designer and can be a place for 'one out' errors.

Also agreed: for DTC and DTZ EGTs, 'n1' and 'n2' can be stored in winner's moves (as now), in loser's moves (not done by anyone yet), or in total plies (the only accurate option, and relevant for a k-move rule).


My recollections about compressibility are based on:
- - - - - considering the size 'S' of the Nalimov index, including all 'broken' positions
- - - - - checking whether, in the uncompressed EGT, there is one byte per position or two (for larger values of 'n1+n2')
- - - - - comparing 'S' or '2S' with the actual size in bytes of the EGT file

My EGT-size figures can be found via http://centaur.reading.ac.uk/4524/.

For sub-6-man chess, DTZ EGTs were 50.24% the size of the DTM EGTs (being 71.29% on P-less endgames, and 43.36% on P-ful endgames). DTZ EGTs were 56.37% the size of the DTM EGTs for the 6-man P-less endgames, so the trend as expected is that DTZ EGTs are increasingly compact relative to DTM EGTs as the number of men increases.

g





astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

I'm stealing ten minutes to skim the Schaeffer paper kindly posted by Guy. From page 8:
To reduce the cost, the program only iterates over all positions until a "small" number of changes occurs in an iteration. The positions that change value are saved in a queue.
This is in line with a musing in one of my early posts.
A more general point is that the boundary of any set of values for a parameter in a computation is a 'dangerous place'. It's not naturally in the centre of the mindset of the algorithm designer and can be a place for 'one out' errors.
Meh. It's so true, but it shouldn't be. Dijkstra pointed out a long time ago that if you code the dangerous boundary correctly, often the interior case writes itself. When you still have degrees of freedom after satisfying post-conditions from pre-conditions, what you typically have left is sequencing choice over parallel statements (when multiple statements can legally execute at the same point in program flow). I recall that Knuth-Morris-Pratt string matching was a good study in this, since you have multiple valid choices in how to move forward; if you're clever, you obtain nearly the best of all worlds in average and worst case performance. That's where proper attention pays off in the interior case.

In that last paragraph, don't neglect to establish a strictly monotonic variant that converges to a termination. You can't always write your variant down in a nice way (simplest case: parsing an unbounded input stream), but you can think it through to logical satisfaction; sometimes your hard work is relegated to a comment, but what can you do?

I brought that mindset into embedded programming and it never lets me down in program correctness. There's a social consideration when your approach is dominated by subtracting hazards rather than establishing approximate functionality in the hope that accuracy converges upward. I try to balance shims with bricks. The shims provide some early indication of completed functionality, but these are slated for demolition if they prove weak. The bricks are the chunks of code that converged from the dangerous boundary inward; this code is intended to reign eternal.

Programmers get sucked into visualizing the interior case by planning to debug their code; I never plan to do so (for a special value of studiously near-sighted). We also suck ourselves into this by writing if/else constructions without distinguishing between necessity (solely dictated by pre/post conditions) and preference (in time, space, or policy) where other statement orderings were equally valid under program correctness.

It's obviously a very important aspect of this project that we navigate the hazards at the boundary with precision from the outset. Accurate count metadata is critical to validation.

I didn't subpartition my pawn slices by an advancement metric. I was just looking at the various ways 3+2 pawns can be trapped within files for non-capture/non-promotion continuations. To begin with, I'm looking at storage metrics. This kind of subpartitioning is more of a concern when you get into computational metrics.

The paper also talks about weak verification for self-consistency, and more expensive verification against previous computations. I suspected it would break out that way.

Another half a paper left to read ... more for next time.

Finally, I think someone misunderstood my use of the word "move". I will only ever use this to refer to decision branches from the move generator. It didn't even occur to me that "move" could be substituted for "ply" in any discussion of a depth metric. The move generator is an input to my proposed compression scheme, as Kirill understands from our private beginnings.

Edit: Probably more than half the comments in any code I write concern differentiating shimness from brickness in whatever follows. I make heavy use of partial bricks, where the program asserts out in any subcase I haven't formally analyzed. Analysis is often quick when your program immediately bombs when fed some tasty new input, since you're probably thinking clearly about the new input at precisely that moment. I enjoy big computation because people look at you a little less funny for being quite so pedantic.
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Kirill, a quick comment on distributed file systems. HDF5 lives in the HPC space and has some application level support for this (mounds of documentation to wade through). I have no idea if what it provides overlaps with your ambitions.

On another front, a distributed file system is the raison d'être of DragonFly BSD. You could maybe post a question in one of their technical forums at some point to see whether any of those guys thinks this problem space is a good fit. They might be enthusiastic about supporting a computation that really pushes the edge in what their (admittedly young) OS is able to handle.
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

astokes wrote:Programmers get sucked into visualizing the interior case by planning to debug their code; I never plan to do so (for a special value of studiously near-sighted). We also suck ourselves into this by writing if/else constructions without distinguishing between necessity (solely dictated by pre/post conditions) and preference (in time, space, or policy) where other statement orderings were equally valid under program correctness.
Sorry, I'm not very clear about what the interior case has to do with the debuggability of the code. I always plan to debug my code, and implement a lot of support for this, even at the expense of performance. Internal solving and boundary solving are totally separate routines in my solver, and both needed some debugging and verification before they worked.
astokes wrote:Edit: Probably more than half the comments in any code I write concern differentiating shimness from brickness in whatever follows. I make heavy use of partial bricks, where the program asserts out in any subcase I haven't formally analyzed. Analysis is often quick when your program immediately bombs when fed some tasty new input, since you're probably thinking clearly about the new input at precisely that moment. I enjoy big computation because people look at you a little less funny for being quite so pedantic.
I guess that my code would probably look like a pile of crutches by your standard. :-)
astokes wrote:Kirill, a quick comment on distributed file systems. HDF5 lives in the HPC space and has some application level support for this (mounds of documentation to wade through). I have no idea if what it provides overlaps with your ambitions.

On another front, a distributed file system is the raison d'être of DragonFly BSD. You could maybe post a question in one of their technical forums at some point to see whether any of those guys thinks this problem space is a good fit. They might be enthusiastic about supporting a computation that really pushes the edge in what their (admittedly young) OS is able to handle.
Thanks for the interesting advice. I should think about it and check HDF5 in more detail. Currently it looks too complex for what I have in mind. Regarding DragonFly BSD - I guess I'd rather work towards implementing heterogeneous cloud solving, because you never know what OS a potential contributor may be running. Using DragonFly BSD's facility would require everyone to run DragonFly BSD before they could contribute computing resources.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Dijkstra ...

Post by guyhaw »

Interesting to see mention of Edsger Dijkstra.

He visited Cambridge quite often and was, with Tony Hoare, a big influence on my thinking.

His EWDs are mostly transcribed at http://www.cs.utexas.edu/users/EWD/ .

I don't know of any 'appreciation' of EWD's contribution, but http://www.dijkstrascry.com/ looks interesting.

g