Mapping the 7-men computation

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Guy:

One chapter of one book by Dijkstra was the biggest influence on my programming career, taking everything together. I don't recall the title of that particular book, but I do know it was on a reading list promulgated in one of P.J. Plauger's old columns--if memory serves half as well as it used to (perhaps Programming on Purpose, but I think he also wrote others). One suggestion from Dijkstra was to write your code, then stick a pin at random into the bowels of some loop, and then ask yourself if you can formally write down an expression for the exact number of times that line of code executes, relative to the data you are iterating over. There are necessarily places in any loop (modulo languages with parallel assignment) where your invariant is out of sync with your committed program state. The statement that messes up your invariant should be followed--almost 100% of the time--by the statement that restores the invariant. If you are sloppy about this, you'll quickly regret where you stuck the pin, which was precisely his point.
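To make the pin exercise concrete, here is a toy illustration of my own (not Dijkstra's), in Python: one statement deliberately puts the invariant out of sync, the very next one restores it, and the pinned line has an execution count you can write down exactly.

Code:

def running_sum(xs):
    total = 0
    i = 0
    # Invariant: total == sum(xs[:i]) -- true here and at the top of every pass.
    while i < len(xs):
        total += xs[i]   # this statement puts the invariant out of sync...
        i += 1           # ...and this one, immediately after, restores it.
        # Dijkstra's "pin": this line executes exactly len(xs) times, and the
        # invariant total == sum(xs[:i]) holds again right here.
    return total

assert running_sum([3, 1, 4]) == 8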

Very cool to see my degree of separation reduced to 2. What impression did Hoare and Dijkstra leave on you?

Kirill:

I was perhaps being too subtle. By "plan" I meant planning on a psychological level. When I write code, I'm thinking almost exclusively about the 1% or 0.1% or 0.01% contingency, not what my code will do 99% of the time. Later, if you do end up debugging your code with a tracing tool, you're forced to think about the dominant 99% (unless you're a wizard with breakpoints beyond mortal ken (an idiom from English which invokes Merlin from King Arthur's court)).

If you're not careful, you'll start fixing what goes wrong with the 99% before you figure out what was going wrong with the 1% that squirms out from under the microscope. This psychology can carry back into the coding effort, to where you pay more attention to the interior (common code paths familiar to you as the back of your hand from your debugging forays) than the boundary (precise loop termination).

I was also poking a bit of fun at the word debug. Marrying yourself to the subtraction of error is not the only (or best) way to arrive at a correct program. One can alternatively engage in ruthless validation of each tiny brick as the program expands in complexity. There's always some of both, but the difference can be 100:1 from one project to another in the number of errors subtracted. I plan not to subtract error (debug), even though I know this is not realistic. It was a wry comment on my state of mind.

I wasn't trying to say my coding is any better than anyone else's. When my process works, it works well. But often I struggle to marry the task to my process while others are charging ahead to much greater effect. Pros and cons. For this project, I'm hoping for more pros than cons, which is why I put forward a philosophical note.

Can you really debug a program with an expected runtime of 2^60 operations? As opposed to starting with very few errors from the outset?

Edit: Perhaps even more apropos as a summary question: Is it possible to plan to make no errors at all without a bitterly clear notion of how you would identify and eliminate such a calamity if you were to make one?
Last edited by astokes on Thu Sep 29, 2011 2:25 am, edited 1 time in total.
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Kirill:

My default is to view this as a big computation that's performed in full glory twice or thrice on highly specialized platforms. I can't get my head around terabytes for every man. Perhaps you think bigger than I do. I see an unconquered mountain, you see a tourist attraction. But I appreciate your instinct toward a portable and application-specific solution to the storage format, even if we continue to run on complementary tracks.

Ferengi Rules of Acquisition:
34. Peace is good for business.
35. War is good for business.

Complementary has its merits.

Allan
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Kirill:

Amassing some preliminary dominant move statistics is much on my mind. I was planning to use KR-KN as my first test case.
462 * 64 * 64 < 2^9 * 2^6 * 2^6 = 2^21 ≈ 2.1 million.
Not out of range for loading into memory in an R workspace. I could slap some columns onto the data frame concerning king position, king mobility, king-king distance, etc., and then play with some of the many data mining and machine learning packages in R (emphasis on "play": I have the expensive book, but no practical experience).
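A rough sketch of the kind of row I have in mind, written in Python with the python-chess package for the board bookkeeping (rather than R, purely for brevity here); the column choices are illustrative, not final.

Code:

import chess

def feature_row(board):
    # One row of the hypothetical KR-KN table: a few columns I might slap on.
    wk, bk = board.king(chess.WHITE), board.king(chess.BLACK)
    return {
        "king_distance": chess.square_distance(wk, bk),   # Chebyshev distance
        "weak_king_on_edge": int(chess.square_file(bk) in (0, 7)
                                 or chess.square_rank(bk) in (0, 7)),
        "mobility_side_to_move": board.legal_moves.count(),
    }

# Illustrative KR vs KN position, White to move:
print(feature_row(chess.Board("8/8/3n4/8/4k3/8/1R6/4K3 w - - 0 1")))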

The checkers paper suggests that a little pain in the heuristic can be worth it if the compression is much improved. There are plenty of subtleties here. I'm thinking of limiting my heuristic to one and a half plies (perform each move available, and compute the response move lists for each of these positions, but do not perform those moves). This gets you checkmate detection and a strong heads-up on forced move sequences (highly constrained opposition mobility). Perhaps capture is evaluated on subordinate tables for either ply. There are choices to make to balance cost with effectiveness.
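Here is roughly what I mean by one and a half plies, again leaning on python-chess for convenience: each of our moves is performed, the opponent's reply list is merely counted (not performed), and the ordering rule at the end is only a placeholder. The subordinate-table treatment of captures is left out of this sketch.

Code:

import chess

def one_and_a_half_ply(board):
    # Perform each of our moves (one ply), then only *list* the replies
    # (the half ply): enough for mate detection and a mobility-squeeze score.
    scored = []
    for move in list(board.legal_moves):
        board.push(move)
        mate = board.is_checkmate()
        reply_count = board.legal_moves.count()
        board.pop()
        scored.append((move, mate, reply_count))
    # Placeholder ordering: mates first, then moves leaving the fewest replies.
    scored.sort(key=lambda t: (not t[1], t[2]))
    return scored

board = chess.Board("8/8/3n4/8/4k3/8/1R6/4K3 w - - 0 1")
move, mate, replies = one_and_a_half_ply(board)[0]
print(board.san(move), mate, replies)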

Minimally, you could start with just how often each piece is moved, out of all positions solved. If for white rook moves are 60% and king moves are the other 40%, you've already got one useful weighting term.

Likely I'll build some Russian dolls where first I evaluate the simplest possible heuristic (dart board), then add a succession of weak increments in heuristic complexity. (Do Russians call these "Polish dolls" the same way the British and French ascribe unfortunate disease to the other party? Languages are funny that way.)

Dominant move effectiveness will vary wildly by piece combination. For KR-KN it should be moderately effective.

For larger piece combinations, I'm wondering whether machine learning over a random sample would be effective, or whether human intuition is strong enough to get what you need without this exercise.

Thomas and Cover references C. E. Shannon, Prediction and entropy of printed English, 1951. As I recall it from long ago, the key insight there is that high rates of accuracy dominate low rates of accuracy in the prediction gambling game, due to exponential accumulation. In theory you can flash a lot of random endgames (from the chosen piece combination) at a speed-chess master to see how he or she fares. The chess expert won't appreciate being flashed the positions from KPPP-KPP where three passed white pawns occupy the same file as two passed black pawns. You might need to dial down the randomness to a dull roar, relegating the nutter (crazy) positions to machine learning.

You can also set the game up to mine human intuition for partial knowledge. One such game would flash only positions where there is either no dominant rook move or no dominant king move (meaning the other piece must move in an accurate mating line) and just have the player bet king move against rook move (but very fast or they'll look ahead, which isn't what I would be interested in mining).

In notorious endgames such as KRB-KNN you wouldn't have to fear the human looking ahead, but I'm guessing the human doesn't score much better than the simplest computational heuristic.

In the "Shannon guessing game" I'm referencing, the gambler must partition all available funds over the options given. If you bet zero on the winning option, you're bankrupt and out of the game, so a prudent player never does that.

From T&C page 139:
As in the case of the horse race, the optimal bet is proportional to the conditional probability of the next letter.
Winnings are exponential in the quality of your assessment of that probability.
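To make the exponential accumulation concrete, here is a toy two-outcome version of the game in Python - rook move versus king move, using the 60/40 split mentioned above, with an even 2-for-1 payout. The figures are purely illustrative.

Code:

from math import log2

p = [0.6, 0.4]   # "true" split between rook moves and king moves, as above

def doubling_rate(bets, probs, odds=2.0):
    # Expected log2 wealth growth per guess when betting fraction bets[i]
    # on outcome i at odds-for-1; maximised by betting proportionally (bets == probs).
    return sum(pi * log2(odds * bi) for pi, bi in zip(probs, bets))

for bets in ([0.6, 0.4], [0.5, 0.5], [0.9, 0.1]):
    W = doubling_rate(bets, p)
    print(bets, f"{W:+.3f} bits/guess, factor {2 ** (100 * W):.3g} over 100 guesses")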

Edit: From a previous discussion, a dominant move is any move where no other move has a shorter distance metric. You win if you guess any dominant move. Whichever dominant move the heuristic guesses first is the move returned by the encoded table. Distance metrics are reconstructed on the fly by traversing dominant move sequences to metric zero (which you must be able to identify as such without using the table). This traversal is linear in depth. Distance metrics can be computed for all available moves, which permits one to identify all dominant moves at reasonable cost (these will all share the same minimum metric). Obviously you prefer a compression method where you're not paging from disk as you run down the chains.
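In code, the reconstruction would look roughly like the Python sketch below. Here dominant_move() is a hypothetical stand-in for the encoded-table guess, and plain game-over detection stands in for "metric zero recognizable without the table".

Code:

import chess

def reconstruct_depth(board, dominant_move):
    # Walk the dominant-move chain down to metric zero and count the plies.
    depth = 0
    while not board.is_game_over():
        board.push(dominant_move(board))
        depth += 1
        if depth > 1000:
            raise RuntimeError("chain did not terminate -- table or heuristic bug")
    for _ in range(depth):        # unwind, leaving the caller's board intact
        board.pop()
    return depth                  # cost is linear in the depth, as noted above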
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

A couple of points

Post by guyhaw »

I think a lot of complexity about 'storing moves' rather than 'storing depths' is being proposed here. And I don't think it's a good idea anyway.

First, it's important to be able to compare a new EGT against, say, a Nalimov EGT, especially if the new EGT is using metric DTM. This cannot be done if 'best move' is stored rather than depth.

Secondly, there may be more than one equi-optimal move: what then?

Thirdly, in a DTR/DTZR/DTZ situation, one would want to compare the DTZ-depth of a position, and of a successor-position, to the number of moves left in the current phase before the move-count is zeroed.

'Storing the move' gives some local, 'relative' information but does not give the global, 'absolute' information which may be required. Granted, it saves on look-up disc-accesses, but the prior uses of bitbases will restrict look-up to the value-preserving moves anyway.
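To spell out the third point: with a stored depth, the check is a single comparison against the plies remaining in the current phase, which a stored 'best move' cannot provide. A sketch with hypothetical probe values; the exact boundary convention is a separate discussion.

Code:

def still_wins_under_n_move_rule(dtz_plies, halfmove_clock, n=50):
    # Compare the successor's DTZ-depth (hypothetical probe value, in plies)
    # against the plies left in the phase before the move-count is zeroed.
    # Whether the boundary is <= or < depends on how claims are counted.
    return dtz_plies <= 2 * n - halfmove_clock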


Both Edsger Dijkstra and Tony Hoare had very clear, logical minds of course. They were very observant about programming, and could illustrate their points with clear, highly distilled examples. I last met Dijkstra at the Cambridge Computer Lab's 50th Anniversary celebration of the start of the EDSAC Computer Service, the first computational service in the world: he remained as focused as ever. Tony Hoare led a very interesting BCS meeting, a report of which can be found via http://centaur.reading.ac.uk/15296/ .

I tried to follow their closely analytical style in http://centaur.reading.ac.uk/4515/ , which arose out of a slightly optimistic belief that HOL guaranteed correct EGTs which would be understood by their users. [ An obvious flaw is that the EGTs, like all others, do not include positions with castling rights, and that would probably be unknown to most users of them. ]

g
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

astokes wrote:Kirill:

I was perhaps being too subtle. By "plan" I meant planning on a psychological level. When I write code, I'm thinking almost exclusively about the 1% or 0.1% or 0.01% contingency, not what my code will do 99% of the time. Later, if you do end up debugging your code with a tracing tool, you're forced to think about the dominant 99% (unless you're a wizard with breakpoints beyond mortal ken (an idiom from English which invokes Merlin from King Arthur's court)).

If you're not careful, you'll start fixing what goes wrong with the 99% before you figure out what was going wrong with the 1% that squirms out from under the microscope. This psychology can carry back into the coding effort, to where you pay more attention to the interior (common code paths familiar to you as the back of your hand from your debugging forays) than the boundary (precise loop termination).

I was also poking a bit of fun at the word debug. Marrying yourself to the subtraction of error is not the only (or best) way to arrive at a correct program. One can alternatively engage in ruthless validation of each tiny brick as the program expands in complexity. There's always some of both, but the difference can be 100:1 from one project to another in the number of errors subtracted. I plan not to subtract error (debug), even though I know this is not realistic. It was a wry comment on my state of mind.

I wasn't trying to say my coding is any better than anyone else's. When my process works, it works well. But often I struggle to marry the task to my process while others are charging ahead to much greater effect. Pros and cons. For this project, I'm hoping for more pros than cons, which is why I put forward a philosophical note.

Can you really debug a program with an expected runtime of 2^60 operations? As opposed to starting with very few errors from the outset?

Edit: Perhaps even more apropos as a summary question: Is it possible to plan to make no errors at all without a bitterly clear notion of how you would identify and eliminate such a calamity if you were to make one?
Very interesting to read about your approach to design and coding. I am probably less systematic than you in my practice, but I too have developed a rather defensive mindset when designing a program, and I am always thinking about what will happen if (when) something goes wrong. I am a heavy user of asserts, and I do a lot of double-checking. Performance sometimes suffers, but with this kind of project correctness is much more important.

I totally agree with the preference for "starting with no errors" over "eliminating bugs". I try to achieve this with clean and simple design, although I still don't use any formal validation of "bricks".

Of course, independent verification (e.g., comparison of statistics between two projects) is a very important assurance for game solving. A lot can probably be done to make it easier. As I mentioned previously, one important step forward would be to establish a standard way of counting positions.

(As one example, I'd be glad to see an independent verification of the total number of unique legal positions in 4x4 chess, 3,677,542,994,054,890.)
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

astokes wrote:Kirill:

My default is to view this as a big computation that's performed in full glory twice or thrice on highly specialized platforms. I can't get my head around terabytes for every man. Perhaps you think bigger than I do. I see an unconquered mountain, you see a tourist attraction. But I appreciate your instinct toward a portable and application-specific solution to the storage format, even if we continue to run on complementary tracks.

Ferengi Rules of Acquisition:
34. Peace is good for business.
35. War is good for business.

Complementary has its merits.
Great observation. Exactly: my plan is not just to perform the computation once, but to establish a framework where this (or a similar) computation can be done (and re-done) reliably and affordably. I mean, the moon landing was a great feat, but today we're struggling to go there again.
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

Allan, it's very interesting for me to follow your thoughts and your experiments. I'm also very curious about the compression rates you can get when storing moves, and about how the various heuristics affect the compactness. I appreciate it when you post your results and ideas.
astokes wrote:Likely I'll build some Russian dolls where first I evaluate the simplest possible heuristic (dart board), then add a succession of weak increments in heuristic complexity. (Do Russians call these "Polish dolls" the same way the British and French ascribe unfortunate disease to the other party? Languages are funny that way.)
We call them "Matryoshka".
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

I'm now considering redesigning my solver a bit, to work around the imperfection that my DTX/DTC/DTZ tables can't be used with an N-move rule. I'd have to either completely separate the DTx in plies from the WDL information, or just store the DTx in plies. This probably also means upgrading from 1-byte to 2-byte values (1 byte was sufficient for minichess so far), so altogether this will take some time.
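One possible 2-byte layout, just as a sketch in Python and not a format I have settled on: 2 bits of WDL alongside 14 bits of DTx in plies (or the two could live in fully separate tables).

Code:

import struct

LOSS, DRAW, WIN = 0, 1, 2

def pack(wdl, dtx_plies):
    # Hypothetical 16-bit record: 2 bits of WDL, 14 bits of DTx in plies.
    assert 0 <= wdl <= 2 and 0 <= dtx_plies < (1 << 14)
    return (wdl << 14) | dtx_plies

def unpack(value):
    return value >> 14, value & 0x3FFF

record = pack(WIN, 153)
assert unpack(record) == (WIN, 153)
print(struct.pack("<H", record))      # the two bytes as they might sit on disk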
QuakePhil
Posts: 14
Joined: Fri Sep 23, 2011 7:31 pm
Sign-up code: 10159
Contact:

Re: Mapping the 7-men computation

Post by QuakePhil »

Fun fact - Matryoshkas (I've never heard them called "russian dolls" before, or "<any kind of nationality> dolls" for that matter) originated in Japan :)
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

AMD's Bulldozer was released today. Underwhelming. Almost inexplicable given the increase in die size (2 billion transistors). You really have to hope for AMD that there are some performance gremlins to address to unleash the full potential. This part desperately needs some remedial gremicide to become fully competitive.

Anand's somber review invokes 8 queens as a prime example of a branchy mess. I know he roots for AMD so he's using polite language.
The Impact of Bulldozer's Pipeline

This certainly won't expedite a huge competitive response from Intel. If anything, prospective performance levels just tipped slightly downward. Bummer.
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

From my perspective, a bigger letdown with Bulldozer is the dual-channel memory controller. With the upcoming Sandy Bridge-E from Intel I'll be able to assemble a relatively cheap machine with 32 GB RAM (8 x 4 GB), or a more expensive one with 64 GB. With a desktop Bulldozer, a 32 GB build is much more expensive (4 x 8 GB modules), and 64 GB is impossible. So I'm not impressed with the desktop Bulldozer CPU for my purposes.
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Endgame boundaries

Post by Kirill Kryukov »

guyhaw wrote:While we are talking about boundaries ...

Most people assume that the winner does the conversion into the next endgame. This is not always true: the loser can convert a Pawn, or be forced to capture. N Wirth, back in the 1990s, inherited a piece of code that assumed 'winner does all the conversions' and as a result his otherwise more accurate 'depths in plies' were unpredictably less accurate at times.
Originally I thought this issue was not important, and that a simplified move-counting DTC was sufficient (and more compact) compared to a perfect DTC in plies. However, if we add a 50-move (or any N-move) rule into the picture, a real problem appears, as move-counting DTC can lead to errors in recognizing the 50-move draw. I guess that you talked about this before, but it took me some time to really understand why this is important.

So, although I don't use any N-move draw rule yet, I changed my solver to use ply-counting DTX, DTC and DTZ. I observe an increase in compressed table sizes of about 14% (for 3- to 6-piece tables in 4x4 chess, in DTX, DTC and DTZ).

So I am now curious. Do you know of any other game-solving or endgame-solving project that stores DTX, DTC or DTZ in plies? (Or am I the first one to go through the trouble of actually implementing this?)

Another recent realization of mine: Normally we verify the tables using a trivial (although time-consuming) 1-ply forward search from each position. This works great for all the metrics I use so far (DTM, DTX, DTC, DTZ, WDL); however, this won't work for WDL50. The only way to verify a WDL50 table will be by verifying a DTZ50 table and re-converting it to WDL50 twice and comparing the two copies. Which is perhaps better than nothing, but not as good as normal verification.
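For the normal metrics, the 1-ply forward check looks roughly like the Python sketch below, written against ordinary 8x8 boards with python-chess purely for illustration; probe() stands in for the real table lookup, and only the won-position case is shown (losses and draws are checked symmetrically).

Code:

import chess

def verify_won_position(board, probe):
    # probe(b): plies-to-win (positive) for the side to move if won,
    # plies-to-loss (negative, or zero for checkmate) if lost, None if drawn.
    d = probe(board)                  # this position is claimed "win in d plies"
    best_reply_loss = None
    for move in list(board.legal_moves):
        board.push(move)
        child = probe(board)          # value from the opponent's point of view
        board.pop()
        if child is not None and child <= 0:             # opponent is lost there
            loss = -child
            best_reply_loss = loss if best_reply_loss is None else min(best_reply_loss, loss)
    # Exactly one ply must be gained by the best move: a missing witness and an
    # overlooked faster win are both failures.
    return best_reply_loss is not None and best_reply_loss + 1 == d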
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Measuring depth in plies

Post by guyhaw »

As I nearly said, Christoph Wirth generated DTC EGTs with depth in plies. [As usual, I said 'N Wirth' by mistake as I tend to have Niklaus Wirth's name in my head.]

So you won't be the first to measure depth in plies, KK, but you might be the first to measure depth in plies correctly.

I think CW inherited some core software, maybe called 'RETROENGINE', from Thomas Lincke (http://e-collection.library.ethz.ch/ese ... 905-02.pdf) which assumed change of phase was achieved by the eventual winner playing a move.

So if the line changed phase with a forced capture by the loser, the depth was one ply 'light'. This means that if, as is usual, a phase starts with a loser's move, then after 100 plies both sides have played 50 moves, and the loser - faced maybe with a forced capture - can decline to play the only move they have and instead claim a draw.

The result is that you have a position '50 (winner's) moves deep' or '100 plies deep' which is not a win but a draw, assuming a claim.
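A toy piece of ply accounting, just to make the bookkeeping above explicit (my own sketch, in Python):

Code:

def phase_plies(winner_moves, loser_opens_phase, loser_forced_to_capture):
    # A depth quoted in winner's moves hides up to one extra loser ply at
    # each end of the phase.
    plies = 2 * winner_moves - 1          # winner plays both the first and last ply
    if loser_opens_phase:
        plies += 1
    if loser_forced_to_capture:
        plies += 1
    return plies

# 50 winner's moves, loser opens the phase and must capture at the end:
print(phase_plies(50, True, True))        # 101 plies -- the draw claim comes first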

g
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Kirill,

I didn't mention SandyBridge-E because I haven't decided which performance attribute is most important. The four channel memory controller rocks if your computation is memory bound in the right way. Sequential access (reading from queues) will have trouble saturating this, even with many cores. Random access would likely become latency bound. The R language is vector oriented and likes all your data in memory. Huge vector operations are the best way to saturate this kind of bandwidth. I don't conceive of the retrograde algorithm as a pure vector algorithm. The multiplicity of edges to vertices makes it hard to view this way, but maybe I missed a trick.

I was trying to figure out what AMD thought they were doing with this. Then I realized that the APU concept requires CPU and GPU circuits on the same silicon. Traditionally, these circuits have used very different design rules (transistor geometries and stage frequencies). Perhaps AMD is trying to build a competitive CPU around design rules that will later welcome the GPU on an equal footing. That would at least make sense. Trinity will be such a beast. Meanwhile, this new CPU is rather tepid.

For my overall workload, if I were upgrading my rig tomorrow, it would be quad-channel SB-E in a monster stomp. IvyBridge looks like a fairly minor set of improvements in pre-release tidbits.

EDIT: For random access workloads, it's important to determine whether the SB-E memory controller can track enough in-flight requests to saturate four channels on a steady diet of independent memory requests. The in-flight resources might be partitioned across memory channels, meaning that perfectly uniform access distributions can saturate, but non-uniform patterns can't. These beasts are so complex you really have to benchmark representative code.

Allan
Last edited by astokes on Wed Oct 12, 2011 12:17 pm, edited 2 times in total.
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

The only way to verify a WDL50 table will be by verifying a DTZ50 table and re-converting it to WDL50 twice and comparing the two copies.
You can generate once and then exhaustively probe each WDL50 position against any verified DTZ >= 50 table. Can't you? It's almost the same thing, with a little less reliance on the correctness of your conversion code. It doesn't strike me as a sub-standard verification when phrased this way.
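As a sketch of what I mean, with the probes and the implied-WDL rule as hypothetical stand-ins (the exact boundary convention being one of the metric details I still have to figure out):

Code:

def implied_wdl50(dtz_plies, budget=100):
    # Hypothetical convention: wins/losses whose zeroing move lies beyond the
    # 100-ply budget collapse to draws; whether the boundary is <= or < is one
    # of the details the exercise would pin down.
    if dtz_plies is None:
        return "D"
    if dtz_plies > 0:
        return "W" if dtz_plies <= budget else "D"
    return "L" if -dtz_plies <= budget else "D"

def cross_check(positions, probe_wdl50, probe_dtz50):
    # Probe every WDL50 entry against the already-verified DTZ50 table.
    return [pos for pos in positions
            if probe_wdl50(pos) != implied_wdl50(probe_dtz50(pos))]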

I'll figure these metrics out some day.
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: Endgame boundaries

Post by syzygy »

Kirill Kryukov wrote:Originally I thought this issue was not important, and that a simplified move-counting DTC was sufficient (and more compact) compared to a perfect DTC in plies. However, if we add a 50-move (or any N-move) rule into the picture, a real problem appears, as move-counting DTC can lead to errors in recognizing the 50-move draw.
This old post might be of some interest. During generation, depth needs to be stored in plies. After the generation has finished, depth can be stored in moves provided that tables for both sides to move are stored.
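The information loss that makes move-counting risky near the N-move boundary, in a few lines of Python (one possible rounding convention; the both-sides-to-move trick from the linked post is what recovers the lost parity):

Code:

# Rounding plies to (winner's) moves merges 2m-1 and 2m plies into the same
# stored value m: harmless well inside the N-move budget, fatal right at it.
for d_plies in (99, 100, 101):
    print(d_plies, "plies ->", (d_plies + 1) // 2, "moves")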
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Re: Mapping the 7-men computation

Post by guyhaw »

The KQNKRR endgame has maxDTM = 174 (wtm and btm) and maxDTC = 152 (wtm and btm).

The position r5r1/8/k7/8/8/8/3K4/1Q4N1 b is a maxDTC/M position, and some years ago I followed a maxDTC/M line. After 95 moves by each side, the DTC-minimaxing line and the DTM-minimaxing line parted company.

Position 153b in the DTC-minimaxing line, i.e. after White has played 152 moves in this phase, was k7/2N5/1Q6/8/2r4r/5K2/8/8 b - - 0 1.
This is lost for Black in 1 ply (0 moves!) as 153b ... Rxc7 is forced.

So if the DTC EGT had been properly backed up counting depths in plies, it seems likely that there has to be [discuss!] a btm position with depth 101 plies (50 moves) which is 'won' but which ends with a forced capture by Black - except that it could be preceded by a draw-claim over the board.

The DTC-minimaxing line I followed (' = DTC-optimal move, " = only DTC-optimal move) was:

1...Rad8+' 2.Ke3" Rge8+' 3.Kf4" Rd4+' 4.Kg3" Rg8+' 5.Kf2" Rf8+' 6.Ke3" Rd6' 7.Qa1+ [Qa2+] Kb6' 8.Qb2+" Kc6' 9.Qc3+' Kd7' 10.Qg7+" Ke8' 11.Qe5+" Kd7' 12.Qb5+' Ke7' 13.Qb7+" Rd7' 14.Qe4+' Kd8' 15.Qh4+' Kc8' 16.Qg4' Re8+' 17.Kf2 [Kf3] Rf8+' 18.Kg2' Kc7' 19.Sh3' Rdf7' 20.Sg5" Rf2+' 21.Kg3' Rf1' 22.Qc4+" Kb6' 23.Qd4+ [Qe6+] Kb7' 24.Qd5+' Kb6' 25.Kh4' R8f4+' 26.Kh5" Rf5' 27.Qc4' R1f2' 28.Qb3+ [Kg6/h6, Qd4+] Kc6' 29.Kg6 [Kh6] Rf6+' 30.Kg7" R6f4' 31.Qd3 [Kh7] Kb6 [Rf1] 32.Kh7' Rf1' 33.Qd6+' Kb5' 34.Se6" Rf7+' 35.Kg8' R7f2' 36.Qc5+ [Qd5+] Ka6' 37.Qc6+' Ka5' 38.Kh7 [Kh8] Rf7+' 39.Kh8' R7f2' 40.Qb7' Rf3' 41.Qd5+ [Qa7+/c7+] Kb4 [Kb6] 42.Qc5+' Ka4' 43.Qc4+' Ka5' 44.Sg5' Rf4' 45.Qb3' Rh4+ [Rf6/h1+] 46.Kg7 [Kg8] Rhf4' 47.Kh7' Rh4+ [Rh1+] 48.Kg6' Rhf4' 49.Qb2' R4f2' 50.Qb7' Rf8' 51.Kh6 [Kh7] R8f6+' 52.Kh5' R6f5' 53.Qb2 [Qb8/d7] Rf8' 54.Qc3+ [Qe5+] Kb6' 55.Qd4+' Kb5' 56.Se4 [Se6] Rh8+' 57.Kg4" Rg8+' 58.Sg5" Rgf8' 59.Kh3 [Qd5+] R8f4' 60.Qd5+" Kb6' 61.Qd7" Rh1+' 62.Kg2' Rhf1' 63.Se4' R4f3' 64.Kh2' Rf5' 65.Sd2' R1f2+' 66.Kg3' Ka6' 67.Qc7' Rf7' 68.Qc6+" Ka7' 69.Se4" Rf1' 70.Qa4+' Kb8' 71.Qb3+' Kc7 [Ka7/8] 72.Qc3+' Kb8 [Ld7] 73.Qb2+' Kc7' 74.Qe5+' Kd7' 75.Qd5+' Ke7' 76.Qd6+' Ke8° 77.Qc6+ [Qb8+] Ke7' 78.Qc7+' Kf8' 79.Qd8+' Kg7° 80.Sg5" R7f6' 81.Qd5" Kg6' 82.Se6" R6f2' 83.Qe5' Kf7' 84.Qd6' Rf3+' 85.Kh4' Rf5' 86.Kg4' R5f2' 87.Sg5+' Ke8' 88.Qb8+' Kd7' 89.Qb7+' Kd8' 90.Se6+' Ke8° 91.Qe4' Kd7' 92.Sc5+" Kd6' 93.Qd4+" Kc7' 94.Qd7+' Kb6' 95.Qd6+" Kb5' 96.Se4" Rf8' 97.Qe6' R1f4+' 98.Kg3' Kb4' 99.Qd6+' Kc4' 100.Qd1 [Qd7] Rf3+' 101.Kh2' R3f4' 102.Qd7' Rh8+ [Rh4+] 103.Kg3' Rhh4' 104.Qc6+' Kb3 [Kb4] 105.Qb5+ [Qb6+] Ka3' 106.Qa5+' Kb3' 107.Sd2+' Kb2' 108.Sf3 [Qb5+/6+/e5+] Rhg4+' 109.Kf2' Re4' 110.Qb5+ [Qb6+] Kc3' 111.Qc5+' Kb2' 112.Se5' Rgf4+' 113.Kg3' Rh4' 114.Sd3+' Kb3' 115.Kf2 [Sc1] Rc4' 116.Qb6+' Kc3' 117.Sc5' Rcg4' 118.Ke2' Kc4' 119.Qc6' Kd4' 120.Qd6+' Kc4' 121.Sd3' Rd4' 122.Qc6+' Kb3° 123.Kd2' Rh2+' 124.Ke3' Rhh4' 125.Sc1+' Kb2' 126.Se2' Rhe4+' 127.Kf2' Rc4' 128.Qb5+ [Qb6+] Rb4 [Kc2] 129.Qg5' Rbc4' 130.Ke1' Rc7 [Kb3, Rc8] 131.Qb5+ [Qf6+] Kc2' 132.Qf5' Rcc4' 133.Qg6' Kb3' 134.Kd1' Rh4' 135.Kd2' Rce4 [Ka4] 136.Qc6 [Kd3] Kb4' 137.Kd3' Rc4' 138.Qb6+' Ka3' 139.Qb1 [Qb7] Rce4' 140.Qb5' Rc4' 141.Sc3' Rhg4' 142.Qb1' Rcd4+' 143.Ke2 [Ke3] Rc4 [Rg2] 144.Sb5+' Ka4° 145.Kd3' Rh4' 146.Qb2' Ka5' 147.Sc3' Rhd4+ [Ka6] 148.Ke3' Ka6' 149.Qb5+' Ka7° 150.Sd5' Re4+' 151.Kf3' Rh4 [Red4/g4] 152.Qb6+' Ka8° 153.Sc7+' and now Rxc7° is forced.

So is position 103b (DTC = 50 winner's moves so DTC = at least 100 plies), 7r/3Q4/8/8/2k1Nr2/6K1/8/8 b in fact a loss in 101 plies?!

g
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Measuring depth in plies

Post by Kirill Kryukov »

guyhaw wrote:As I nearly said, Christoph Wirth generated DTC EGTs with depth in plies. [As usual, I said 'N Wirth' by mistake as I tend to have Niklaus Wirth's name in my head.]

So you won't be the first to measure depth in plies, KK, but you might be the first to measure depth in plies correctly.
:D

As for correctness, well, it passes verification, which is now also ply-counting and more or less separate from the solving code.
guyhaw wrote:I think CW inherited some core software, maybe called 'RETROENGINE', from Thomas Lincke (http://e-collection.library.ethz.ch/ese ... 905-02.pdf) which assumed change of phase was achieved by the eventual winner playing a move.
Thank you very much for posting that link!!
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

Allan, my computation is mostly limited by IO, and therefore by the memory available for caching, so a practical 32 GB box will be a huge step forward for me (64 GB would be even better if I could afford it). Another big improvement for me would be upgrading my storage with some SSDs. The computation speed of current CPUs is satisfactory for me; in any case, there is a huge unexplored space of optimization work in my solvers.

I don't design for any partitioning of requests across memory channels, or think much about cache locality, etc. I believe this kind of work is in the land of premature optimization, and I try to focus on implementing the big ideas first, which is already hard enough for me.
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Endgame boundaries

Post by Kirill Kryukov »

syzygy wrote:
Kirill Kryukov wrote:Originally I thought this issue was not important, and that a simplified move-counting DTC was sufficient (and more compact) compared to a perfect DTC in plies. However, if we add a 50-move (or any N-move) rule into the picture, a real problem appears, as move-counting DTC can lead to errors in recognizing the 50-move draw.
This old post might be of some interest. During generation, depth needs to be stored in plies. After the generation has finished, depth can be stored in moves provided that tables for both sides to move are stored.
Wow, I somehow totally missed that thread. Thanks for posting the link.

In particular, I am amazed that the idea of storing a rounded metric to save space was already proposed there. I independently came to the same idea last year, and I believe it will provide a very important space saving. (I think I know how to make it work correctly with ply-counting tables and the N-move rule, but it needs testing).
User avatar
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan
Contact:

Re: Mapping the 7-men computation

Post by Kirill Kryukov »

astokes wrote:
The only way to verify a WDL50 table will be by verifying a DTZ50 table and re-converting it to WDL50 twice and comparing the two copies.
You can generate once and then exhaustively probe each WDL50 position against any verified DTZ >= 50 table. Can't you? It's almost the same thing, with a little less reliance on the correctness of your conversion code. It doesn't strike me as a sub-standard verification when phrased this way.
Yes, I think you're right. This is probably better than double conversion.
astokes wrote:I'll figure these metrics out some day.
:-)
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Kirill,

I think what you're saying is that the quad channel SB-E appeals to you because it permits 32 or 64GB memory configurations without the expense of registered ECC and a server mainboard with 16 or 18 DIMM sockets on two or three channels. You're not really looking at saturating 50GB/s memory bandwidth.

If I choose to upgrade next spring:
$300 SB-E CPU with 10MB L3 (not the K version)
$300 mainboard (reliable boards will be expensive for LGA-2011)
$500 memory (8*8GB generic performance grade, matched if possible)

Memory prices are not expected to plummet any time soon. DRAM Price Reductions Decelerate in 2012

The X79 chipset has been criticised for lopping features, but it's fine for our purpose from what I can see.

X79 chipset ships in November

Fudzilla is exactly as accurate as the name implies, but correct or incorrect, they tend to be first. Wikipedia lists it as supporting six SATA 6 Gbps ports, and four more slower ones. Well, that really ought to be good enough if I someday end up configuring a small SSD array.

I might follow in early 2013 with an AMD 7000 (Tahiti) video card with the new GPU architecture.

In the SSE world, it's nice to have a rough sense of what instruction set you'll be running at various junctures in the development path. Intel has already announced AVX2 for Haswell in early 2013. Some of the new instructions would be useful.

document
AVX2 extends Intel AVX by promoting most of the 128-bit SIMD integer instructions
with 256-bit numeric processing capabilities. AVX2 instructions follow the same
programming model as AVX instructions.

In addition, AVX2 provide enhanced functionalities for broadcast/permute operations
on data elements, vector shift instructions with variable-shift count per data
element, and instructions to fetch non-contiguous data elements from memory.

If coding progresses slowly, I could end up going that route.

Allan
syzygy
Posts: 166
Joined: Sun Aug 03, 2008 4:02 pm

Re: Endgame boundaries

Post by syzygy »

Kirill Kryukov wrote:Wow, I somehow totally missed that thread. Thanks for posting the link.
I was actually wondering about these questions some days ago and was pleasantly surprised to see I had already looked into these issues about three years ago :).
In particular, I am amazed that the idea of storing a rounded metric to save space was already proposed there. I independently came to the same idea last year, and I believe it will provide a very important space saving. (I think I know how to make it work correctly with ply-counting tables and the N-move rule, but it needs testing).
Well, it probably has always been more common to store counts in moves rather than plies, but indeed this gets far more tricky when taking into account an N-move rule.

If a particular table (or rather "phase") has maxDTZ < N, it is OK to lose a single ply, so storing counts in moves for that phase (for one side to move or for both sides to move) is unproblematic.

If a particular phase has maxDTZ >= N, it is generally not OK to lose a ply. It is necessary to be able to reconstruct the exact counts in plies. This is possible by storing in plies (for one side or for both sides to move) or by storing in moves for both sides to move.
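Those two rules as a single decision function, sketched in Python (not actual generator code), in the same units (winner's moves) used above:

Code:

def storage_plan(max_dtz_moves, both_sides_stored, n=50):
    # A phase that cannot reach the N-move horizon may safely drop a ply.
    if max_dtz_moves < n:
        return "store moves"
    return "store moves" if both_sides_stored else "store plies"

print(storage_plan(30, both_sides_stored=False))   # 'store moves'
print(storage_plan(62, both_sides_stored=False))   # 'store plies'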

My plan is to generate all 6-piece tables with 50-move rule (plus sufficient information to "win" positions drawn by the 50-move rule: DTZ50+). I will probably be generating WDL tables (or rather WDL50+ tables: D/W50/W/L50/L) for both sides to move, and DTZ50+ tables for one side to move. So the DTZ50+ tables with maxDTZ >= 50 will need to store values in plies (unless it turns out that storing tables for both sides to move measured in moves is more efficient in terms of storage space - unlikely).

For tables with a few pawns, I might try making the choice between moves and plies dependent on the pawn slice, but I'll see about that later.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

The case for 'plies all the way down' ...

Post by guyhaw »

Referring to the KQNKRR position 103b above, the only way this fails to be a loss in 101 plies is if White can find an alternative line where it mates or captures (rather than forcing Black to capture a wN with a Rook).

For it to be obvious that this option is available, and to back up the depth of the btm position to 100 rather than 101 plies, the depths of all relevant positions have to be in plies rather than in moves.

It may be tempting to define the depths of positions (which are much shallower than 100 plies) in moves rather than plies, but it is likely that such a position is not the first in the current phase of play, and that some of the phase's budget of 100 plies has already been spent.

So, the exact count of plies is relevant and potentially critical all the way down to the next phase.

g
astokes
Posts: 16
Joined: Sun Sep 18, 2011 6:40 pm
Sign-up code: 10159

Re: Mapping the 7-men computation

Post by astokes »

Slashdot has a lame roundup of existing Z68 mainboards. Among the comments was an interesting tidbit Re: I want more RAM Slots:
Once you get above about 8GB of RAM, you really shouldn't trust it for anything serious without ECC. Even the best listed bit-error rates (which are all pretty much a guess) show that 32GB of RAM will have an error every 3 years, while the worst listed rates would give you an error every 2 minutes. Based on ECC logs in my servers, I'd say the reality is about 8 errors per year for 32GB. So, if you want 64GB, and can live with a once-a-month error that could be anything from a BSOD to a subtle corruption of data, then go ahead and use a desktop chipset.
I don't entirely get this sentiment. If four times as much RAM leads your computation to run four times as fast, have you lost ground on reliability? Others mutter that ECC costs three times as much, but this must mostly be due to far higher grades of quality control or willingness to pay, and not the additional bit columns.
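Back-of-envelope, taking the quoted 8-errors-per-year-per-32-GB observation at face value (a Python sketch; the numbers are only as good as that guess): memory errors scale roughly with gigabytes times hours of exposure, so a job that needs a fixed number of GB-hours of residency sees about the same expected error count however you slice it.

Code:

errors_per_gb_hour = 8 / (32 * 365 * 24)

def expected_errors(ram_gb, runtime_hours):
    return errors_per_gb_hour * ram_gb * runtime_hours

print(expected_errors(16, 1000))   # small box, long run
print(expected_errors(64, 250))    # 4x the RAM, 4x faster: same expected errors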