6-piece checkers tablebases

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc..
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

6-piece checkers tablebases

Post by Ed Trice »

Though possibly a little off-topic, should anyone want the 6-piece endgames for checkers, I released my database generator to the public over a year ago:

http://www.liquidnitrogenoverclocking.c ... ults.shtml

I'd be interested in knowing how long it took on systems whose speeds are not listed on that page, for those who wish to run it. It's just a DOS-like console application, but it does require being run on 64-bit hardware with a 64-bit Operating System. Even though checkers has only 32 squares, I found a way to make a very fast 64-bit move generator with very few instructions using some peculiar bitboards spanning most of the 64-bit aperture.
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

Hi Ed.

Welcome to the forum. This is not off-topic here, thanks for posting!

Are you planning to extend your generator to 7 pieces or more?
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote:Hi Ed.

Welcome to the forum. This is not off-topic here, thanks for posting!

Are you planning to extend your generator to 7 pieces or more?
Hi Kirill,

Thanks for the welcome, glad to be here.

I did solve tb7 for checkers with a programming buddy of mine named Gil Dodgen (this was back in 2003) and we published the data in ICGA. You can download our paper here:

http://www.gothicchess.com/7_piece.pdf

Most of that code was "thrown together", if you know what I mean. Each endgame had its own challenges, especially since we had only 1.5 GB of RAM, and about half of the set was generated on a 500 MHz processor.

Since that time, I revamped every aspect of the solver: new move generator, new buffering structure and technique, new indexing functions, and ability to solve wins as long as 1021 plies.

I plan on solving up to 10 piece endgames by the end of 2011. This would be 39 trillion positions, where the "American trillion" = 1000 x 1000 x 1000 x 1000.

The longest known 10-piece win for red to move so far:

[diagram omitted]
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

I recall seeing this paper before, so I had the impression you had done more than 6 pieces, but I did not recall the details. Thanks for the reminder and for the news. Are you using some WDL metric, or distance-based? If your data compresses well then the storage should be within the reach of ordinary PC RAID, now that 3 TB disks are selling.

I am very curious about your 10-piece solution and generator; please post when you have any updates. What I am particularly interested in is solving Russian draughts endgames. Do you think your solver can be adapted to this task?
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote: Are you using some WDL metric, or distance-based?
I am using Distance To Win/Lose, not to be confused with Distance To Conversion. I think Gil Dodgen and I are the only ones who have done Distance To Win in checkers. At least two people have done Distance To Conversion databases through at least 7 pieces, and many of us have done Win-Loss-Draw, or what the academic world calls "Game Theoretical Value."

My reporting output is different than the ".tbs" files the chess community is used to seeing. I create one large report file, like this (if you have a slow internet connection, don't click on this link unless you are prepared to wait 1 or 2 minutes):

http://www.liquidnitrogenoverclocking.com/report.txt

I show the stats through each iteration, as well as the longest wins when there is an immediate jump present, and also for positions where there are no jumps (some of the ones with jumps are not too interesting as they are just immediate conversions).
Kirill Kryukov wrote:If your data compresses well then the storage should be within the reach of ordinary PC RAID, now that 3 TB disks are selling.
That's the thing, it's 1 byte per position through 7 pieces, 10 bits per position for everything larger.
Kirill Kryukov wrote:I am very curious about your 10-piece solution and generator; please post when you have any updates. What I am particularly interested in is solving Russian draughts endgames. Do you think your solver can be adapted to this task?
The algorithm will work for any checkers variant. What needs to change:

1. The move generator
2. The indexing functions (if the board size is different)

The solver goes through each permutation of possible material distributions (5 kings vs. 5 kings, 5 kings vs. 4 kings + 1 checker, 4 kings + 1 checker vs. 4 kings + 1 checker, etc.). It places every piece on every possible square, generates the moves, looks up known results, solves what is possible to solve, and continues until it can make no more resolutions. So, just replacing the move generator should allow other games to be solved in the same way.
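
For illustration, here is a minimal sketch of that outer enumeration (not the actual solver code, just the idea): it lists every split of a 6-piece ending into kings and checkers for each side. A real solver would also de-duplicate mirrored distributions.

Code: Select all

/* Illustration only: enumerate the material distributions of a 6-piece ending. */
#include <stdio.h>

int main(void)
{
    const int total = 6;                            /* total pieces in the ending  */

    for (int white = total - 1; 2 * white >= total; --white) {
        int red = total - white;                    /* split between the two sides */
        for (int wk = white; wk >= 0; --wk) {       /* white kings                 */
            int wc = white - wk;                    /* white checkers              */
            for (int rk = red; rk >= 0; --rk) {     /* red kings                   */
                int rc = red - rk;                  /* red checkers                */
                printf("%dK + %dC vs. %dk + %dc\n", wk, wc, rk, rc);
            }
        }
    }
    return 0;
}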
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

Ed Trice wrote:I am using Distance To Win/Lose, not to be confused with Distance To Conversion. I think Gil Dodgen and I are the only ones who have done Distance To Win in checkers. At least two people have done Distance To Conversion databases through at least 7 pieces, and many of us have done Win-Loss-Draw, or what the academic world calls "Game Theoretical Value."
Do you have a 50-move rule or something like that in checkers? It causes a lot of issues and debates in chess endgame solving, and limits the usefulness of Win-Draw-Loss tables. Also, Nalimov tables (the most popular format in chess, distance to win, up to 6 pieces currently) suffer from not accommodating this rule.
Ed Trice wrote:My reporting output is different than the ".tbs" files the chess community is used to seeing. I create one large report file, like this (if you have a slow internet connection, don't click on this link unless you are prepared to wait 1 or 2 minutes):

http://www.liquidnitrogenoverclocking.com/report.txt

I show the stats through each iteration, as well as the longest wins when there is an immediate jump present, and also for positions where there are no jumps (some of the ones with jumps are not too interesting as they are just immediate conversions).
Interesting. A bit too verbose for my taste, but it's nice to see diagrams in a text log file. :-) I also write the solving report into one huge file, and statistics are saved into another single file (actually one file for all N-piece endings). I'm not a fan of the .tbs format, where you have a thousand tiny files.
Ed Trice wrote:
Kirill Kryukov wrote:If your data compresses well then the storage should be within the reach of ordinary PC RAID, now that 3 TB disks are selling.
That's the thing, it's 1 byte per position through 7 pieces, 10 bits per position for everything larger.
Do you mean you never compress the thing? Is it because the compression ratio is too low, or because it slows down the querying too much? My chess DTM files compress to about 3 positions per byte in DTM (distance to mate - this is what you call distance to win/lose) metric.
Ed Trice wrote:The algorithm will work for any checkers variant. What needs to change:

1. The move generator
2. The indexing functions (if the board size is different)

The solver goes through each permutation of possible material distributions (5 kings vs. 5 kings, 5 kings vs. 4 kings + 1 checker, 4 kings + 1 checker vs. 4 kings + 1 checker, etc.). It places every piece on every possible square, generates the moves, looks up known results, solves what is possible to solve, and continues until it can make no more resolutions. So, just replacing the move generator should allow other games to be solved in the same way.
Well, then I hope you will be interested in applying it to other checkers variants too. Maybe I or someone else can help with computation resources. But I guess you'll only look into it after completing your plan with 10-piece checkers.

How much data are you planning to actually share with people? (I saw many projects where the result is just the announcement "I solved endgame X in game Y", some statistics, and some long winning lines, and that's all. No database, no solver, no source, no online access, nothing. This is where we are with 7-piece chess endgames now. :-))
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote: Do you have a 50-move rule or something like that in checkers?
The tournament scene for checkers is not nearly what it is in chess. In checkers, there is a vaguely worded "40-move rule" under which you simply must "show progress" towards a win, to be adjudicated by a referee if the need arises. The problem is that, in many instances, the referee is not on par with the playing strength of those who tend to play out such endings.

In December of 2004, checkers World Champion Alex Moiseyev visited my house and played against the 7-piece database. It should be noted that this was a win requiring 253 plies of perfection; all along the way, one slip would turn it into a draw. Alex was not able to complete the win on the strong side, and when he defended the losing side, the database found a much shorter route to the win. It is safe to say that such incredibly complex positions are beyond the capabilities of the strongest human.

Kirill Kryukov wrote:It causes a lot of issues and debates in chess endgame solving, and limits the usefulness of Win-Draw-Loss tables. Also, Nalimov tables (the most popular format in chess, distance to win, up to 6 pieces currently) suffer from not accommodating this rule.
In my opinion, that rule was designed for human play, and it should remain in the domain of human vs. human competitions. What player would want to play out a 100-move ending in R + B vs. R, for example?

In computer vs. computer tournaments, I say waive the rule and let the programs play it out. Maybe a special set of rules would make sense.

1) If a program announces mate at any point before the 50 move rule would stop the game, the counter resets.
2) If the announcement was "Mate in X" and X + 1 moves elapse, the defending side is awarded the draw.
3) If "X" is ridiculously large and deemed a deliberate exaggeration to circumvent the 50-move rule, the declaring program is given a forfeited loss.

That should make the computer-computer endgames more interesting, and the tablebase generator effort would become more worthwhile.
Kirill Kryukov wrote: Interesting. A bit too verbose for my taste, but it's nice to see diagrams in a text log file. :-) I also write the solving report into one huge file, and statistics are saved into another single file (actually one file for all N-piece endings). I'm not a fan of the .tbs format, where you have a thousand tiny files.
I did it mostly for debugging, and verifying the data was completely the same each time. Also, since I use the program as a benchmark, it is much harder for someone to fake a "fast time" since I have a program that validates the results for me.
Kirill Kryukov wrote:Do you mean you never compress the thing? Is it because the compression ratio is too low, or because it slows down the querying too much? My chess DTM files compress to about 3 positions per byte in DTM (distance to mate - this is what you call distance to win/lose) metric.
In checkers, you can load all of the relevant 6-piece and smaller endgames into 1 GB of RAM without compression. The lookup routine is much faster than any move generator, which has the nice side effect that probing the database NEVER slows down the engine using it. I worked with Jon Kreuzer on making a special version of his GUI Checkers program that probes my databases in one large RAM buffer. The performance was remarkable, averaging 20 million nodes/second on my 3.9 GHz Intel Core i7-860.

The run-length encoding scheme can't be used, since every single bit configuration in one byte has been exhausted. The longest win < 7 pieces is 253 plies, the value 254 was used for a draw, and 255 was "unknown", which is used at the beginning to designate positions that had not yet been solved. It worked out perfectly, fitting into one byte. There are no tokens available to use as compression bytes for consecutive runs of the same value.

For 8-, 9-, and 10-pieces, I used 10-bit bytes by overlaying the bits across 2-byte boundaries. That way, it's only 1.25 bytes per position (instead of 2.0) and the sequence repeats, making indexing rather easy. I have a large buffer filled with 10-bit entries. Since the longest win will probably not cross 500 plies by too much, if at all, I would have 500 tokens to use for run-length encoding, which could result in some nice compression in the end. But, I won't know for about a year.
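
To make the 10-bit packing concrete, here is a small sketch (just an illustration of the idea; the real files may lay the bits out differently): entry i occupies bits 10*i through 10*i+9, so four entries repeat every five bytes and each position costs exactly 1.25 bytes.

Code: Select all

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Entry i occupies bits [10*i, 10*i + 9].  Allocate 2 spare bytes at the end
   of a real buffer so the 3-byte window below never reads past it. */
static void put10(uint8_t *buf, uint64_t idx, uint16_t value)
{
    uint64_t bit   = idx * 10;
    uint64_t byte  = bit >> 3;
    unsigned shift = (unsigned)(bit & 7);
    uint32_t window = (uint32_t)buf[byte]
                    | ((uint32_t)buf[byte + 1] << 8)
                    | ((uint32_t)buf[byte + 2] << 16);
    window &= ~(0x3FFu << shift);                    /* clear the 10 target bits */
    window |= ((uint32_t)(value & 0x3FF)) << shift;
    buf[byte]     = (uint8_t)window;
    buf[byte + 1] = (uint8_t)(window >> 8);
    buf[byte + 2] = (uint8_t)(window >> 16);
}

static uint16_t get10(const uint8_t *buf, uint64_t idx)
{
    uint64_t bit   = idx * 10;
    uint64_t byte  = bit >> 3;
    unsigned shift = (unsigned)(bit & 7);
    uint32_t window = (uint32_t)buf[byte]
                    | ((uint32_t)buf[byte + 1] << 8)
                    | ((uint32_t)buf[byte + 2] << 16);
    return (uint16_t)((window >> shift) & 0x3FF);
}

int main(void)
{
    uint8_t buf[16] = {0};
    put10(buf, 0, 253);                 /* e.g. a win in 253 plies          */
    put10(buf, 1, 1021);                /* near the top of the 10-bit range */
    put10(buf, 2, 3);
    assert(get10(buf, 0) == 253 && get10(buf, 1) == 1021 && get10(buf, 2) == 3);
    printf("10 bits per entry = %.2f bytes per position\n", 10.0 / 8.0);
    return 0;
}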

Kirill Kryukov wrote: Well, then I hope you will be interested in applying it to other checkers variants too. Maybe I or someone else can help with computation resources. But I guess you'll only look into it after completing your plan with 10-piece checkers.
I'd like to publish some papers on the results, the techniques, and the move generator itself when everything is done. Then, I will probably release the source code, so others could modify it for their own purposes.
Kirill Kryukov wrote: How much data are you planning to actually share with people? (I saw many projects where the result is just the announcement "I solved endgame X in game Y", some statistics, and some long winning lines, and that's all. No database, no solver, no source, no online access, nothing. This is where we are with 7-piece chess endgames now. :-))
You can get the 7-piece checkers tablebases without anything except the cost of the distribution media. Someone in the checkers world is in charge of that project, I forget who. I can look it up if you are interested.

The 8-piece checkers tablebases are about 475 GB. The 9-piece is around 2.4 TB. The 10-piece will be about 45 TB. It's just a matter of how to distribute it.
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

Ed Trice wrote:I did it mostly for debugging, and verifying the data was completely the same each time. Also, since I use the program as a benchmark, it is much harder for someone to fake a "fast time" since I have a program that validates the results for me.
Yeah, the computation log is important to save. However, when it includes timing data (as it should), it makes it difficult to verify that the results are the same. So I print out separate, timing-free statistics for that purpose.

Faking "fast time".. No idea who would want to do that that, but.. Would not loading the entire log, and scaling down the times proportionally pass your validation? :-)
Ed Trice wrote:
Kirill Kryukov wrote:Do you mean you never compress the thing? Is it because the compression ratio is too low, or because it slows down the querying too much? My chess DTM files compress to about 3 positions per byte in DTM (distance to mate - this is what you call distance to win/lose) metric.
In checkers, you can load all of the relevant 6-piece and smaller endgames into 1 GB of RAM without compression. The lookup routine is much faster than any move generator, which has the nice side effect that probing the database NEVER slows down the engine using it. I worked with Jon Kreuzer on making a special version of his GUI Checkers program that probes my databases in one large RAM buffer. The performance was remarkable, averaging 20 million nodes/second on my 3.9 GHz Intel Core i7-860.
Yeah, when it fits into RAM it should be fast, also for such tables you don't particularly need compression at all. The problems start with those huge tables. Normally you want to save disk space by compression, but then the compression and indexing scheme you can use depend on expected performance and probing strategy. I was just curious if you are going to compress the 10-piece tablebases.
Ed Trice wrote:The run-length encoding scheme can't be used, since every single bit configuration in one byte has been exhausted. The longest win < 7 pieces is 253 plies, the value 254 was used for a draw, and 255 was "unknown", which is used at the beginning to designate positions that had not yet been solved. It worked out perfectly, fitting into one byte. There are no tokens available to use as compression bytes for consecutive runs of the same value.

For 8-, 9-, and 10-pieces, I used 10-bit bytes by overlaying the bits across 2-byte boundaries. That way, it's only 1.25 bytes per position (instead of 2.0) and the sequence repeats, making indexing rather easy. I have a large buffer filled with 10-bit entries. Since the longest win will probably not cross 500 plies by too much, if at all, I would have 500 tokens to use for run-length encoding, which could result in some nice compression in the end. But, I won't know for about a year.
253 plies - very convenient! I was thinking about something like LZMA compression (I use it in my solvers). So that you save the data in blocks, always compress the entire block and decompress when reading. I guess that you should see at least 1.5-2 times ratio this way, which converts into a sizeable difference for 45 TB. :-) LZMA should certainly save more space than run-length.
Ed Trice wrote:I'd like to publish some papers on the results, the techniques, and the move generator itself when everything is done. Then, I will probably release the source code, so others could modify it for their own purposes.
Excellent plan, I look forward to seeing your papers and the source code!
Ed Trice wrote:You can get the 7-piece checkers tablebases without anything except the cost of the distribution media. Someone in the checkers world is in charge of that project, I forget who. I can look it up if you are interested.

The 8-piece checkers tablebases are about 475 GB. The 9-piece is around 2.4 TB. The 10-piece will be about 45 TB. It's just a matter of how to distribute it.
Yeah, distributing a multi-TB dataset can be a pain. Releasing the source code is the best, then there are no limits to what others can do. :-)

Can you already estimate the RAM requirement for your 10-pieces checkers solver?
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote:
Faking "fast time".. No idea who would want to do that that, but.. Would not loading the entire log, and scaling down the times proportionally pass your validation? :-)
Yes, it could :) The reason I made the log file so complicated was to discourage that. You would have to scale the times reported for every pass over the database, not just the end time that was reported.
Kirill Kryukov wrote: I was just curious if you are going to compress the 10-piece tablebases.
In the end, I will have to, definitely.
Kirill Kryukov wrote: 253 plies - very convenient!
I forgot to add one detail. During the solving run, a win of over 253 was found, and it had to be "improved" during a few iterations to settle down to 253. So, I had to use an "escape byte" to watch for overflows, and track them in a separate linked list. It made things more complex, but also was necessary for the 9-piece db especially, which has many wins > 253.
Kirill Kryukov wrote:I was thinking about something like LZMA compression (I use it in my solvers). So that you save the data in blocks, always compress the entire block and decompress when reading. I guess that you should see at least 1.5-2 times ratio this way, which converts into a sizeable difference for 45 TB. :-) LZMA should certainly save more space than run-length.
Yes, the good part about the solving run is that I have lots of time to read about compression! Since I have 6 cores = 12 threads, I can implement a post-solving compression scheme at my leisure.
Kirill Kryukov wrote:Can you already estimate the RAM requirement for your 10-pieces checkers solver?
You are going to love this part :) I can solve up to the 12-piece tablebases using as little as 1 GB of RAM with my technique. This is going to make a very good academic paper, I believe. The more RAM you have, the better, of course. In my case, I am using 12 GB on my 5.0 GHz machine (the i7-980X "Gulftown", it is very, very fast).

I load only the "most recently visited" positions into the RAM buffer. Along with the other data, there is an aging index that indicates when the position was last visited. If the buffer can hold N blocks, and your position has not been examined in the last (N * positions per block) + 1 times the buffer has been accessed, it is marked "on disk" and it "falls off the list."

It is a simple concept used in other RAM-aging algorithms, such as hash tables, and it can be adopted very easily for large tablebase solving attempts.
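
A bare-bones sketch of the aging idea (an illustration only, not the solver's actual buffer code): each cached block carries a stamp from a global access counter, and when a block has to be brought in while the buffer is full, the stalest block is written back to disk and evicted.

Code: Select all

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NBLOCKS   4                    /* blocks the RAM buffer holds (tiny for the demo) */
#define BLOCKSIZE 4096                 /* one byte per position in this sketch            */

typedef struct {
    int64_t  block_id;                 /* which tablebase block is cached here, -1 = empty */
    uint64_t last_used;                /* value of the access counter at the last hit      */
    uint8_t  data[BLOCKSIZE];
} CacheSlot;

static CacheSlot cache[NBLOCKS];
static uint64_t  access_counter;

/* Stand-ins for real disk I/O. */
static void load_block(int64_t id, uint8_t *dst)        { (void)id; memset(dst, 0xFF, BLOCKSIZE); }
static void store_block(int64_t id, const uint8_t *src) { (void)id; (void)src; }

/* Return the cached copy of block_id, evicting the least recently used slot if needed. */
static uint8_t *get_block(int64_t block_id)
{
    int victim = 0;
    for (int i = 0; i < NBLOCKS; ++i) {
        if (cache[i].block_id == block_id) {             /* hit: refresh the aging stamp */
            cache[i].last_used = ++access_counter;
            return cache[i].data;
        }
        if (cache[i].last_used < cache[victim].last_used)
            victim = i;                                  /* remember the stalest slot    */
    }
    if (cache[victim].block_id >= 0)                     /* the stale block "falls off   */
        store_block(cache[victim].block_id, cache[victim].data);  /* the list"           */
    load_block(block_id, cache[victim].data);
    cache[victim].block_id  = block_id;
    cache[victim].last_used = ++access_counter;
    return cache[victim].data;
}

int main(void)
{
    for (int i = 0; i < NBLOCKS; ++i) cache[i].block_id = -1;
    uint64_t pos = 123456;                               /* probe one position           */
    uint8_t *blk = get_block((int64_t)(pos / BLOCKSIZE));
    printf("value at position %llu = %u\n", (unsigned long long)pos, blk[pos % BLOCKSIZE]);
    return 0;
}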
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
ernest
Posts: 63
Joined: Tue Nov 21, 2006 6:31 pm
Location: Paris

Re: 6-piece checkers tablebases

Post by ernest »

Ed Trice wrote:...endgames for checkers...
Hi Ed,
Did you have contacts with Jonathan Schaeffer, or is your work on the subject completely independent?
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

ernest wrote:
Ed Trice wrote:...endgames for checkers...
Hi Ed,
Did you have contacts with Jonathan Schaeffer, or is your work on the subject completely independent?
Hi,

Well, Jonathan was nice enough to mention me in his latest book, and on his website (see http://webdocs.cs.ualberta.ca/~chinook/thankyou/ ), but I did not work with him directly on his checkers solving effort. I did email him a few times when Gil and I got some results that were different from his (back in 2001), so I did have contact with him that way. We also chatted on the phone a handful of times, but not more than that.

I was just a hobbyist working on checkers stuff, since nobody else was doing Distance To Win. In a way, Distance To Win in checkers is harder to do than in chess. In chess, you know "when to stop"; in checkers, you run the risk of creating an "infinite loop", since reversible king moves can do weird things to the DTW/DTL as you iterate. I think this is why nobody else has produced results with it; it is frustrating until you figure out the one trick needed to get it right :)
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

Ed Trice wrote:
Kirill Kryukov wrote:253 plies - very convenient!
I forgot to add one detail. During the solving run, a win of over 253 was found, and it had to be "improved" during a few iterations to settle down to 253. So, I had to use an "escape byte" to watch for overflows, and track them in a separate linked list. It made things more complex, but also was necessary for the 9-piece db especially, which has many wins > 253.
I'm not sure I follow here - in the usual retrograde-analysis solving, whenever you find a win, it's the shortest win for that position. When we are looking for wins in N, we know that all wins in N-1 are already found and marked. What do you mean by "improving" the win? Is it something specific to checkers? :-)
Ed Trice wrote:
Kirill Kryukov wrote:Can you already estimate the RAM requirement for your 10-pieces checkers solver?
You are going to love this part :) I can solve up to the 12-piece tablebases using as little as 1 GB of RAM with my technique. This is going to make a very good academic paper, I believe. The more RAM you have, the better, of course. In my case, I am using 12 GB on my 5.0 GHz machine (the i7-980X "Gulftown", it is very, very fast).

I load only the "most recently visited" positions into the RAM buffer. Along with the other data, there is an aging index that indicates when the position was last visited. If the buffer can hold N blocks, and your position has not been examined in the last (N * positions per block) + 1 times the buffer has been accessed, it is marked "on disk" and it "falls off the list."

It is a simple concept used in other RAM-aging algorithms, such as hash tables, and it can be adopted very easily for large tablebase solving attempts.
Hm.. I'll love it when I see it. :-)

How large (approximately, in number of positions) can the largest single table be in 10-piece checkers, 11-piece, 12-piece (if you know)? I tried the caching idea similar to what you describe. It performs great when the available RAM is slightly smaller than the actual amount required to keep the table being solved + required sub-tables. However when the difference becomes larger, solving slows down intolerably due to always needing new blocks from the disk. Perhaps you found some improvement. :-)
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote: I'm not sure I follow here - in the usual retrograde-analysis solving, whenever you find a win, it's the shortest win for that position. When we are looking for wins in N, we know that all wins in N-1 are already found and marked. What do you mean by "improving" the win? Is it something specific to checkers? :-)
In checkers, when a position that was unknown now has a move that leads to a win, that win is not guaranteed to be the fastest one. The Distance To Win will therefore not correspond to the iteration number, and you must revisit the position several times over before determining its "true" distance to win.

If you look at some of the data in my report.txt file you see things like this:

Code: Select all

Sunday, January 17, 2010 @ 22:34:34
DB 5, slice 7, (global slice 48): iteration 5 completed.
********************************************************

wins resolved this pass = 318705, losses resolved this pass = 29, draws resolved this pass = 1057.
win lengths improved this pass = 224711, loss lengths that changed this pass = 0.

total move pass wins = 2300954, total move pass losses = 713, total move pass draws = 70640.
cumulative win lengths improved = 574989, cumulative loss lengths that changed = 0.
In checkers, win lengths can be improved. Loss lengths can change as well. For this reason, it is possible to have your solver "loop forever" if you are not careful.

A diagram will show why this is the case.

[diagram omitted: a white checker on square 8, a white king on square 14, and the red king on square 2; white to move]

Suppose the position shown above is examined for the first time. That is, the pieces were just placed on these squares, and the position is currently unknown. Let's say it is white to move. The white checker is one move away from crowning and becoming a king. It can crown onto 2 different squares, and both of those positions would be in the "2 kings vs. 1 king" database, which was already solved. The result can be queried, and sure enough, white wins, since red to move would lose in each case. White would then choose which move wins most quickly.

The move 8-4 leads to a red loss in 24 plies, so white appears to win in 25 plies.
The move 8-3 leads to a red loss in 24 plies also, so white thinks the position wins in 25 plies.

Next, the move 14-10 is tried. This move leads to the red king being jumped, easy enough for a human to see. But, on this first iteration, it will remain unknown. Why? The position for the other side to move after 14-10 has not been solved yet. It takes two iterations to resolve a win in 3 plies, and 14-10 is a win in 3 plies.

So, after this first iteration, the position is saved as a white win in 25 plies.

On the second iteration, it will be red to move. The position with the king on square 10 (instead of 14) will be set up. The red moves 2-7 would lead to a losing jump, as would 2-6, so that position can be resolved as a loss in 2 plies.

On the third iteration, the position in the diagram is reached. This time, when 14-10 is generated, the position is now known to be a 2-ply loss for red. The program can therefore change the status from "win in 25" to "win in 3."
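
Those three passes can be expressed in a few lines (an illustration of the point only; the +d / -d scoring below is not the on-disk byte encoding): a win may be recorded as soon as a single resolved successor proves it, so its length can shrink on a later pass, whereas a loss or a draw requires every successor to be resolved first.

Code: Select all

#include <limits.h>
#include <stdio.h>

/* Values from the point of view of the side to move (illustration only):
   UNKNOWN = unresolved, 0 = draw, +d = win in d plies, -d = loss in d plies. */
#define UNKNOWN INT_MIN

/* Fastest win provable so far from the successor values (each successor is
   seen from the opponent's side).  Returns 0 if no win is provable yet.      */
static int fastest_provable_win(const int *succ, int nsucc)
{
    int best = 0;
    for (int i = 0; i < nsucc; ++i) {
        if (succ[i] != UNKNOWN && succ[i] < 0) {   /* opponent loses in -succ[i] plies */
            int win = -succ[i] + 1;
            if (best == 0 || win < best) best = win;
        }
    }
    return best;
}

int main(void)
{
    /* The moves 8-4, 8-3 and 14-10 from the diagrammed position.  On pass 1
       only the two crowning moves are resolved; 14-10 is still unknown.      */
    int pass1[3] = { -24, -24, UNKNOWN };
    /* By pass 3 the position after 14-10 is known to be a loss in 2 plies.   */
    int pass3[3] = { -24, -24, -2 };

    printf("after pass 1: win in %d plies\n", fastest_provable_win(pass1, 3)); /* 25 */
    printf("after pass 3: win in %d plies\n", fastest_provable_win(pass3, 3)); /*  3 */
    return 0;
}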
Kirill Kryukov wrote:How large (approximately, in number of positions) can the largest single table be in 10-piece checkers, 11-piece, 12-piece (if you know)?
The largest slice is typically the arrangement where one side keeps the "at least half kings" split of perfect symmetry and the other side has one fewer king (and one more checker). Sometimes it is the most symmetrical arrangement with equal numbers of kings.

For the 10-piece database, one side has 5 pieces. If "at least half" were kings, it would be 3 kings + 2 checkers vs. 3 kings + 2 checkers for "perfect symmetry." For the distribution with one fewer king on one side, it would be the slice with 3 kings + 2 checkers vs. 2 kings + 3 checkers.

How many positions are there?

(# of 3 checkers vs. 2 checkers positions) * (27 choose 3) * (24 choose 2) = (1018056)(2925)(276) = 821 876 608 800

Fortunately this number also matches Dr. Schaeffer's :)

http://webdocs.cs.ualberta.ca/~chinook/ ... piece.html

Look for the listing "3223" for this result.

For 11-pieces, 3 kings + 3 checkers vs. 3 kings + 2 checkers = 6 027 095 131 200
For 12-pieces, 3 kings + 3 checkers vs. 3 kings + 3 checkers = 36 652 173 958 400
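
The 821,876,608,800 figure can be reproduced with a few binomial coefficients. A small sketch of the arithmetic (my own breakdown, assuming the usual convention that a checker may not sit on its own crowning rank, so each color's checkers have 28 legal squares and the two sets overlap on 24 of them):

Code: Select all

#include <stdio.h>

/* n choose k, exact for the small values needed here */
static unsigned long long choose(int n, int k)
{
    if (k < 0 || k > n) return 0;
    unsigned long long r = 1;
    for (int i = 1; i <= k; ++i)
        r = r * (unsigned long long)(n - k + i) / (unsigned long long)i;
    return r;
}

int main(void)
{
    /* 3 red checkers vs. 2 white checkers: split on how many of the red
       checkers sit on the 4 squares that white checkers can never occupy.   */
    unsigned long long checker_placements = 0;
    for (int k = 0; k <= 3; ++k)
        checker_placements += choose(4, k)              /* red checkers on the 4 exclusive squares */
                            * choose(24, 3 - k)         /* the rest on the 24 shared squares       */
                            * choose(28 - (3 - k), 2);  /* white checkers on what remains          */

    /* Kings then go on whatever is left: 27 squares for the 3 white kings,
       then 24 squares for the 2 red kings -- the (27 choose 3)(24 choose 2) above. */
    unsigned long long slice = checker_placements * choose(27, 3) * choose(24, 2);

    printf("3c vs 2c placements: %llu\n", checker_placements);  /* 1018056      */
    printf("3K2C vs 2k3c slice : %llu\n", slice);               /* 821876608800 */
    return 0;
}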

Kirill Kryukov wrote:I tried the caching idea similar to what you describe. It performs great when the available RAM is slightly smaller than the actual amount required to keep the table being solved + required sub-tables. However when the difference becomes larger, solving slows down intolerably due to always needing new blocks from the disk. Perhaps you found some improvement. :-)
Yes, I did. The improvement was in the area of "locality of reference", which describes how "far away" your "next" block is, compared to the one most recently accessed. I redesigned my indexing function to improve the locality of reference, so each time a piece "moves" the resulting index won't jump too far away. It took a lot of experimentation to do this, and even more math! But it was fun, and it made it worth it.
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

OK, now I see that you are not just starting with a lost position and taking the moves back one at a time. :-) Thanks for the explanation!

BTW, are you slicing based on leading/trailing checker rank, or with some more fine slicing, like using the exact placement of all checkers as a slice?
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote: BTW, are you slicing based on leading/trailing checker rank, or with some more fine slicing, like using the exact placement of all checkers as a slice?
No, I do not have to subdivide my tablebases because of the buffering scheme I use. Here is a sample indexing function for 3 kings and 2 checkers vs. 2 kings and 3 checkers:

Code: Select all

unsigned long long get_10_piece_index_3K2C_AGAINST_2k3c(unsigned char wk1, unsigned char wk2, unsigned char wk3, unsigned char wc1, unsigned char wc2, unsigned char rk1, unsigned char rk2, unsigned char rc1, unsigned char rc2, unsigned char rc3)
{	
	unsigned long long index_function_value;

	unsigned long combined_checker_contribution_to_index;
	unsigned char 	adjust_rk1_index = rk1,
					adjust_rk2_index = rk2,
					adjust_wk1_index = wk1,
					adjust_wk2_index = wk2,
					adjust_wk3_index = wk3;

	index_function_value = 0;
	
	/***********************************************/
	/* order of placement on the board: wwrrrWWWRR */
	/***********************************************/

	combined_checker_contribution_to_index = 1 + get_05_piece_index_0K3C_AGAINST_0k2c(33-rc3,33-rc2,33-rc1,33-wc2,33-wc1);
		
	if(wk1 > rc1)
		adjust_wk1_index--;
	if(wk2 > rc1)
		adjust_wk2_index--;
	if(wk3 > rc1)
		adjust_wk3_index--;
	/**********************/
	if(wk1 > rc2)
		adjust_wk1_index--;
	if(wk2 > rc2)
		adjust_wk2_index--;
	if(wk3 > rc2)
		adjust_wk3_index--;
	/**********************/
	if(wk1 > rc3)
		adjust_wk1_index--;
	if(wk2 > rc3)
		adjust_wk2_index--;
	if(wk3 > rc3)
		adjust_wk3_index--;
	/**********************/
	if(wk1 > wc1)
		adjust_wk1_index--;
	if(wk2 > wc1)
		adjust_wk2_index--;
	if(wk3 > wc1)
		adjust_wk3_index--;
	/**********************/
	if(wk1 > wc2)
		adjust_wk1_index--;
	if(wk2 > wc2)
		adjust_wk2_index--;
	if(wk3 > wc2)
		adjust_wk3_index--;
	/**********************/
	/**********************/
	if(rk1 > rc1)
		adjust_rk1_index--;
	if(rk2 > rc1)
		adjust_rk2_index--;
	/**********************/
	if(rk1 > rc2)
		adjust_rk1_index--;
	if(rk2 > rc2)
		adjust_rk2_index--;
	/**********************/
	if(rk1 > rc3)
		adjust_rk1_index--;
	if(rk2 > rc3)
		adjust_rk2_index--;
	/**********************/
	if(rk1 > wc1)
		adjust_rk1_index--;
	if(rk2 > wc1)
		adjust_rk2_index--;
	/**********************/
	if(rk1 > wc2)
		adjust_rk1_index--;
	if(rk2 > wc2)
		adjust_rk2_index--;
	/**********************/
	/**********************/
	if(rk1 > wk1)
		adjust_rk1_index--;
	if(rk1 > wk2)
		adjust_rk1_index--;
	if(rk1 > wk3)
		adjust_rk1_index--;
	/**********************/
	if(rk2 > wk1)
		adjust_rk2_index--;
	if(rk2 > wk2)
		adjust_rk2_index--;
	if(rk2 > wk3)
		adjust_rk2_index--;

	index_function_value = _3K2C_AGAINST_2k3c_index(combined_checker_contribution_to_index, adjust_wk1_index, adjust_wk2_index, adjust_wk3_index, adjust_rk1_index, adjust_rk2_index);
	
	return (index_function_value - 1);	
}
And you also need these supporting functions:

Code: Select all

#define _3K2C_AGAINST_2k3c_index(Q,f,g,h,i,j) \
	((((Q)-1) * (807300))   + (((_3_same_pieces_subindex((f),(g),(h))) - 1) * 276) + (_2_same_pieces_subindex((i),(j))))

/*********************************************************************/
/*                                                                   */
/* General Indexing Function Combinatoric Macros                     */
/* ---------------------------------------------                     */
/*                                                                   */
/* Given there are N identical pieces of the same color, how many    */
/* ways can they be arranged uniquely on a game board, independent   */
/* of the size of the board?                                         */
/*                                                                   */
/* The answer involves producing a polynomial expansion of the form  */
/*                                                                   */
/* a + (b-1)(b-2)/2! + (c-1)(c-2)(c-3)/3! + .. (z-1)(z-2)..(z-N)/N!  */
/*                                                                   */
/*********************************************************************/

#define _2_same_pieces_subindex(a,b)    ((a) + ((((b)-1)*((b)-2))/2))
#define _3_same_pieces_subindex(a,b,c) ((a) + ((((b)-1)*((b)-2))/2) + ((((c)-1)*((c)-2)*((c)-3))/6))	
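
For anyone checking the math, the subindex macros can be tested in isolation. A tiny test of my own (assuming, as in the function above, that the squares have already been renumbered into a dense 1..N range by the adjust_* decrements, which skip over occupied squares): every pair a < b maps to a distinct index from 1 to C(N,2), with no gaps.

Code: Select all

#include <stdio.h>

#define _2_same_pieces_subindex(a,b)    ((a) + ((((b)-1)*((b)-2))/2))

int main(void)
{
    /* Two identical pieces on a 5-square (already renumbered) range. */
    for (int b = 2; b <= 5; ++b)
        for (int a = 1; a < b; ++a)
            printf("squares %d,%d -> subindex %d\n", a, b, _2_same_pieces_subindex(a, b));
    /* Prints 1..10 = C(5,2), each value exactly once, with no gaps. */
    return 0;
}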
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote:OK, now I see that you are not just starting with a lost position and taking the moves back one at a time. :-) Thanks for the explanation!
I am using both a forward move generator and a reverse move generator. That way, I can determine wins, losses, and draws on any given pass, provided there is enough information for the other side to move.

It is very important to resolve as many draws as you can as you go along. Otherwise, these positions "hang around" the entire time you solve the tablebase, and you have to test to see if they can be resolved on every iteration. The fewer of them there are, the faster your tablebase will solve.

Unmoving losses cannot guarantee correct DTW since the "Distance To Lose" could be subject to change as you go along. The tablebase must therefore iterate until both of these conditions are met:

1) No further wins, draws, or losses were newly determined
2) No prior wins or losses had their Distance change

Once this situation occurs, all of your remaining unresolved positions must be draws.
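
In outline, the loop that enforces those two conditions looks like this (a skeleton only; one_pass() stands in for a full forward-and-reverse sweep over the slice):

Code: Select all

#include <stdio.h>

/* Skeleton of the iteration-to-fixpoint described above (illustration only). */

typedef struct {
    long new_results;          /* wins, losses and draws newly determined this pass */
    long changed_distances;    /* previously known wins/losses whose distance moved */
} PassStats;

/* A real pass would run the move generator and the "unmover" over every
   position in the slice, probing successors and predecessors.                */
static PassStats one_pass(void)
{
    PassStats s = { 0, 0 };
    /* ... resolve and improve positions here ... */
    return s;
}

static void mark_remaining_unknowns_as_draws(void)
{
    /* final sweep: anything still unresolved at the fixpoint is a draw */
}

int main(void)
{
    int iteration = 0;
    for (;;) {
        PassStats s = one_pass();
        printf("iteration %d: %ld new results, %ld distances changed\n",
               ++iteration, s.new_results, s.changed_distances);
        if (s.new_results == 0 && s.changed_distances == 0)
            break;                     /* both termination conditions met */
    }
    mark_remaining_unknowns_as_draws();
    return 0;
}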
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

Interesting! I'm now curious how the performance of your solver scales with additional RAM vs increased CPU speed, using the same reference storage. Have you done any such benchmarks with some larger tables?

If IO becomes a limiting factor with some larger tables, even with 6 (or 12, 24) GB of RAM, then maybe you can achieve even better results by adding slicing (probably at the expense of an even more complex indexing function).
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote:Interesting! I'm now curious how the performance of your solver scales with additional RAM vs increased CPU speed, using the same reference storage. Have you done any such benchmarks with some larger tables?
I've benchmarked just about every permutation of solving that I could think of :)

I first benchmarked solving the checkers tablebases when I could load the entire thing into available RAM, using malloc() to grab the exact number of bytes I needed, 1 per position. Then, I used my most-recently-seen buffer, even though I had enough RAM not to need it. I wanted to see what overhead constantly updating the linked list contributed. I thought it would be small, perhaps 2% or 3%, but I was wrong! Managing such a buffer costs about 15% of your overall speed, on average. I did this for every table in the 7-piece set, and it was fairly constant across all of them. When I re-indexed for greater locality of reference, meaning I was reusing blocks better, this dropped to 9%.

Next, I wanted to measure the performance hit as a function of % of the tablebase I could load into RAM. I would compare this to the buffer performance only, since for the larger tablebases I could not load them entirely into RAM anyway. As you might expect, this varied greatly according to the hard disk I was using. I obtained some solid state drives (SSDs) and a less expensive Velociraptor 10,000 RPM drive from Western Digital.

I re-solved the 7-piece set with 90% of the positions in the buffer, then 80%, 70%, etc., all the way down to 10%.

With the SSDs, the performance was awesome.

The 90% buffer turned in a time close to 95% of the fully buffered performance.
The 80% buffer turned in a time close to 93% of the fully buffered performance.
The 70% buffer turned in a time close to 91% of the fully buffered performance.
The 60% buffer turned in a time close to 86% of the fully buffered performance.
The 50% buffer turned in a time close to 81% of the fully buffered performance.
The 40% buffer turned in a time close to 75% of the fully buffered performance.
The 30% buffer turned in a time close to 71% of the fully buffered performance.
The 20% buffer turned in a time close to 68% of the fully buffered performance.
The 10% buffer turned in a time close to 65% of the fully buffered performance.

The results were surprising and much better than I had hoped for. The initial 9% price of using the linked lists really had redeeming qualities. So, even if I could only fit 10% of the tablebase in RAM, the overall slowdown vs. loading it entirely in RAM was 100% - (65% x 91%) = 41%. The question became: would a 41% slowdown with much simpler code be worth it, compared to creating about 10,000 different subdivisions and managing them all piecemeal?

For me it was a no-brainer. Ed Gilbert's 10-piece database comprised so many files that I did not want to head in that direction. I knew I could "regain" much of my lost performance with hardware tweaking. The effective speed of the Gulftown I am using is 5.4 GHz, so it is 60% faster than a stock system.

Next, I started solving the tablebases, of course, and I observed something different, unexpected, and again even better than I had hoped for. Larger tablebases needing the buffering scheme turned in results better than this, even when the tablebase was not on an SSD. This puzzled me. How could some of these very large 9-piece slices outperform the SSD numbers when compared to the deliberately unloaded portions of the 7-piece tablebase?

Then it dawned on me. The larger tablebases do much more work, and make better use of the locality of reference, while their portion of the tablebase is in the buffer. The artificially created scenario with 7 pieces does not do as much work, so the blocks are not reused as much. The bigger the tablebase, the better the efficiency associated with the locality of reference. This is because when invoking the "unmover", with a greater number of moves able to be resolved by moving back, more resolutions per loaded block occur, and this has a nice cumulative effect over time.
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

Very nice scaling. Although, I'd be still tempted to try slicing to improve it even further.

Do you think that something similar to your "locality of reference" improvements could be applied to chess tables too?

Regarding the huge number of little files - it is usually trivial to combine them into larger files if the number becomes annoying. I am usually OK with a six-digit number of files, but will start worrying when it goes over a million. :-)
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote:Although, I'd be still tempted to try slicing to improve it even further.
Maybe if you saw the code for using a large buffer without having to deal with RAM slices, you might be tempted to make it work the way I did :)
Kirill Kryukov wrote:Do you think that something similar to your "locality of reference" improvements could be applied to chess tables too?
Yes, it is independent of the game you are using. "Locality of reference" is one of those terms somebody put in an academic paper, surrounded by a few "big words", but it is simple in conception, really. It just means, as you solve your positions, make sure the way you "loop around the board" doesn't cause your indexing function to generate large offsets.

Imagine your pieces that are being moved around the board are actually digits. So, if you are in a 6-piece table, you have 3 digits for one side, 3 digits for the other side.

Which pieces should you move first?

The "low order" pieces where the indices increment by "1's", "10's", or "100's" should be moved first. If you move the other pieces first, you are having your indices jump by 1000's, 10000's, or 100000's (using digits as the example).

I experimented using 2 buffers, since consulting the "reflection" slice for the other side to move had the opposite effect, and I also experimented with different indexing functions for side to move to minimize this effect within the same buffer. If you test out all of these types of things, eventually you get maximum solving performance.

There's always a way to make something faster :)
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

This is the general idea, of course. I was asking about your specific improvements, and whether they could be applied to chess. Probably not. It depends on the indexing function, of course.

So far the rule of thumb (for me anyway) is to try every possible slicing (by irreversible moves) before going into any further sub-divisions. Somehow you prefer to skip the slicing, and this intrigues me. But as you say, the overall procedure you use for checkers is already much different from the linear retrograde solving used in chess. Perhaps it makes sense in your context.
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote:Somehow you prefer to skip the slicing, and this intrigues me.
Mostly subdivisions are avoided to create long "runs" for the "bitbases" in checkers. That is, the positions where we save only "win-loss-draw" information with no DTW. In checkers, every position with a jump for the side to move, or a jump for the side NOT to move, can be encoded as a "wildcard", since on the next move, a subdatabase conversion will happen. This makes the lookup procedure a little more complicated, but the compression results are awesome. Consider these data:

http://www.worldchampionshipcheckers.co ... isons.html

Gil Dodgen and I created the smallest version of the 8-piece database in the game of checkers.

From the 4 versus 4 set, we compressed it to 30.741 positions/byte, or 32,234,376 positions/Megabyte.

This is NOT using .zip or huffman, this was for REAL TIME probing with minimal decompression overhead.

We exceeded the next best published result by 4 positions per byte.

My bitbases are even better now, since I created longer runs with my indexing function, averaging about 36 positions/byte.
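
Here is a rough sketch of how the wildcard idea combines with run-length encoding (a toy illustration only, not the actual format behind the numbers above): positions with a pending jump are marked as wildcards, a wildcard simply extends whatever run is in progress, and the result is stored as (value, count) pairs.

Code: Select all

#include <stddef.h>
#include <stdio.h>

/* Win-Loss-Draw values plus a wildcard for positions that are never probed
   directly because a jump is pending (the engine resolves those by search). */
enum { LOSS = 0, DRAW = 1, WIN = 2, WILDCARD = 3 };

/* Run-length encode, letting a wildcard take the value of the run in
   progress.  (A smarter encoder would also look ahead, so that leading
   wildcards join the run that follows them.)                                 */
static size_t rle_with_wildcards(const unsigned char *v, size_t n,
                                 unsigned char *run_val, size_t *run_len)
{
    size_t runs = 0, len = 0;
    unsigned char cur = DRAW;          /* value used if the data starts with wildcards */
    for (size_t i = 0; i < n; ++i) {
        unsigned char x = (v[i] == WILDCARD) ? cur : v[i];
        if (len > 0 && x == cur) {
            ++len;
        } else {
            if (len > 0) { run_val[runs] = cur; run_len[runs] = len; ++runs; }
            cur = x;
            len = 1;
        }
    }
    if (len > 0) { run_val[runs] = cur; run_len[runs] = len; ++runs; }
    return runs;
}

int main(void)
{
    unsigned char v[] = { WIN, WILDCARD, WILDCARD, WIN, WIN, DRAW, WILDCARD, DRAW, LOSS };
    unsigned char val[9]; size_t len[9];
    size_t runs = rle_with_wildcards(v, sizeof v, val, len);
    for (size_t r = 0; r < runs; ++r)
        printf("run of %zu x value %u\n", len[r], val[r]);   /* 5 x WIN, 3 x DRAW, 1 x LOSS */
    return 0;
}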
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Location: Mishima, Japan

Re: 6-piece checkers tablebases

Post by Kirill Kryukov »

Yeah, I get the "wildcard" idea, I think I've read about it somewhere. If I remember correctly, the idea was to encode such positions using whatever value is dominant in the surrounding block of positions. Though I still don't completely see why it conflicts with slicing. To me they seem like two separate issues.
Ed Trice wrote:Gil Dodgen and I created the smallest version of the 8-piece database in the game of checkers.

From the 4 versus 4 set, we compressed it to 30.741 positions/byte, or 32,234,376 positions/Megabyte.

This is NOT using .zip or huffman, this was for REAL TIME probing with minimal decompression overhead.

We exceeded the next best published result by 4 positions per byte.

My bitbases are even better now, since I created longer runs with my indexing function, averaging about 36 positions/byte.
Having not tried to toy with checkers I can't comment, but I trust you that this is probably a good result. I experimented with compression in chess variants and I was happy with the ratio so far, but it comes at some expense of probing speed. (14 pos/byte for DTM and 44 pos/byte for DTC in 3x4 chess, but the extreme numbers are mostly due to the tiny board). :-)

Do you know of any public data comparing probing speed of different checkers programs / databases?
Ed Trice
Posts: 58
Joined: Thu Oct 28, 2010 1:13 am
Location: Henlopen Acres, Delaware

Re: 6-piece checkers tablebases

Post by Ed Trice »

Kirill Kryukov wrote: Do you know of any public data comparing probing speed of different checkers programs / databases?
My custom-tailored indexing functions and lookup routine are faster than the move generators in the game of checkers once a block is loaded into RAM. Therefore, the search speed does not slow down when probing the databases.

I did give my 6-piece db code to Jon Kreuzer, who is the author of GUI Checkers. He did hook it up to his program, and he reports it gives a nice improvement over his 4-piece database version. I don't think he has released that publicly yet. When he does, probably more authors will use my databases, and probing comparisons will be available. By loading all of the data into a 1 GB RAM buffer at the start, I can guarantee there is no faster access, even though my database is DTW. It is basically the same speed as a table lookup.
5.0 Ghz Intel Gulftown Supercomputers
http://www.liquidnitrogenoverclocking.com