Tablebase utility ranking

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
Mindbreaker
Posts: 14
Joined: Tue Dec 05, 2006 10:07 pm
Sign-up code: 0
Location: San Diego

Tablebase utility ranking

Post by Mindbreaker »

While lists have been made ranking the 6-man tables by usefulness, they assume that likelihood of appearance should be the determinant of utility. To me it is better to choose the tables without which chess engines are most likely to lose their way. I would very much like to see a scientifically derived list ranked by how difficult each ending is for engines.
A programmer could collect 1,000 random positions from each of the 6-man endings and give an engine one second to find the best move in each. Ideally, the choices made would also be assessed, noting those that would have compromised the result against another engine equipped with the table. It might also be good to run the test on more than one engine, say the top ten or so, and produce a separate table-need ranking for each.
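
Something like the following is a rough sketch of the kind of harness I mean. It assumes Python with the python-chess library, a UCI engine (the "stockfish" path is a placeholder), and Syzygy-style win/draw/loss tables standing in for whatever tablebase format is actually installed; it counts only moves that shed the theoretical result, not merely suboptimal ones.

Code:
# Sketch of the proposed test, not a polished tool. Requires 6-man tables on disk.
import random

import chess
import chess.engine
import chess.syzygy

N_POSITIONS = 1000
ONE_SECOND = chess.engine.Limit(time=1.0)

def random_kqppkp_position():
    """Drop KQPP vs KP on random distinct squares until the position is legal."""
    pieces = [chess.Piece.from_symbol(s) for s in "KQPPkp"]
    while True:
        board = chess.Board(None)                     # start from an empty board
        for square, piece in zip(random.sample(chess.SQUARES, len(pieces)), pieces):
            board.set_piece_at(square, piece)
        board.turn = random.choice([chess.WHITE, chess.BLACK])
        if board.is_valid() and not board.is_game_over():
            return board

engine = chess.engine.SimpleEngine.popen_uci("stockfish")      # placeholder engine path
tables = chess.syzygy.open_tablebase("path/to/syzygy")         # placeholder table path

botches = 0
for _ in range(N_POSITIONS):
    board = random_kqppkp_position()
    value_before = tables.probe_wdl(board)            # win/draw/loss, mover's view
    move = engine.play(board, ONE_SECOND).move        # the engine's one-second choice
    board.push(move)
    value_after = -tables.probe_wdl(board)            # back to the mover's view
    if value_after < value_before:                    # theoretical value was shed
        botches += 1

engine.quit()
tables.close()
print(f"kqppkp: {botches} botches per {N_POSITIONS} positions")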
Norm Pruitt
Posts: 8
Joined: Mon Nov 13, 2006 3:10 pm
Sign-up code: 0
Location: North Carolina

Post by Norm Pruitt »

I like the idea, good thinking.
Welcome to the forum.
Norm/PAKman
Play in the playchess.com engine room as PAKman.
Using water cooled FX-60 at 3000MHz.
Over 100,000 computer games played under the PAK nicknames.
Thanks, Norm
Mindbreaker
Posts: 14
Joined: Tue Dec 05, 2006 10:07 pm
Sign-up code: 0
Location: San Diego

Tablebase utility ranking

Post by Mindbreaker »

Thanks.
There are two main reasons one might want such a list, I think:

1. Many people don't have 1.2 TB available, so to get the most out of the space they do have, they would download the tables that are most useful.

2. To leave out some tables intentionally in order to actually improve engine play. I can't assume this is possible; there is a lot of room for debate and testing.

The idea is that tablebase lookups do take time. If an engine assesses the position a table covers correctly and plays it correctly without the table, there could theoretically be a disadvantage to including that table, unless it is needed for completeness when adding 7-man tables becomes viable.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

EGT Utility

Post by guyhaw »

The concept of 'EGT Utility' can be defined, and I suggest one definition is Pr[Endgame Occurrence] * Pr[Engine sheds theoretical value without it].
It is somewhat unsatisfactory, as some engines are better than others in the endgame: this suggests the idea of an 'Endgame Tourney' without EGTs.
At the extreme end of non-occurrence, Tim Krabbé reports that his game database features no serious cases of same-coloured Bishops on same-coloured squares. I don't know if any endgame studies do.
The Kramnik-DF10 match raised the question of the value of the 'special endgame rules' in VK's favour. In which endgames would VK have found it most difficult to defend a theoretical draw? Game collections show that KRBKR has been difficult to defend: KmPKm with m=Q/R/B/N cannot be easy either. Fortunately, the audience were never frustrated by this scenario arising.

The 'Reference Fallible Endgame Player' agent, defined in past papers listed at http://www.tinyurl.com/law6k, can be used to estimate (via experiment) the probability of shedding a win, and to calculate that probability theoretically via a Markov model. This has not been done yet. The question would be: "What is the probability of R(c) shedding a depth-d win en route to the win?"
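
As a toy illustration of that question (deliberately not the R(c) move-choice model of the papers): if the winning side stays on a winning line at each move with some fixed probability p, the chance of shedding a depth-d win somewhere en route is 1 - p^d.

Code:
# Toy Markov-style illustration only -- not the RFEP/R(c) model from the cited papers.
def p_shed_win(p_keep_per_move: float, depth: int) -> float:
    """Chance of shedding a depth-'depth' win if each winner's move stays on a
    winning line independently with probability p_keep_per_move."""
    return 1.0 - p_keep_per_move ** depth

print(p_shed_win(0.98, 40))   # a 98%-accurate winner sheds a depth-40 win ~55% of the time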
g
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Same-colour/square-colour Bishops

Post by guyhaw »

To relay an answer to my own question about same-coloured Bishops on same-coloured squares in studies, Harold van der Heijden kindly reports that there are:
- 231 (out of ~70,000) studies with such Bs in the main line
- 37 of these have such Bs in the initial position (usually but not always regarded as 'poor form', as in, e.g., the Troitsky with 5 such Bs)
- 75 more studies if sublines are also considered
Harold is the 'keeper of the book' on endgame studies, a real genius. Buy his Study Database CD if you haven't already ... http://home.studieaccess.nl/heijd336/home.html
g
Mindbreaker
Posts: 14
Joined: Tue Dec 05, 2006 10:07 pm
Sign-up code: 0
Location: San Diego

Post by Mindbreaker »

To some degree it is true that not all engines are equal in the endgame, but this does not dramatically harm my proposal. Many of the 6-man endings are so imbalanced that most 1200-Elo players can avoid botching them if they can recognise a stalemate before it occurs. Consider that in the 5-1 tables it is a lone king versus five men. Any chess engine that can't win that is profoundly weak, unless the position arose from a forcing series of moves leading immediately to stalemate, and that too should be easily seen by most chess engines.
It is very easy to get burned by probability. There are so many endgames possible that there is a good chance oddball endgames are going to crop up. Seeing the same one several times may be rare, but collectively the probability that some rare ending is reached becomes high. Thus if you collect strictly by probability of occurrence and are limited to 200 or 300 GB, you could still get burned a lot.

On the other hand, if you collect on the basis of botch percentage, the ones you leave out are less likely to cost you points.

I looked for such a list a few months ago when I decided to start downloading 6-man tables, but I only found the probability-of-occurrence lists. In the absence of what I wanted, I considered each material balance and made an educated guess as to what each side's plan would be and how much resistance is possible. I tried to err on the side of caution by including more tables. I am only about 1900 Elo and spent only ten seconds or so considering each scenario, so no doubt a better list is possible, but it is the list I am using for now. The tables are in no particular order of usefulness; it is just a list of tables that appeared on the surface to be useful, the ones not listed being of doubtful utility. My starting point was the "all" lists on the Tablebase Sharing site (http://kd.lab.nig.ac.jp/chess/tablebases-online/). An International Master or Grandmaster could likely make improvements, and I would welcome them. I will attach the list (the Breaker List) at the bottom.

A list made by humans could still have errors, though realistically not many, but it would be poor at ranking by utility, which matters because not everyone has the same amount of space. If you do not have room for all of my listed tablebases (the Breaker List), which ones do you omit? A ranking by utility is the most reasonable way to decide.

I think a direct multiplication of chance of occurrence by botching chance is not really the way to go, because the costs of exclusion at the bottom of each list are not comparable. I would rather have every table that has any chance of being botched than leave something out just because it is highly improbable. Even with a set of several million games we don't get an accurate picture of relative occurrence at the bottom of the list. Statistically, we can't really say with much certainty that an ending that appeared two or three times is any more probable than one that occurred once or not at all. To make such a judgment you would need at least 30 occurrences of every type, which might take a database of several hundred million or a billion games to reach.

Accurate ordering by botch risk is much easier to achieve. If 1,000 positions do not yield 30 botched, one can simply try 10,000 positions. If there are none, the ending is probably botch-proof with that engine.
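
For a sense of what "probably botch-proof" means statistically, the standard rule of three (nothing chess-specific) gives an upper bound when zero events are observed:

Code:
# Rule of three: with 0 botches observed in n random positions, the 95% upper
# confidence bound on the true botch rate is approximately 3 / n.
n = 10_000
upper_bound = 3 / n
print(f"95% upper bound on botch rate: {upper_bound:.2%}")   # about 0.03%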

The method I propose for calculating the botch risk is as follows:

A chess engine is given 1,000 random positions, with one second to reach a move choice in each. The choice is then compared with the table. Each position has a true evaluation from the table, say mate in 15, with, for example, four moves that achieve it. If the engine suggests one of those four, it gets a check and we move to the next position. If it gets it wrong, the move is either suboptimal or a botch (it changes the outcome assuming perfect play). If it is a botch, it gets a check for a botch and we move on to the next position. If it is suboptimal, the number of moves by which it falls short is recorded and filed under the category for the original depth.

When we have all the data, we can estimate the botch probability. Suboptimal moves can also lead to a botch; to determine the chance of this, a series of pseudo-games can be run. This is the sort of thing a computer can do very fast.

The pseudo-games would abide by the 50-move rule. One of the thousand positions is taken at random and its outcome used: if it was a mate in 35 that turned into a mate in 43, it is replaced with a random mate-in-43 drawn from the thousand positions the engine has already evaluated at one second. This continues, with the table playing one side and the sampled one-second engine results playing the other. Positions are not actually played out, because of the random replacement, but the results should simulate real play well and generate data fast.
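
A minimal sketch of one reading of that simulation, with invented names: results_by_depth is assumed to map each depth-to-mate seen in the 1,000-position test to the list of observed one-second outcomes, each either ("botch",) or ("dtm", new_depth), the depth the position had become by the winner's next turn. As a simplification, the whole pseudo-game is treated as a single 50-move count, i.e. no captures or pawn moves are assumed to reset it.

Code:
import random

def pseudo_game(results_by_depth, start_depth):
    """One pseudo-game: the sampled engine results play the winning side, the
    table side's optimal resistance being baked into the observed depths.
    Returns 'converted' if the win is delivered, 'shed' otherwise."""
    depth, winner_moves = start_depth, 0
    while depth > 0:
        outcome = random.choice(results_by_depth[depth])   # random sampled position of this depth
        if outcome[0] == "botch":
            return "shed"                                  # theoretical result changed outright
        depth = outcome[1]                                 # jump to a random position of the new depth
        winner_moves += 1
        if winner_moves >= 50:                             # simplified 50-move rule
            return "shed"
    return "converted"

def shed_rate(results_by_depth, start_depth, games=100_000):
    """Monte Carlo estimate of the chance the engine sheds a win of this depth."""
    shed = sum(pseudo_game(results_by_depth, start_depth) == "shed"
               for _ in range(games))
    return shed / games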

The list of results should give, for each 6-man tablebase: the engine used, botches per thousand, double botches per thousand (turning a win not just into a draw but into a loss), the time per move (I suggest one second), and the size of the position set (1,000, though a larger number would be nice).

I would not disregard probability of occurrence entirely. The best plan is to include the top 20 or 30 by probability and then add whatever one has room for from a botch chart, starting with those most easily botched. Until such a chart is generated, go with a list like the one I have made.

A savvy engine programmer might have his engine steer toward endgames that are low-probability yet easily botched, taking advantage of engines that are not equipped to handle them. It is hard to see how to exploit an engine equipped with tables for all the positions it would otherwise botch.
Attachments
The Breaker List.txt
an edited list of tablebases geared toward being useful to chess engines
(81.11 KiB) Downloaded 502 times
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

Footnotes

Post by guyhaw »

Some things I forgot to say ...
In the Haworth-Andrist work with 'RFEP' Reference Fallible Endgame Players, the reason we didn't investigate the probability of drawing when starting from a win was this: it is easier to avoid a 'finite-depth draw' than a depthless (ignoring position-repetition) draw.
A 'depth draw' seems a counter-intuitive idea, but it is this: if the defending side can only achieve a draw via a forced sequence of N moves leading to conversion, the depth is N. The minimaxing is done assuming that one side is avoiding the draw and the other is seeking it. There are depth draws in the DTC and DTZ (etc.) metrics, but of course none in DTM terms.
There are no EGTs distinguishing between depth-draws and depthless-draws, so this experiment is 'hanging' for the moment.

Your point about small numbers at the tail of the distribution of game occurrence is well made.
Note that EGTs to the DTZ metric are (for 3- to 6-man endgames) reckoned to be about half the size of the DTM EGTs we have now. This is because the compression works better on the smaller ranges of depths.
Bitbases provide further compression but only protect against shedding theoretical value: they don't necessarily point the way to the win.
g
Mindbreaker
Posts: 14
Joined: Tue Dec 05, 2006 10:07 pm
Sign-up code: 0
Location: San Diego

Post by Mindbreaker »

I recently added another hard drive to my computer and consequently am revisiting the idea of a utility ranking. It occurred to me that there are other factors beyond the probability of an endgame's occurrence and the likelihood of an engine botching the endgame without the table: we can't ignore size itself.

Consider: suppose we select a tablebase because it has a reasonable chance of being botched and, say, a 1-in-500 chance of occurrence, but it is 10 GB, while we could instead have three smaller tables totalling 10 GB, each with only half that chance of occurrence but the same botch risk. I think it is clear that choosing the three over the one is preferable.
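
As a sketch of how that trade-off could be made mechanical (field names are invented, and the figures below just restate the example above approximately): rank each table by the occurrence-times-botch product per gigabyte and fill the available space greedily.

Code:
# Sketch: greedy selection of tablebases under a disk budget, ranking each table
# by (chance of occurrence x chance of a botch) per gigabyte. Field names invented.
def select_tables(tables, budget_gb):
    """tables: list of dicts with 'name', 'size_gb', 'p_occur', 'p_botch'."""
    ranked = sorted(tables,
                    key=lambda t: (t["p_occur"] * t["p_botch"]) / t["size_gb"],
                    reverse=True)
    chosen, used = [], 0.0
    for t in ranked:
        if used + t["size_gb"] <= budget_gb:
            chosen.append(t["name"])
            used += t["size_gb"]
    return chosen

example = [
    {"name": "big",    "size_gb": 10.0, "p_occur": 1/500,  "p_botch": 0.1},
    {"name": "small1", "size_gb": 3.3,  "p_occur": 1/1000, "p_botch": 0.1},
    {"name": "small2", "size_gb": 3.3,  "p_occur": 1/1000, "p_botch": 0.1},
    {"name": "small3", "size_gb": 3.3,  "p_occur": 1/1000, "p_botch": 0.1},
]
print(select_tables(example, budget_gb=10.0))   # -> ['small1', 'small2', 'small3']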

I also made a short list of endings that have relatively large probabilities of occurrence in engine games (from kasimp's post in this endgame forum: http://kd.lab.nig.ac.jp/chess/discussio ... c.php?t=74), yet seem to me to have fairly low probabilities of being botched by a top-level engine:

7 kqppkp 265 2.670%
12 krppkp 199 2.005%
20 knppkp 131 1.320%
27 kqrpkr 96 0.967%
31 kqrkrp 84 0.846%
33 kqbpkp 80 0.806%
34 kqrpkp 79 0.796%
35 krppkb 77 0.776%
36 kqqpkp 70 0.705%
37 kqpkbp 65 0.655%
38 krbpkp 59 0.594%
39 kqppkr 58 0.584%
42 krppkn 55 0.554%
46 kqqppk 44 0.443%
48 kqbppk 43 0.433%
49 kqpppk 43 0.433%
50 kqnpkp 42 0.423%
51 kqrppk 40 0.403%
52 krpppk 40 0.403%
54 kbpppk 32 0.322%
58 kqppkb 30 0.302%
59 krbpkb 30 0.302%
61 krnpkp 28 0.282%
63 krrpkr 27 0.272%
64 kbnpkp 26 0.262%
69 kqqkpp 22 0.222%
72 kqqbpk 21 0.212%
75 krbppk 20 0.201%
78 krnkrn 20 0.201%
80 kqnppk 19 0.191%
81 kqbpkr 19 0.191%
88 kqbpkb 17 0.171%
89 kqnpkb 17 0.171%
91 kqrbkr 16 0.161%
92 kqrbkp 16 0.161%
93 kqrnkr 16 0.161%
94 kqrpkb 16 0.161%
96 kqqpkq 15 0.151%
102 krbpkn 14 0.141%
103 krnpkb 14 0.141%

Choosing not to load these may lead to better play and analysis by chess engines because it frees room for more vital endgame databases.

If you feel I have mistakenly added or omitted anything in the >0.140% range, don't hesitate to comment. Or if you really think I should be more thorough and perhaps weed out those of lower probability, let me know.

The reason I only went down to 0.140% is that at that level one would already have included 65 tablebases, which in practical terms is quite a lot to load unless you have very fast hard drives and/or a ton of RAM. Also, the set of games the statistics were derived from contained only 22,949 games.

I am not saying we should all delete these tables; just place them in another folder (cut-and-paste is far faster than copy-and-paste) and have your engines load only the remaining files.
Last edited by Mindbreaker on Thu May 03, 2007 12:43 am, edited 1 time in total.
Mindbreaker
Posts: 14
Joined: Tue Dec 05, 2006 10:07 pm
Sign-up code: 0
Location: San Diego

Post by Mindbreaker »

corrected
Last edited by Mindbreaker on Thu May 03, 2007 12:45 am, edited 1 time in total.
Mindbreaker
Posts: 14
Joined: Tue Dec 05, 2006 10:07 pm
Sign-up code: 0
Location: San Deigo

Post by Mindbreaker »

corrected
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK
Contact:

More compact EGTs

Post by guyhaw »

It is also true that, for these one-sided EGTs, a DTZ [Depth to (move-count) Zeroing (move)] EGT would be much more compact than a DTM EGT.

As to the set of test positions, should these be restricted to game positions, or should the considerable corpus of studies in each endgame be included?
g
Mindbreaker
Posts: 14
Joined: Tue Dec 05, 2006 10:07 pm
Sign-up code: 0
Location: San Diego

Post by Mindbreaker »

guyhaw wrote: "It is also true that, for these one-sided EGTs, a DTZ [Depth to (move-count) Zeroing (move)] EGT would be much more compact than a DTM EGT."
I don't really know.
guyhaw wrote: "As to the set of test positions, should these be restricted to game positions, or should the considerable corpus of studies in each endgame be included?"
There are many artificial positions used in endgame puzzles and studies, so some restriction seems on the surface to be reasonable, but I am not sure it is, with the exception of one side having two bishops on the same color of square. With only 6 men, virtually all statistically distorting effects of the initial position should be infinitesimal.

In the future your concern may be warranted, in, say, 10-man tables, where pawn structures could become very unnatural in purely random positions.