6-man EGTB collecting tip

Cato the Younger
Posts: 8
Joined: Thu Jan 05, 2006 3:43 pm
Sign-up code: 0
Location: McLean, VA, USA

6-man EGTB collecting tip

Post by Cato the Younger »

Collecting the entire set of 6-man EGTB is intimidating and impractical for most people who don't have the necessary 1.2 terabytes of free disk space on their hard drives. But there is a practical if imperfect solution to this problem: only collect the most important--that is to say, the most frequently occurring--combinations.

There are 365 possible 6-man combinations. If you collect just the top 20 of them, by my guess you will have over 56% of the 6-man situations that happen in engine vs. engine games. If you get ambitious and go for the top 50, you will have over 79%. But if you are super-inefficient and collect only the bottom 100 combinations you will have only 0.16% coverage! I mean really, what good is a KNNNNK tablebase to you unless you like horses?
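A minimal sketch of how coverage figures like these can be computed from per-combination counts. The counts below are made-up placeholders, not the numbers from my database, and `coverage_of_top` is a hypothetical helper name.

```python
from collections import Counter

def coverage_of_top(counts: Counter, n: int) -> float:
    """Fraction of all observed 6-man positions that fall into the
    n most frequent material combinations."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total

# Placeholder counts (NOT real statistics), for illustration only.
sample = Counter({"KRPKRP": 50, "KRPPKR": 30, "KPPKPP": 15, "KNNNNK": 5})

print(coverage_of_top(sample, 2))  # 0.8
```

With real counts, calling it for n = 20, 50 and 100 gives the kind of cumulative percentages quoted above.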

Now there is one issue with this suggestion that I will let others elaborate: the "incomplete tablebases" problem. For example if you have a tablebase with pawns and don't have the one that it converts into following a pawn promotion, you could have some problems. But my general impression is that if you go for maximum coverage in statistical terms that's a better approach than getting all the pawnless combinations first, many of which hardly ever occur. KRRKBB for example happens less than once every 100,000 engine games!

According to my database, here are the top 100 most frequently occurring combinations, ranked in order. You really don't need much more than this, as these cover about 92% of all 6-man combinations that actually happen in engine games. (And for practical purposes you can toss out any 5+1 combinations I show here because if your engine can't solve those without help you need a new engine.)

KRP KRP
KRPP KR
KPP KPP
KRP KPP
KBP KNP
KBP KPP
KRPP KP
KRP KBP
KPPP KP
KQPP KP
KBPP KP
KBP KBP
KRB KRP
KNP KPP
KQP KQP
KQP KPP
KRBP KR
KNP KRP
KQRP KR
KRN KRP
KBPP KB
KNP KNP
KNPP KP
KBPP KR
KQP KRP
KRNP KR
KRPP KB
KQPP KQ
KQRP KP
KBPP KN
KQBP KP
KPPP KR
KNPP KB
KNPP KN
KQR KRP
KNPP KR
KQPPP K
KRPP KN
KRB KRN
KQRPP K
KQP KBP
KRPP KQ
KRPPP K
KQQP KP
KQPP KR
KRR KRP
KRRP KR
KPPP KB
KRBP KP
KQBPP K
KQNP KP
KQP KNP
KPPP KN
KRP KBN
KQQPP K
KQR KPP
KRB KBP
KRB KRB
KRB KPP
KQBP KB
KPPP KQ
KRBP KB
KRNP KP
KRBPP K
KBPPP K
KRN KRN
KRB KNP
KQB KQP
KBNP KR
KQPP KB
KQNPP K
KQRBP K
KQRP KB
KQB KPP
KRBP KQ
KQP KRR
KRN KPP
KBPP KQ
KQBP KN
KRN KBP
KQPP KN
KQBP KR
KRN KNP
KBNP KP
KQP KRB
KQN KQP
KPPPP K
KQRB KR
KRBP KN
KBN KBP
KQNP KB
KNPPP K
KNPP KQ
KQR KBP
KRRP KP
KQBP KQ
KRNPP K
KBNP KB
KQQ KPP
KQR KQP

Cato
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan

Post by Kirill Kryukov »

Hi Cato, thanks for the list, I'm sure it will be helpful to many people. I myself am aiming for the whole 1.2 TB, but checking this list may still be a good way to decide the order.

For completeness, here is one more list of endgame frequencies.
Cato the Younger
Posts: 8
Joined: Thu Jan 05, 2006 3:43 pm
Sign-up code: 0
Location: McLean, VA, USA

Re your link

Post by Cato the Younger »

I have no doubt the link you provided is accurate, but at first glance those do seem like the numbers you would get from a human vs. human database. For our purposes I think you ought to stick to engine vs. engine games as they have a different profile. Human games are more frequently decided before you get to EGTB.
ryan_hirst
Posts: 7
Joined: Sat Feb 18, 2006 11:43 pm
Sign-up code: 0
Location: Seattle, WA

does the incomplete TB problem still exist?

Post by ryan_hirst »

I'm using ChessBase software, and it seems to me that the newer revisions (say, 2002 forward) don't suffer from the incomplete TB problem. I do extensive analysis with a completely unmatched set of 6-man TBs and I have experienced no problems of this kind whatever with either Shredder 8 or Fritz 8.
If you have a TB position where a pawn promotion (which is the correct move) leads to a TB that is not present, all the programs I have used from Fritz 8 and newer WILL make that move directly from the TB and ONLY THEN exit the TB and search for the continuation.
The secondary problem (the engine not finding the winning line once that transitory move is made and the TBs are exited) is manageable:

1. Ignore ALL 5+1 endings as stated above. If you're using a program that can't mate with four pieces to a king without a TB file.....

2. Eliminate any overwhelming piece combinations, regardless of endgame frequency. Take a look at the list. Number 7 is KRPP KP ?? Come on. You can scratch that one off and add the 101st most frequent one instead. Your engine will be able to tell without the TB if you are in one of the freakishly rare positions where a lone pawn can draw against a rook and two pawns. And it will be able to win the rest.
[The advantage to having these TBs is prophylactic. Your engine knows it is winning by a HUGE margin... having the "sure win" endgames presolved still saves some search time, leaving the maximum possible time for scanning alternative possibilities.]

[also: 3. As you accumulate more 6-man bases, consider burning your 3-5 man set to a DVD if you need the HD space. They will only get accessed once you're already in a TB. Searches from the missing 5+1 set will only occur if the winning side throws a piece away.... you get the idea.]
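A small sketch of point 1, dropping the 5+1 endings from a ranked list. The entries are copied from the list earlier in the thread; `is_five_vs_one` is a hypothetical helper name, and the "white black" spacing is assumed to match that list.

```python
def is_five_vs_one(combo: str) -> bool:
    """True when one side has only its king, e.g. 'KQPPP K'."""
    sides = combo.split()
    return len(sides) == 2 and "K" in sides

# A few entries copied from the ranked list earlier in the thread.
ranked = ["KRP KRP", "KRPP KR", "KQPPP K", "KRPP KP", "KQRPP K"]

kept = [c for c in ranked if not is_five_vs_one(c)]
print(kept)  # ['KRP KRP', 'KRPP KR', 'KRPP KP']
```

Point 2 (dropping overwhelming material advantages such as KRPP KP) is a judgment call and is not automated here.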

Now if only we could get all the 6-man bases onto solid-state drives.
Rafael B. Andrist
Posts: 11
Joined: Sun Feb 19, 2006 6:11 pm
Sign-up code: 0

Post by Rafael B. Andrist »

Some comments to ryan_hirst:

2. I think it is even better not to use these 6-men EGTB in search when there is a safe and easy win. In your [...]-argumentation you ignore the extra time needed for accessing the EGTB.

3. Accessing the 5-men on a DVD is very slow (compared to a HDD). This would only work if you have already all 6-men (ok, let us ignore the 5 vs. 1) and if there will be no access to 5-men during the search. At the moment, this is simply not possible. And IMO it is still not clear if the 6-men wouldn't slow down the engine too much anyway.

Rafael Andrist
kp1089
Posts: 10
Joined: Tue Jan 24, 2006 7:54 am
Sign-up code: 0
Location: Skamania, WA

Re: does the incomplete TB problem still exist?

Post by kp1089 »

ryan_hirst wrote: I'm using ChessBase software, and it seems to me that the newer revisions (say, 2002 forward) don't suffer from the incomplete TB problem. I do extensive analysis with a completely unmatched set of 6-man TBs and I have experienced no problems of this kind whatever with either Shredder 8 or Fritz 8.
Neither Shredder 8 nor Fritz 8 can use 6-man tablebases, so I am not surprised that you have not seen a problem. Shredder 9, which can use them, will hesitate quite a long while before pushing the pawn, although I have not seen it opt for another move yet. Rybka and Crafty do not hesitate.

What you are seeing is the chessbase gui using the tablebases, not the engine.

kp
How small, of all that human hearts endure, that part that laws and kings can cause or cure. -- Samuel Johnson.
gambit3
Posts: 57
Joined: Mon Mar 06, 2006 8:06 am
Sign-up code: 0

some misinterpreted facts...

Post by gambit3 »

a long, long time ago, in a galaxy far, far away... oh wait, wrong saga. this is a seriously long message, so if you aren't in the mood for a balanced argumentative read, skip this one.
kp1089 wrote: Neither Shredder 8 nor Fritz 8 can use 6-man tablebases, so I am not surprised that you have not seen a problem. Shredder 9, which can use them, will hesitate quite a long while before pushing the pawn, although I have not seen it opt for another move yet. Rybka and Crafty do not hesitate.
this is in fact incorrect. all engines that can use tables can use single part 6 man tables. that is to say, all tables that do not have a number in their names.

the reason for the splitting of certain 6 man tables into parts was simply because those tables, as one part, break the 32-bit filesize limit of 2Gig. nothing more. as such, the following older commercial engines are ones i know for certain CAN use these single part 6 manners:

-fritz 7
-shredder 6
-gambit tiger 2.0
-chess tiger 14.0
-junior 7
-hiarcs 8
-rebel 11

crafty has always been able to use 6 manners. in fact, dr. hyatt had something to do with the file splitting scheme used in 6 man tbs, so crafty was the first engine to support multi part (split) tbs.
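The size limit mentioned above can be checked with simple arithmetic. This sketch assumes the "32-bit filesize limit of 2Gig" refers to a signed 32-bit file offset; it shows only the arithmetic, not the actual splitting scheme used for the table files, and both function names are hypothetical.

```python
# Largest size addressable with a signed 32-bit offset (assumed limit).
MAX_SIGNED_32 = 2**31 - 1  # 2,147,483,647 bytes, just under 2 GiB

def fits_in_one_file(size_bytes: int) -> bool:
    """True if a table of this size stays within the assumed limit."""
    return size_bytes <= MAX_SIGNED_32

def parts_needed(size_bytes: int) -> int:
    """Minimum number of parts so that every part stays within the limit."""
    return max(1, -(-size_bytes // MAX_SIGNED_32))  # ceiling division

print(fits_in_one_file(2**31))   # False
print(parts_needed(5 * 2**30))   # 3
```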
ryan_hirst wrote: I do extensive analysis with a completely unmatched set of 6-man TBs and I have experienced no problems of this kind whatever with either Shredder 8 or Fritz 8.
If you have a TB position where a pawn promotion (which is the correct move) leads to a TB that is not present, all the programs I have used from Fritz 8 and newer WILL make that move directly from the TB and ONLY THEN exit the TB and search for the continuation.
here, you are assuming something that is not the case. with every chessbase gui since 7, if there is a tb position on the board, it is the gui, not the engine, that plays the move. all it takes to see this is for you to load an engine that you know does not access tbs (crafty 14.12, for example, available from dr. hyatt's ftp site) and turn on infinite analysis in any tb position to see how quickly it is solved. then go to exactly one halfmove before tb solution and try again. what you are seeing is the engine doing nothing until the gui jumps out of tb position. only then does the engine restart calculating. the reason for this implementation is simple, logical and accurate: there is no point in running the engine when the position is known to conclusion.

and now the real reason why i write this: tb collecting.

people seem to be forgetting one thing, that every position in an ending for one colour is a diagonal mirror of the same position for the other colour. as such, every move is available from the tables of one colour only, meaning you can safely store one of the colour sets you have (unless you are generating your own tables) on tertiary storage (cd, dvd, etc.) logical implication is that you only need the tables from one colour.
practical implication is that your engine will have to analyse odd-numbered plies if it is not playing the side for which you have the tables, and even if it is. this is due to the way table access is constructed.
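The colour symmetry referred to above can be made concrete on a FEN string. This is only a sketch under an assumption about what is meant: the transformation shown is the usual one for positions with pawns (swap the colours, flip the board top to bottom, swap the side to move), and `flip_colours` is a hypothetical name. It says nothing about how the table files themselves are laid out.

```python
def flip_colours(fen: str) -> str:
    """Return the colour-swapped equivalent of a position given as FEN."""
    placement, side, castling, ep, halfmove, fullmove = fen.split()
    # Reverse the rank order and swap piece case (white <-> black).
    flipped = "/".join(rank.swapcase() for rank in reversed(placement.split("/")))
    new_side = "b" if side == "w" else "w"
    new_castling = "-" if castling == "-" else "".join(sorted(castling.swapcase()))
    # An en passant square moves from rank 3 to rank 6 or the reverse.
    new_ep = "-" if ep == "-" else ep[0] + str(9 - int(ep[1]))
    return " ".join([flipped, new_side, new_castling, new_ep, halfmove, fullmove])

start = "8/8/8/8/8/4k3/4p3/4K3 w - - 0 1"
print(flip_colours(start))  # 4k3/4P3/4K3/8/8/8/8/8 b - - 0 1
```

Applying the function twice returns the original string, which is what makes a result stored for one colour reusable for the other.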

that said, even though the engine will only have access to half the tables, none of the move permutations are lost, meaning the tables still provide the best moves for positions. on trial, i have noticed that the path to mate is usually established one or two moves (not plies) later, and draws take between one and four moves longer to establish.

tb cache usage is also halved, so you can put more into the same cache size, meaning less disc activity for tbs, meaning faster access, meaning long term faster solutions, assuming you don't only play one or two engine games at a time, even given the ''incompleteness'' of your table set.

so far i haven't said anything you can't see for yourself iff you look at and understand the tbgen source, which is available through a google search link.

i do have some serious doubts as to the practical application of every list that has been displayed or linked here (meaning on this forum in total). this is due to the fact that people have quoted how useful the tables have been with endings adding up to 100%. this is entirely inaccurate on a number of counts:
- what path is taken? a game that goes to a 3 man base MUST have gone through a 4, 5, and 6 man position to get there, and second, pawned tables ALWAYS link to minimum 4 other tables, and that can mean a number of ways to get to the end position. take for example a krppkp position where the game ends in a 3 man solution. how many paths to get from 6 to three man are there? do these stats include the path through games as well as the termination games?
- why do all of the stats add up to 100% when multiple tb accesses could be in one game, as in the example above?
- there is (as yet) no relational information to make the decision which tables to download into anything more than what i would call an educated guess. for instance our krppkr position can go to a krppk, krpkr, kppkr, kqrpkr, krrpkr, kbrpkr, knrpkr position, or it can terminate. that's already 8 tables in use before the move is played, and that can all potentially be at ply 1, but which move is selected? HOW OFTEN does our krppkr position go to krpkr? what percentage is it? this information is used in the engine's search if you don't have the table, or if you use other than chessbase gui for silicon-based games, but does not end up in the game, simply because the game does not follow every possible line.
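The krppkr example above can be enumerated mechanically. The sketch below lists every table reachable in one move by a capture, a promotion, or a capturing promotion. The naming rule (bigger or stronger side first, pieces ordered Q R B N P) is an assumption about the usual convention, so "kbrpkr" above appears as KRBPKR, and `table_name` and `successors` are hypothetical helpers.

```python
ORDER = "QRBNP"  # assumed piece ordering inside a table name

def table_name(a: str, b: str) -> str:
    """Canonical name for material a vs b (kings implied)."""
    a = "".join(sorted(a, key=ORDER.index))
    b = "".join(sorted(b, key=ORDER.index))
    rank = lambda s: (-len(s), [ORDER.index(p) for p in s])
    first, second = sorted([a, b], key=rank)
    return f"K{first}K{second}"

def successors(a: str, b: str) -> set:
    """Tables reachable in one move from material a vs b."""
    result = set()
    for mover, other in ((a, b), (b, a)):
        # Plain capture of any enemy piece or pawn.
        for i in range(len(other)):
            result.add(table_name(mover, other[:i] + other[i + 1:]))
        if "P" in mover:
            rest = mover.replace("P", "", 1)
            for promo in "QRBN":
                # Quiet promotion.
                result.add(table_name(rest + promo, other))
                # Capturing promotion; the victim on the last rank cannot be a pawn.
                for i, victim in enumerate(other):
                    if victim != "P":
                        result.add(table_name(rest + promo, other[:i] + other[i + 1:]))
    return result

print(sorted(successors("RPP", "R")))
```

For KRPPKR this yields the seven tables named above plus four 5-man tables reached by a pawn capturing the rook while promoting. It says nothing about how often each transition is actually played or probed, which is the open question.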

my personal solution to which tables to get is to simply collect all pawnless tables and store the tables for one colour offline. external hdds are not relatively expensive, are fast, have great capacities, and do not require the purchase of a new computer. 2 of these at 250G is enough to already have an almost complete black set, pawned, from 6 to 3 man or an almost complete pawnless set of 6 man.
i read some discussion that 5v1 is not useful, however my opinion is that while they may be cumbersome now, if positions get to 8 or 9 man tbs, they will prove their value then. they are, as such, a gamble at future investment, but they also let your engine know at a glance what not to do in an unbalanced ending, thus they also have some practical value, since sometimes knowing what not to do leads to knowing what to do, and as i stated earlier, the engine's search information as to which tbs it accessed to get to its conclusion of which move to play, is not included in the games, thus the stats given for this advice are completely inaccurate.

even 5 man tablebase size is halved when no pawns are added. it is not a given that your engine has to see a win that takes 34 moves to get the pawn across the board, but most engines will miss it less than 1%. aside from which, only 76% of silicon-based games get to a 5 or less man position, declining as table usage becomes more popular, so to cover 51% games (statistical balance of probabilities that you will cover the position), you need 66% of the tables at this time. this means that (by implication) 6 man tables are less of an advantage over 5 manners than people first thought, simply because the advantage is {1,12} plies, but only for around 3/4 of the games. this reduces the significance already.

my personal experience is that around 85% of all silicon games are finished by move 50, often with 9 or 10 pieces still on the board. it is also my experience that if both machines manage to enter tables, the game will be played to conclusion, otherwise not. this last fact skews further any quoted statistics.

please make the statistics count. you can tell me that 1.3% goes to krppkr, or whatever the figure really is, since there even seems to be some dispute over this, but where does it go from there?

in short, no one is really equipped to tell anyone else which tbs are better to download. it becomes like your partner... you make your selection and stick to it. that will be so until a considerable time into the future yet, i think.

regards all

good luck and happy hunting
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan

Post by Kirill Kryukov »

Yeah this was a long post. Anyway, I want everything so I don't care for the order. I actually prefer to get *less* useful sets first, because they are rare, and they are in bigger danger of extinction. The more useful sets are more popular, so I will get them later easily. :-)
gambit3
Posts: 57
Joined: Mon Mar 06, 2006 8:06 am
Sign-up code: 0

Post by gambit3 »

my entire point was that there is no one ''more useful'' set when it comes to tables. as stated before, krppkr positions being searched by an engine will access every position with a pawn and 2 rooks, plus every position with 2 rooks, plus every position going to 5 men from these (all). that is WAY more than 1.3%! just because it doesn't end up in the game doesn't mean it isn't looked at. the stats for the ''most useful'' posted here are completely ridiculous and unfounded. to post ANY pawned table's occurrence rate, you would have to include its complete pawnless subset also to be mathematically accurate, and that has not been done, so these lists are, simply put, inaccurate and incorrect.

there are only a couple of useful collecting tips.

COLLECT COMPLETE PAWNLESS SET BEFORE GOING FOR PAWNED!!!
COLLECT ALL X SET BEFORE BEGINNING WITH X+1


this means work out what you want (which is subjective but generally has been misled by inaccurate ''information'' based on incomplete facts), then collect all the pawnless ones first, in ascending piece count order, then do the same for the pawned ones.

that's it. nothing else. no other advice is accurate or useful unless taken from every engine's scanning stats, and those are not in any databases.
those who can, do
those who can't, teach
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan

Post by Kirill Kryukov »

gambit3, there is one problem with this logic: Not everyone is ready to collect all 1 TB of 6-men. Myself, I want everything, but some people just have something like 300 GB external HDD for tablebases. For them collecting all pawnless is certainly not the most efficient way to go. From the practical perspective, performance of an engine equipped with a set like KRPPKR, KRPKRP, KBPKBP, KBPPKR, KNPPKR, KRRKQP, KQPKQP, KPPPKR, KPPPKP, KPPPKN, KPPPKB will probably be better than performance of the same engine equipped with all pawnless. (Although EGTB advantage is hardly measurable in engine-engine tests). So for those with space limitation it makes perfect sense to make a list of tablebase importance and collect the sets starting from the top.

Now, how to construct that list is another matter. I don't think that simply counting endgames that actually occurred on board is the best way, as many endgames may be probed in search. So here I agree with you. Engines don't log which tb they probe, so it's hard to know. One simple way may be: Simply count all 7-piece positions in a game database. Then for each of them count all 6-men configurations that can result from it. So we count not only realized 6-men, but also potential. Then we can go one more capture back and start with 8-men counts, etc. I don't know how far back is good to go. I saw an engine probing 5-men when there were 18 pieces on board, but that's too far for this study - it will just count everything.
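A rough sketch of the counting idea, under stated assumptions: the input is a list of 7-piece material balances already extracted from a game database (the sample below is made up), only plain captures are considered as the step from 7 to 6 men, and each 6-man table is counted once per source position. `table_name` and `potential_six_man` are hypothetical names.

```python
from collections import Counter

ORDER = "QRBNP"  # assumed piece ordering inside a table name

def table_name(a: str, b: str) -> str:
    """Canonical name for material a vs b (kings implied)."""
    a = "".join(sorted(a, key=ORDER.index))
    b = "".join(sorted(b, key=ORDER.index))
    rank = lambda s: (-len(s), [ORDER.index(p) for p in s])
    first, second = sorted([a, b], key=rank)
    return f"K{first}K{second}"

def potential_six_man(seven_man_positions) -> Counter:
    """Tally the 6-man tables one capture away from each 7-man position."""
    tally = Counter()
    for white, black in seven_man_positions:
        reachable = set()
        for mover, other in ((white, black), (black, white)):
            for i in range(len(other)):
                reachable.add(table_name(mover, other[:i] + other[i + 1:]))
        tally.update(reachable)  # each table counted once per position
    return tally

# Made-up sample of (white, black) material without kings, NOT real data.
sample = [("RPP", "RP"), ("RPP", "RP"), ("RBP", "RP")]

for name, count in potential_six_man(sample).most_common(3):
    print(name, count)
```

Going one more capture back would mean feeding 8-piece balances through the same step twice. Capturing promotions are left out here for brevity.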
gambit3
Posts: 57
Joined: Mon Mar 06, 2006 8:06 am
Sign-up code: 0

Post by gambit3 »

Kirill Kryukov wrote: Now, how to construct that list is another matter. I don't think that simply counting endgames that actually occurred on board is the best way, as many endgames may be probed in search. So here I agree with you. Engines don't log which tb they probe, so it's hard to know. One simple way may be: Simply count all 7-piece positions in a game database. Then for each of them count all 6-men configurations that can result from it. So we count not only realized 6-men, but also potential. Then we can go one more capture back and start with 8-men counts, etc. I don't know how far back is good to go. I saw an engine probing 5-men when there were 18 pieces on board, but that's too far for this study - it will just count everything.
i say again: there is no one more useful set. look at your own last sentence.
i also say again: knowing what NOT TO do can guide to what TO do.
third repeat: table selection is subjective.

gambit plays at playchess with a partial pawnless 6-man set and is currently in the top 10 for long games. it hardly ever plays blitz games, but when it does the tables seem to hinder rather than help. this also appears to be across the board for 5 and 6 man tables.
i'll let the results speak for themselves. 2660 rating (currently #2) and no lost game in over 30 with inferior equipment. enough said.
those who can, do
those who can't, teach