Sergey Yankovich's tablebase files

dcorbit · Post by **dcorbit** » Tue Feb 21, 2012 9:33 pm

I have exchanged email recently with Sergey Yankovich (author of this tablebase generator):
http://generatorchess.com/

Apparently, he intends to apply a Berkeley style license to the tablebase generator.
He also intends to release the probing code and possibly even the generator code.

Perhaps a coordinated effort could be mounted over the internet to create the entire 7-man tablebase file set.

The storage requirements seem a bit extreme right now, but I guess in a few years it will be quite affordable.

Codeman · Post by **Codeman** » Wed Feb 22, 2012 2:17 am

A Berkeley style license would definitely help the popularity of the project.

Very little is known about this generator so far. It would be nice if someone verified one 6-men table by comparing it with the Nalimov equivalent.

Regarding "Perhaps a coordinated effort could be mounted using the internet to create the entire 7 man tablebase file set":
It takes someone to take the lead, similar to Kirill's great efforts on the 6-men task.
1) get people interested and dedicated to donate computing power (post on various boards)
2) distribute the work
3) coordinate the sharing of emule links

Keep in mind that with this new scheme the first step will be to regenerate all the 6-men tables. In other words, it will take a while until we are back at the level we are at with Nalimov now. Anyway, that was inevitable: the Nalimov EGTBs are not meant to last, given that the generator is not suited for 7-men tables and the licence is very restrictive.

Dann, are you prepared to take up the coordination task?

Before starting, I would appreciate it if some of the other active EGTB experts reviewed the generation code. Otherwise we might end up in a situation where, after we have started on the first 7-men tables, the generation source is released, someone finds, for example, a minor improvement to the compression algorithm, and all of a sudden we have two competing formats around.

dcorbit · Post by **dcorbit** » Wed Feb 22, 2012 6:33 am

Codeman wrote: A Berkeley style license would definitely help the popularity of the project. Very little is known about this generator so far. It would be nice if someone verified one 6-men table by comparing it with the Nalimov equivalent.

I have a friend who is going to compare several of the statistics files to see if the win/loss/draw/broken ratios are identical.

Codeman wrote: Regarding "Perhaps a coordinated effort could be mounted using the internet to create the entire 7 man tablebase file set":
It takes someone to take the lead, similar to Kirill's great efforts on the 6-men task.
1) get people interested and dedicated to donate computing power (post on various boards)
2) distribute the work
3) coordinate the sharing of emule links

I know that it is a big effort. I do not yet know everything that such an effort involves. For example, in order to make child files, we have to be able to distribute all the parents somehow: you can't make KRNP-KRR without all the piece files in which the appropriate piece replaces the letter 'P'. I do not know what sort of resources and time would be required, but it is a very interesting project, and if I do not coordinate I will no doubt participate. I am even thinking of buying a machine for the sole purpose of tablebase generation.
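The dependency described above can be sketched in a few lines. This is only an illustration: the function name `promotion_successors`, the `ORDER` piece ordering, and the `KRNP-KRR` name format (white pieces, hyphen, black pieces) are assumptions, not the generator's actual naming scheme. Captures, which also lead to smaller tables, are left out.

```python
# Hypothetical canonical piece order used when rebuilding a table name.
ORDER = "KQRBNP"


def _sorted_side(side: str) -> str:
    """Return the pieces of one side in the assumed canonical order."""
    return "".join(sorted(side, key=ORDER.index))


def promotion_successors(table: str) -> list[str]:
    """List the tables reachable by promoting one pawn of either side."""
    white, black = table.split("-")
    result = set()
    for promo in "QRBN":
        if "P" in white:
            # Replace exactly one white pawn by the promoted piece.
            result.add(f"{_sorted_side(white.replace('P', promo, 1))}-{black}")
        if "P" in black:
            # Replace exactly one black pawn by the promoted piece.
            result.add(f"{white}-{_sorted_side(black.replace('P', promo, 1))}")
    return sorted(result)


print(promotion_successors("KRNP-KRR"))
# ['KQRN-KRR', 'KRBN-KRR', 'KRNN-KRR', 'KRRN-KRR']
```

Each of those four 7-men tables would have to be available to whoever generates KRNP-KRR, which is why distributing the parent files matters.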

Codeman wrote: Keep in mind that with this new scheme the first step will be to regenerate all the 6-men tables. In other words, it will take a while until we are back at the level we are at with Nalimov now. Anyway, that was inevitable: the Nalimov EGTBs are not meant to last, given that the generator is not suited for 7-men tables and the licence is very restrictive. Dann, are you prepared to take up the coordination task?

Definitely not, until I fully understand the sort of effort that would be needed. A bad leader is worse than no leader.

Codeman wrote: Before starting, I would appreciate it if some of the other active EGTB experts reviewed the generation code. Otherwise we might end up in a situation where, after we have started on the first 7-men tables, the generation source is released, someone finds, for example, a minor improvement to the compression algorithm, and all of a sudden we have two competing formats around.

I have suggested to Sergey that he adopt Miguel's excellent and flexible compression scheme which is replaceable without having to recompute anything (though expansion and re-compression would be required). I do not know if he has contacted Miguel yet and I do not know if Miguel would be interested in this sort of participation.

Codeman · Post by **Codeman** » Wed Feb 22, 2012 10:13 pm

Another idea.

Most people interested in this project will already hold most if not all of the Nalimov 6-men tables. It would save the whole project a lot of precious processing time and network bandwidth if a tool were built that could convert Nalimov files to this new format. My bet is that such a tool would let the project finish a couple of months earlier for probably just a day or two of programming effort.

Dann, could you contact the author with this request?

syzygy · Post by **syzygy** » Thu Feb 23, 2012 4:25 pm

Codeman wrote:Most people interested in this project will already hold most if not all Nalimov 6men tables. It would save the whole project a lot of precious processing time and network bandwidth if a tool was built that can convert nalimov files to this new format.

A simpler solution would be to let the generator optionally use the Nalimov 6-men tables to seed the generation of the 7-men tables.

Codeman · Post by **Codeman** » Thu Feb 23, 2012 4:59 pm

syzygy wrote:
Codeman wrote:Most people interested in this project will already hold most if not all Nalimov 6men tables. It would save the whole project a lot of precious processing time and network bandwidth if a tool was built that can convert nalimov files to this new format.
A simpler solution would be to let the generator optionally use the Nalimov 6-men tables to seed the generation of the 7-men tables.

I disagree.
1) The parsing of Nalimov files is a very resource-hungry process (complex compression and index systems). Combining the parsing of the one format with the generation of the other seems like a challenge.
2) We need the sub-6-men files anyway, should the new format slowly replace Nalimov.
3) There are many people who do not have any 6-men Nalimov EGTBs. It is faster for one single person to convert and distribute than for them to regenerate.

syzygy · Post by **syzygy** » Thu Feb 23, 2012 11:02 pm

Codeman wrote:
syzygy wrote:
Codeman wrote:Most people interested in this project will already hold most if not all Nalimov 6men tables. It would save the whole project a lot of precious processing time and network bandwidth if a tool was built that can convert nalimov files to this new format.
A simpler solution would be to let the generator optionally use the Nalimov 6-men tables to seed the generation of the 7-men tables.
I disagree.

Codeman wrote: 1) The parsing of Nalimov files is a very resource-hungry process (complex compression and index systems).

Sure, but that means that the conversion process will require a large amount of resources as well. In addition, it is not at all clear that the new format is any more efficient. (Apparently the compressed files are already more than 20% larger than the Nalimov files.)

(If your point is that the Nalimov probing code for 6-men needs a huge amount of RAM for indexing, then this can probably be worked around relatively easily by only initializing those tables that are necessary.)

Codeman wrote: Combining the parsing of the one format with the generation of the other seems like a challenge.

It's a matter of adding probing code, but your proposed converter will need that code as well. Not much difference. (And for both the generator and the converter one would need Nalimov's permission, if I am not mistaken.)

Codeman wrote: 2) We need the sub-6-men files anyway, should the new format slowly replace Nalimov.
3) There are many people who do not have any 6-men Nalimov EGTBs. It is faster for one single person to convert and distribute than for them to regenerate.

I thought your whole point was to cater for people who DID have the Nalimov files. There is no difference between downloading files generated directly by the new generator and downloading the same files converted from Nalimov tables.

Plus I think one would want to at least confirm that this new generator can generate the 6-men files completely correctly before embarking on 7-men generation.

syzygy · Post by **syzygy** » Thu Feb 23, 2012 11:25 pm

(Anyway, if a coordinated 7-men generation effort is started, it would seem wise to go for DTZ and not DTM. Then only WDL files are required for the subtables.)

h.g.muller · Post by **h.g.muller** » Fri Feb 24, 2012 9:35 am

If any effort is going to be made at all, I think it would be best to go for a novel 'distance to progress' metric. Capturing a piece of the losing side, or pushing your own pawn, is progress, but (forced) sacrificing of your own pieces, or allowing the opponent to advance his pawns, in general is not.

I don't know if there are cases where you have to avoid 50-move draws by sacrificing your own material, or forcing the opponent to advance his pawns. I suspect they would be quite rare, if they existed at all. These cases could then always be recomputed in normal DTZ.

With such a metric you will avoid the behavior (to which the general public no doubt would take great offense) where the engine starts to sacrifice all the material it has in excess of the bare minimum as quickly as possible, then lets the opponent advance his pawns until they can barely be stopped, and only then starts to make an effort to win. It would be very hard to sell that as 'perfect play'.

Of course, a consequence would be that you now do need the full metric for successor EGTs resulting from sacrifices or wrong Pawn advances, and can no longer make do with the WDL info for those. On the upside, sacrifices almost never help. A rather ad hoc poor man's solution would be to penalize counterproductive conversion / zeroing by (quite arbitrarily) assigning such conversions a DTZ of A=30 ('biased DTZ', or specifically DT30Z). That means the EGT would only 'go for them' if there isn't any alternative path to a favorable conversion that is shorter than A moves (suggesting that the 50-move rule could become a problem). If the resulting EGT contains DTx values larger than 50, you can recalculate with lower A until they disappear (or set A=0 immediately, for pure DTZ). The assumption is that A=30 would be a good first try, as forcing a sacrifice is probably easier than forcing a gain (so you get 10 fewer moves to reach that goal, 20 vs. 30).
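The "recalculate with lower A" loop could look roughly like the sketch below. Everything in it is an assumption made for illustration: the name `choose_bias`, the step size of 5, and the callback `max_dtx_for`, which stands in for a full generation run with bias A that reports the largest DTx value found. No real generator exposes such an interface as far as this thread shows.

```python
from typing import Callable


def choose_bias(
    max_dtx_for: Callable[[int], int],  # hypothetical: generate with bias A, return largest DTx
    start: int = 30,                    # first try suggested in the post above
    step: int = 5,                      # assumed decrement, not specified in the post
    limit: int = 50,                    # 50-move rule horizon
) -> int:
    """Return the largest tried bias A whose table has no DTx above the limit."""
    a = start
    while a > 0:
        if max_dtx_for(a) <= limit:
            return a
        a = max(0, a - step)
    # A = 0 means counterproductive conversions are not penalized: pure DTZ.
    return 0


# Toy stand-in for a generator run, only to show the control flow.
print(choose_bias(lambda a: 40 + a))  # 10
```

In practice each call to the callback would be a complete regeneration of the table (or P-slice), so the step size trades generation time against how close to the best usable A one gets.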

Some other remarks:

I think it would be a strategic mistake to consider EGTs with Pawns as a single EGT. They should be organized as Pawn slices, because many of the relevant pawn slices can be calculated without having the full set of successors, and it is mainly the totally irrelevant pawn slices (with many Pawns on 7th) that would prevent you producing the complete set of all P-slices. Especially the algorithm I sketched above should be applied P-slice by P-slice.

syzygy · Post by **syzygy** » Fri Feb 24, 2012 8:07 pm

h.g.muller wrote: With such a metric you will avoid the behavior (to which the general public no doubt would take great offense) where the engine starts to sacrifice all the material it has in excess of the bare minimum as quickly as possible, then lets the opponent advance his pawns until they can barely be stopped, and only then starts to make an effort to win. It would be very hard to sell that as 'perfect play'.

I see what you mean, but I think most of this behaviour can be corrected by performing a small search for winning moves that lead to a quick capture or pawn advance by the winning side.

For wins that can be found by search only, this should work perfectly fine.
For wins that cannot be found by search only because the winning sequence is too long, I suspect that in most cases a move making "progress" will be found by a small search close to the forced zeroing move that the DTZ-table would prefer.

I also note that even with your metric, a (dumb enough) engine will still sac its queen in order to go from an 8-men position to a 7-men position that it knows is won.

h.g.muller wrote: I think it would be a strategic mistake to consider EGTs with Pawns as a single EGT. They should be organized as Pawn slices, because many of the relevant pawn slices can be calculated without having the full set of successors, and it is mainly the totally irrelevant pawn slices (with many Pawns on 7th) that would prevent you producing the complete set of all P-slices. Especially the algorithm I sketched above should be applied P-slice by P-slice.

I'm not entirely sure what you mean, but generating a correct set of 7-men tables will involve generating all subtables and all successor P-slices. I doubt many will be interested in not entirely correct knowledge, even though that will be good enough for practical play.

Now that we're putting our wishes on the table anyway, I would like to draw attention to the "DTZ50+" metric (and accompanying WDL50+ "bit"bases), which allows recognising 50-move rule draws but still manages to win those positions when the 50-move rule is ignored.

h.g.muller · Post by **h.g.muller** » Sat Feb 25, 2012 10:57 pm

syzygy wrote: I'm not entirely sure what you mean, but generating a correct set of 7-men tables will involve generating all subtables and all successor P-slices. I doubt many will be interested in not entirely correct knowledge, even though that will be good enough for practical play.

I am not talking about "not entirely correct knowledge". Just about incomplete knowledge.

If you had a tablebase that has all P-slices of KPPPKPP fully correct, except those where all white Pawns are on the 7th rank, and all black Pawns on the 2nd, do you really think there would be no interest in the correct slices? I can tell you the interest will be HUGE.

With DTZ in general you will not have to generate all successor P-slices. E.g. KQKP with the black Pawn on a2 is tricky, because it has promotion successors and in fact is not fully won. With wtm and Q on a1 it is trivially won, though. Knowing only that (and KQK) is enough to show that with P on a3 it is 100% won. The DTZ in that P-slice could be wrong, because there might have been quicker winning zeroings by allowing the pawn to advance to a2 without your Q on a1, but every position is won. And for the DTZ of the slice with Pa4 only the WDL of the slice with Pa3 is needed. So you do get a fully correct DTZ for that. So KQKa4, KQKa5, KQKa6 and KQKa7 can all be completely calculated without having KQKa3 in DTZ (only in WDL), and without having KQKa2 (not even WDL, just a few wins) and KQKQ, KQKR, KQKB and KQKN not at all.

syzygy · Post by **syzygy** » Sat Feb 25, 2012 11:56 pm

h.g.muller wrote:So you do get a fully correct DTZ for that. So KQKa4, KQKa5, KQKa6 and KQKa7 can all be completely calculated without having KQKa3 in DTZ (only in WDL), and without having KQKa2 (not even WDL, just a few wins) and KQKQ, KQKR, KQKB and KQKN not at all.

But here you are relying on luck that you can only hope for in trivial (e.g. fully won) cases. It is KQKa2 that is the interesting table.

h.g.muller · Post by **h.g.muller** » Sun Feb 26, 2012 8:22 am

Nothing wrong with exploiting luck when it comes your way!

And yes, in the case of KQKP, a2 is the interesting and relevant case. I only gave it as an example to show that imperfection of the knowledge does not always back-propagate through the EGT hierarchy. And this actually happens quite often. 100% won P-slices are very important for the more interesting slices, because they often are successors of those.

Interesting and relevant are not always the same. In multi-P end-games the P-slices that are most difficult to do, because they need multiple promotions, are those with many Pawns on 7th rank. But they are totally irrelevant, in the sense that the only P-slices to which they are successors are 100% won in other ways. When you can promote one Pawn, doing so is usually good for a win. Refraining from doing so, and marching up a second Pawn to 7th rank, in the meantime allowing the opponent to march up his Pawn to 2nd rank, is a really bad idea. Now you suddenly need the 'interesting' KQQPKQP to prove that you can still win.

syzygy · Post by **syzygy** » Sun Feb 26, 2012 12:52 pm

Ok, so what you propose (I think) is generating tables with correct, but incomplete, information. For WDL, you would store values W, D, L, >=D, <=D, unknown. By generating only the more reasonable subtables and/or successor P-slices, you will end up with only a small portion of incomplete information.
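One way to picture those six values is as a pair of bounds on the true result, combined by ordinary negamax. The sketch below is only an illustration under that assumption; the names `Bound` and `combine` are made up here and do not come from any existing generator. It assumes the position has at least one legal move (mate and stalemate would be scored directly).

```python
from typing import Iterable, Tuple

# (lowest possible, highest possible) result for the side to move:
# -1 = loss, 0 = draw, +1 = win.
Bound = Tuple[int, int]

WIN, DRAW, LOSS = (1, 1), (0, 0), (-1, -1)
AT_LEAST_DRAW, AT_MOST_DRAW, UNKNOWN = (0, 1), (-1, 0), (-1, 1)


def combine(successors: Iterable[Bound]) -> Bound:
    """Bounds for a position, given bounds of the positions after each legal move.

    Successor bounds are from the opponent's point of view, so they are
    negated (and lower/upper swap roles) before taking the maximum.
    """
    successors = list(successors)
    if not successors:
        raise ValueError("position must have at least one legal move")
    lo = max(-child_hi for _, child_hi in successors)
    hi = max(-child_lo for child_lo, _ in successors)
    return (lo, hi)


# One move into a position known to be lost for the opponent proves a win,
# no matter how many other successors are missing.
print(combine([LOSS, UNKNOWN]) == WIN)            # True
print(combine([DRAW, UNKNOWN]) == AT_LEAST_DRAW)  # True
```

This matches the KQKP example earlier in the thread: a few known wins in the a2 slice are enough to prove positions in the a3 slice won, even though the rest of the a2 slice stays unknown.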

Doing it this way, you could still calculate all P-slices. Those with many pawns on 2nd and 7th rank will only have somewhat more incomplete information.

DTZ can be calculated for the W and L positions, but will only be an upper bound (unless all the successor tables are known to be complete).

This can probably not, or at least not easily, be combined with information to correctly deal with the 50-move rule.

CCRL Discussion Board
