
Ironic yet wonderful.

Posted: Wed Aug 09, 2006 6:58 am
by jshriver
It seems that after everything that has been going on, Mr. Nalimov himself has released the missing EGTBs. Mr. Hernandez has received them, so now begins an interesting analysis.

I'm working with him now to see if there are any differences.
I'm crossing my fingers, hoping, and praying that they are the same for the sake of the chess community.

If possible I'm going to try to collect both sets and see if there are any differences. But for now we're going to make do with simple md5sums.

The first thing that comes to mind is that even though the data might be the same, there might be differences in how the files were split.

While that might be an easy fix, it's a first step in assuring the data is identical.

As I learn more I'll gladly post here. Comments are appreciated.
As of right now however, the sets that I received are the only ones available.

Mr. Hernandez has noted he plans to send them to Kirill.

-Josh

Re: Ironic yet wonderful.

Posted: Wed Aug 09, 2006 8:49 am
by Pachnes
Hi,

I think we should stop distributing the "16 missing sets" until it is clear that they are congruent with the files generated by Mr. Nalimov. But there is not much hope that the split of the big files is identical. We will get a mismatch of files when the first file of Nalimov's original sets is published. What do you think about this?

file splitting

Posted: Wed Aug 09, 2006 9:49 am
by Martin Kreuzer
Hello Thomas,

the file splitting for Nalimov tablebases is absolutely unique.
Please have a look at the thread "EGBT file splitting".
The uncompressed files are broken every 2^31 bytes
and then compressed individually. You can even get the
splitting program by Marc in that thread.

Greetings, Martin Kreuzer
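
To illustrate why a fixed 2^31-byte rule leaves no room for ambiguity, here is a minimal Python sketch. It is not Marc's splitting program from the other thread; the function name split_ranges is made up for this example. It only shows that the part boundaries depend on nothing but the uncompressed file size, so two people splitting the same data should end up with the same parts.

```python
from typing import List, Tuple

CHUNK_SIZE = 2 ** 31  # split point described in the post: every 2^31 bytes


def split_ranges(total_size: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets of each part of an uncompressed file.

    Hypothetical helper: each part covers [start, end) and would then be
    compressed on its own. The result depends only on total_size.
    """
    if total_size < 0 or chunk_size <= 0:
        raise ValueError("total_size must be >= 0 and chunk_size must be > 0")
    return [
        (start, min(start + chunk_size, total_size))
        for start in range(0, total_size, chunk_size)
    ]


if __name__ == "__main__":
    # A file 5 bytes larger than 2^31 yields one full part and a 5-byte tail.
    for start, end in split_ranges(2 ** 31 + 5):
        print(start, end)
```

If both sets follow this rule, any remaining difference between corresponding parts has to come from the data or the compression step, not from where the cut was made.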

Re: file splitting

Posted: Wed Aug 09, 2006 11:50 am
by mbourzut
Martin Kreuzer wrote: Hello Thomas,

the file splitting for Nalimov tablebases is absolutely unique.
Please have a look at the thread "EGBT file splitting".
The uncompressed files are broken every 2^31 bytes
and then compressed individually. You can even get the
splitting program by Marc in that thread.

Greetings, Martin Kreuzer
The biggest ambiguity is whether 8 or 16 bits are used to store tablebase values. This is hard-coded in the index code, and is the reason why the index code often needs to be updated when new files are released. I used the index code available in late 2004. If this was changed later, some endings could be affected. The ending most likely to differ is kbnkpp, which uses 16 bits per entry even though 8 bits are sufficient. It is relatively straightforward to convert from 16 to 8 bits in this case.

-Marc
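
As a rough idea of what a 16-to-8-bit conversion could look like, here is a small Python sketch. The post does not say how values are laid out or re-encoded, so this is not the mapping used by the Nalimov index code. It assumes, purely for illustration, an uncompressed buffer of little-endian signed 16-bit entries that all fit in a signed byte; narrow_16_to_8 is a made-up name.

```python
import struct


def narrow_16_to_8(data: bytes) -> bytes:
    """Narrow little-endian signed 16-bit entries to signed 8-bit entries.

    Assumption for illustration only: the real tablebase value encoding
    may differ. Raises ValueError if any entry does not fit in 8 bits.
    """
    if len(data) % 2 != 0:
        raise ValueError("input length must be a multiple of 2 bytes")
    count = len(data) // 2
    values = struct.unpack("<%dh" % count, data)
    for position, value in enumerate(values):
        if not -128 <= value <= 127:
            raise ValueError("entry %d (%d) does not fit in 8 bits" % (position, value))
    return struct.pack("<%db" % count, *values)


if __name__ == "__main__":
    wide = struct.pack("<4h", 1, -2, 0, 127)
    narrow = narrow_16_to_8(wide)
    print(len(wide), len(narrow))
```

The range check is the important part: the conversion is only safe if every entry really fits in 8 bits, as Marc says is the case for kbnkpp.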

Posted: Thu Aug 10, 2006 4:40 am
by Kirill Kryukov
Nelson has sent me the TBS files for the last 16 sets:

Reconciling the 'EN' and 'MB' versions of the last 16 EGTs

Posted: Thu Aug 10, 2006 6:30 am
by guyhaw
My understanding so far is that:

1) [via MB] the EGT parts will be split at identical '2^31' points and are self-contained, so they should admix together
2) [via MB and NH] there is an 8-bit/16-bit issue regarding at least KQPKBP & KQPKRP.
MB converted from FEG in line with a 'late 2004' version of Eugene's access code; EN may have made some improvements (to 8-bit) since then
3) On the MD5sum-checking front, there are two issues. First, the MD5sums for EN's version of the last 16 DTM EGTs have not been computed by Eugene, or yet by Nelson. Secondly, MB advises that 'broken positions' may have their value set to 'broken' or to something that improves compression further, as this does not affect the chessic value of the files.

I would like to see a file-by-file comparison, on the basis of MD5sums, between EN's and MB's versions of the last 16 EN DTM EGT files. Obviously, agreement at this level gives the best confidence that no issues will arise, and we get the earliest opportunity to learn from any disagreements.
g
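
A file-by-file MD5 comparison of the kind asked for above could be scripted roughly as follows. This is only a sketch: the function names and the idea of keeping each version in its own directory with matching file names are assumptions made for the example, not something described in this thread.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, block_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def compare_sets(dir_a, dir_b):
    """Compare two directories file by file on the basis of MD5 sums.

    Returns a dict mapping each file name to "match", "MISMATCH",
    "only in A" or "only in B". Subdirectories are ignored.
    """
    sums_a = {p.name: md5_of_file(p) for p in Path(dir_a).iterdir() if p.is_file()}
    sums_b = {p.name: md5_of_file(p) for p in Path(dir_b).iterdir() if p.is_file()}
    report = {}
    for name in sorted(set(sums_a) | set(sums_b)):
        if name not in sums_b:
            report[name] = "only in A"
        elif name not in sums_a:
            report[name] = "only in B"
        elif sums_a[name] == sums_b[name]:
            report[name] = "match"
        else:
            report[name] = "MISMATCH"
    return report
```

Calling compare_sets with the two directories and printing the result gives one line of status per file. Note that, per point 3 above, a mismatch does not by itself mean the chess content differs, since 'broken positions' may be stored differently; it only flags the files worth a closer look.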