The files of the endgame tables have very different sizes?

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
Kees
Posts: 6
Joined: Sat Oct 14, 2006 9:36 am
Sign-up code: 0

The files of the endgame tables have very different sizes?

Post by Kees »

Hello, I was wondering why the sizes of the files are so different.
Sometimes you have a set of files that are each 500MB or bigger (some even 1000MB). Recently I downloaded KNNKNP, and its files are only 200MB or less. It would be much easier if those "small" files were much bigger, because you would have far fewer files (making it easier to see what you have and don't have, for example).

Greetings Kees
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK

EGT filesizes

Post by guyhaw »

The indexing of chess positions takes advantage of two symmetries:
a) 90-degree-rotations and reflections of the board - but only an a-h reflection if there are Pawns, and
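b) the 'Like Men' symmetry, which uses the fact that, e.g. in KNNKP, you cannot tell the difference between the Knights.
Thus, P-ful endgames tend to have ~3.8x the positions of comparable P-less endgames, and like men reduce the count further: KQQQKQ has ~6x fewer positions than KQRBKQ, because the three interchangeable Queens can be permuted in 3! = 6 ways.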
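
The 'Like Men' factor can be worked out from the endgame name alone. The following is a minimal sketch, not the Nalimov indexing code: `like_men_factor` is a hypothetical helper that assumes the usual naming convention (white men first, each side starting with K) and multiplies k! for every group of k identical men on one side.

```python
from collections import Counter
from math import factorial


def like_men_factor(endgame: str) -> int:
    """Product of k! over each group of k identical men on one side."""
    # Split e.g. "KQQQKQ" into white "KQQQ" and black "KQ" at the second K.
    second_k = endgame.index("K", 1)
    sides = (endgame[:second_k], endgame[second_k:])
    factor = 1
    for side in sides:
        for count in Counter(side).values():
            factor *= factorial(count)
    return factor


for name in ("KQQQKQ", "KQRBKQ", "KNNKP", "KRRKNN"):
    print(name, like_men_factor(name))
```

This gives 6 for KQQQKQ, 1 for KQRBKQ, 2 for KNNKP and 4 for KRRKNN.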

Further, Nalimov avoids placing men giving an 'unblockable check' to the side not to move. So, Q/R/B/N/P will not be found 'absolutely adjacent' to the sntm-K. This reduces index-size further, sometimes considerably.
Lastly, the range of depths involved affects the compressibility of the data. For example, if the EGTs were to the DTZ (depth to P-push, conversion or mate) rather than DTM metric, they would be less than half the size.
g
clocks
Posts: 102
Joined: Thu Nov 23, 2006 9:27 am
Sign-up code: 0

Post by clocks »

I think what he was asking was why the files in a series (.0, .1, .2) vary in size, instead of there being just one larger file, or just two 2GB files, etc. He was not referring to the total size differences between endgames in general.

If that's the case, the answer would be that before they are compressed they are just the .nbb/.nbw files, and they ARE all the same file size. The .emd extension indicates the compression used, and the compressed files vary in size: some compress to a smaller size than others.
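
A small experiment shows the effect. This is only a sketch under loud assumptions: `zlib` stands in for the actual .emd compressor, 64KB chunks stand in for the real pieces, and the "depth" bytes are random numbers rather than real tablebase data.

```python
import random
import zlib

CHUNK = 1 << 16  # small stand-in for one uncompressed piece

rng = random.Random(0)
# Same length, different content: few distinct values vs. many distinct values.
narrow = bytes(rng.randrange(4) for _ in range(CHUNK))
wide = bytes(rng.randrange(256) for _ in range(CHUNK))

narrow_size = len(zlib.compress(narrow))
wide_size = len(zlib.compress(wide))

print("uncompressed:", len(narrow), len(wide))
print("compressed:  ", narrow_size, wide_size)
```

Both chunks start at the same size, but the one with the narrow range of values compresses to a much smaller file.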

I also believe that before the compression the files were 2GB in size, due to file system limitations present in non-unix systems (FAT16), where file and volume sizes were limited to 2GB without special updates.

Correct me if I am wrong here with anything.

Derek
Kees
Posts: 6
Joined: Sat Oct 14, 2006 9:36 am
Sign-up code: 0

Post by Kees »

clocks wrote:I think what he was asking was why are some files series (.0,.1,.2) will vary in size. Instead of just having 1 larger file, or having just 2 2GB files, etc. Not refering to the total differences in them in general.

If thats the case, the answer would be that before they are compressed they are just the .nbb/.nbw file, and they ARE all the same file size. The .emd extension would be the compression used, and they vary in size. Some compress to a smaller size than others and so forth.

I also believe that before the compression the files were 2GB size, due to file system limitations present in non-unix systems (FAT16) where the total drive size was limited to 2GB without special updates.

Correct me if I am wrong here with anything.

Derek

You are right, this is what I was wondering: just why is one file 1GB and another only 200MB?

Thanks for the answers
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK

Some filesize observations

Post by guyhaw »

Let's look at KRPKNN's and KRRPKQ's EGTs: 8+8 files in each case.
KRPKNN has 8,459,934,480 wtm and 8,141,507,232 btm index-entries, not all corresponding to 'legal' positions.
KRRPKQ has 8,112,305,064 wtm and 8,514,011,520 btm positions.
Compared with these numbers, note that 2^30 = 1,073,741,824,
2^31 = 2,147,483,648, and 8*2^30 = 8,589,934,592.
So we have fewer than 8*2^30 EGT-entries in each of the 4 cases. But both have maxDTM=253, which necessitates 16-bit cells in the EGT.
In fact, the EGTs are chopped into 2^31-byte (2GB) pieces: the last piece can be arbitrarily small, so nothing is learned by looking at its size.
If one has 16-bit cells, this means 2^30 index-entries per EGT-file, not 2^31. So there is a factor of 2 immediately.
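
The arithmetic can be checked directly. This is a minimal sketch using only the figures quoted above (index-entry counts, 16-bit cells, 2^31-byte pieces); `piece_sizes` is a hypothetical helper, not part of any tablebase tool.

```python
PIECE_BYTES = 2**31  # 2GB piece size quoted above


def piece_sizes(entries: int, cell_bytes: int) -> list[int]:
    """Byte sizes of the uncompressed pieces for one side-to-move table."""
    total = entries * cell_bytes
    full, rest = divmod(total, PIECE_BYTES)
    # All pieces are full-sized except possibly the last one.
    return [PIECE_BYTES] * full + ([rest] if rest else [])


tables = {
    "KRPKNN wtm": 8_459_934_480,
    "KRPKNN btm": 8_141_507_232,
    "KRRPKQ wtm": 8_112_305_064,
    "KRRPKQ btm": 8_514_011_520,
}

for name, entries in tables.items():
    sizes = piece_sizes(entries, cell_bytes=2)  # 16-bit cells
    print(name, len(sizes), "pieces, last piece", sizes[-1], "bytes")
```

With 16-bit cells every one of the four tables comes out at 8 pieces, matching the 8+8 files per endgame; with 8-bit cells the same entry counts would need only 4 pieces each.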

I hope that after noting cell-sizes (8-bit or 16-bit), P-less/P-ful (a factor of 3.8), like men (a factor of 1, 2, 4 or 6 so far), the varying sizes of index-ranges and the varying degrees of compressibility, the varying filesizes will be more understandable.
g
clocks
Posts: 102
Joined: Thu Nov 23, 2006 9:27 am
Sign-up code: 0

Post by clocks »

It does make sense. It would make for some nice housecleaning, though, if the files were put together into one .emd file and then split into 2GB .emd files after the compression. That would cut down the total number of files by a great deal, while still staying under the 2GB file limitation. But how many people are really still using FAT16 to store their TBs? :)

I'm sure that this would not at all be reasonable since so many things are already coded in this way, but at the start of the 6-man files it would have been a good thought.

Derek
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK

File compression

Post by guyhaw »

It's interesting to contemplate the +s and -s of compressing down to just below some filesize limit (FSL) rather than from just below FSL.
The uncompressed files have to exist at some point, so they have to be below FSL: I think there are things you can only do with uncompressed files now.
Also, the position-index must indicate easily which file to go to: this requires more in an endgame root-index if EGT-files accommodate different numbers of positions: maybe not a problem.
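
The difference between the two lookups can be sketched as follows. Both functions are hypothetical illustrations, not the Nalimov probing code: `locate_fixed` assumes every uncompressed piece holds exactly 2^31 bytes, while `locate_variable` assumes the root-index stores the first position-index held by each file.

```python
from bisect import bisect_right

PIECE_BYTES = 2**31  # fixed uncompressed piece size


def locate_fixed(index: int, cell_bytes: int) -> tuple[int, int]:
    """Fixed-size pieces: one division gives (file number, byte offset)."""
    return divmod(index * cell_bytes, PIECE_BYTES)


def locate_variable(index: int, starts: list[int]) -> tuple[int, int]:
    """Variable-size pieces: search a sorted table of first indices per file."""
    file_no = bisect_right(starts, index) - 1
    return file_no, index - starts[file_no]


print(locate_fixed(2**30, cell_bytes=2))      # first entry of the second file
print(locate_variable(300, [0, 100, 250]))    # third file, 50 entries in
```

The fixed-size case needs no extra data, whereas the variable-size case needs one stored start index per file, which is the additional root-index content mentioned above.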
I think the Kadatch compression system creates a compression scheme per file-compressed, which seems to pre-empt the merging of compressed files.
Finally, it is easier to download the smaller files and store them on DVD.
I don't think the larger number of files is a big issue, and we are not near saying we have too many files to deal with. 7-man multiplies the number of files by ~60.
g
clocks
Posts: 102
Joined: Thu Nov 23, 2006 9:27 am
Sign-up code: 0

Post by clocks »

I agree, it's not a very big deal in the scheme of things. But the file size doesn't really matter anymore, correct? It was only an issue when files could only be addressed up to 2GB. With newer file systems (or unix from the get-go), this was never an issue.

Am I wrong with any of the thoughts there?

Derek
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan

Post by Kirill Kryukov »

I agree that splitting files is not necessary on current systems. But too many engines are already made to use the EGTB files as they are, and I don't want to keep 2 copies of the 6-man EGTB, in split and unsplit forms. :-) Though I think this is a good point to consider when moving to 7-man tables.