7-man EGTB Bounty Reborn - File Format Discussion

Posted: Mon Apr 18, 2011 1:58 am
by Kirill Kryukov
The qualities of a good file format:

1. Self-identifying. The file should carry everything needed to identify it: the metric, the indexing scheme, the partitioning, the compression method, and which class of positions it contains. Some of this information can also appear in the file name.

2. Checksummed data. Checking data integrity should not require any additional files. How large the checksummed blocks should be is open for discussion.

3. Future-proof. Adding a new metric, compression format, or anything else should not break existing code. Of course old code may not recognize the new tables, but at least it should not break.

4. Partitioning of the large tablebase files into volumes should be supported in all tools, as some systems and tasks may have problems with >100 GB files. For example, sharing large files may be difficult in some networks.
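To make points 1-4 more concrete, here is a minimal Python sketch of one way such a file could be laid out. Everything in it is an assumption for the sake of discussion, not a proposal for the actual format: the magic bytes, the tag numbers, the little-endian tag/length/value header, and the choice of CRC32 are all hypothetical. The header is a list of tagged fields, so a reader can carry along or ignore tags it does not know, and both the header and every data block have their own checksum inside the file.

```python
import io
import struct
import zlib

MAGIC = b"EGTB"  # hypothetical magic bytes, not an existing format

# Hypothetical field tags; a real format would fix these in its specification.
TAG_METRIC, TAG_INDEXING, TAG_COMPRESSION, TAG_VOLUME = 1, 2, 3, 4


def write_header(out, fields):
    """Write MAGIC, body length, tag/length/value fields, then a CRC32 of all of it."""
    body = b"".join(struct.pack("<HI", tag, len(value)) + value
                    for tag, value in fields.items())
    header = MAGIC + struct.pack("<I", len(body)) + body
    out.write(header + struct.pack("<I", zlib.crc32(header)))


def read_header(inp):
    """Return {tag: raw bytes}. Unknown tags are kept as raw bytes, never rejected."""
    prefix = inp.read(8)
    if len(prefix) != 8 or prefix[:4] != MAGIC:
        raise ValueError("not a tablebase file")
    (body_len,) = struct.unpack("<I", prefix[4:8])
    body = inp.read(body_len)
    tail = inp.read(4)
    if len(tail) != 4 or struct.unpack("<I", tail)[0] != zlib.crc32(prefix + body):
        raise ValueError("header checksum mismatch")
    fields, pos = {}, 0
    while pos < len(body):
        tag, length = struct.unpack_from("<HI", body, pos)
        pos += 6  # size of the packed tag (2 bytes) and length (4 bytes)
        fields[tag] = body[pos:pos + length]
        pos += length
    return fields


def write_block(out, payload):
    """Write one data block as: length, CRC32 of the payload, payload."""
    out.write(struct.pack("<II", len(payload), zlib.crc32(payload)) + payload)


def read_block(inp):
    """Return the next verified payload, or None at end of file."""
    head = inp.read(8)
    if len(head) < 8:
        return None
    length, crc = struct.unpack("<II", head)
    payload = inp.read(length)
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("block checksum mismatch")
    return payload


# Round trip through an in-memory file. Tag 99 stands in for a field added by
# some future version of the format that this reader knows nothing about.
buf = io.BytesIO()
write_header(buf, {
    TAG_METRIC: b"DTM",                  # example value only
    TAG_COMPRESSION: b"none",
    TAG_VOLUME: struct.pack("<HH", 1, 4),  # volume 1 of 4
    99: b"future field",
})
write_block(buf, b"\x00\x01\x02")
buf.seek(0)
fields = read_header(buf)
block = read_block(buf)
```

The volume field is one possible way to cover point 4: each volume would carry its own header saying which part of the whole it is, so any single volume can be identified and verified on its own.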

The basic idea is that with 100 GB files we don't need to worry about spending a few kB on metadata. Generating the tables will probably take significant time, so as much as possible should be included up front to avoid drastic changes to the file format later.
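The same reasoning can be checked for per-block checksums. The short Python sketch below computes how many bytes the checksums alone would take; the 4-byte checksum size and the candidate block sizes are assumptions chosen only for illustration.

```python
def checksum_overhead_bytes(file_size, block_size, checksum_size=4):
    """Bytes spent on checksums if every block of block_size bytes gets one."""
    blocks = -(-file_size // block_size)  # integer ceiling division
    return blocks * checksum_size


FILE_SIZE = 100 * 10**9  # a 100 GB file

# Compare a few hypothetical block sizes: 4 KiB, 64 KiB and 1 MiB.
for block_size in (4 * 1024, 64 * 1024, 1024 * 1024):
    overhead = checksum_overhead_bytes(FILE_SIZE, block_size)
    print(f"{block_size:>8} B blocks: {overhead} B of checksums "
          f"({overhead / FILE_SIZE:.4%} of the file)")
```

Smaller blocks cost more checksum bytes but localize corruption better and let a prober verify only the block it reads, so the block size is a trade-off rather than a pure overhead question.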

How much of this should be formalized in the requirements? What should be required from the supported compression methods?

Please post your ideas or suggestions.