Martin Kreuzer wrote:Dear Guido,
at present I have generated 38 of the 70 EGTB with 5-1 men,
including kqqppk with two pawns. Since I had trouble uploading to
Kirills ftp area, my uploads are somewhat behind.
I had so far two endgames which caused trouble: kbbbpk and
knnnpk. After taking two weeks for generating the positions,
knnnpk is now finally finishing. I will launch a similar "brute force"
attack on kbbbpk immediately afterwards.
It is very good that you are generating these EGTB, too. This will enable
us to compare the results. We can make statistics a la the *.tbs files
of the Nalimov EGTB. Moreover, we can check randomly chosen positions
and compare the "mate in ..." announcements.
I looked on your web page and did not see an explicit description of the
way you number the endgames. Are you also generating some illegal positions and then mark them as "broken" (or similarly)?
How much space per position do you allocate? Is you algorithm
easily describable?
I think it would be a nice project to create c (or c++) code for the
various numbering schemes. Also Nalimovs code could use a
major "clean-up". As far as I have seen, little research has been done
into the question what the most efficient numering scheme is when
one compares results _after_ compression.
Cordial greetings and best wishes,
Martin
Hi Martin,
My present situation is: 43 5-1men Tbs done and checked, and 27 still to do.
Unfortunately among the remaining TBs there are the longest to generate for gafs, i.e. kqrbpk, kqrnpk, kqbnpk and krbnpk. Each of these TBs will take about 4 days, 3 for generation and 1 for checking.
Your problem with kbbbpk and knnnpk is very strange but, if FEG doesn't consider the identical pieces, these endgames should have the max dimension and the max number of cycles. But it seems to me an insufficient explanation because gafs, generally much slower, spends about 60000 s of elapsed time in both cases.
Multiplying this time by 6 = 3! the time becomes 360000 s equivalent to 4d + 4h much less than two weeks.
In order to compare the stats I did a small program which convert my stats output in Nalimov's format, but my (pseudo)legal positions will be always less than Nalimov's ones. In my stats the number of broken position is not printed but easily obtained by the difference: total positions - calculated positions.
There is no problem to extract positions also in FEN format on the basis of the result or the combined results of White and Black.
The version 1.41 of my program has some bugs (not in the generation!) when the printing is requested in English notation. In the new version 1.42, not yet on my site, these bugs will be eliminated and the FEN format for the ASCII output will be added.
The numbering scheme used by gafs is based on the following sequence, each of which represents a sub-index of the global index:
Kings (462 without pawns or 1806 with pawns) treated in the program by tables
White pawn(s)
Black pawn(s)
White queen(s), rook(s), bishop(s), knight(s)
Black queen(s), rook(s), bishop(s), knight(s)
Pawns start from the 48 possible positions in the chessboard without keeping into account the positions of the kings. If a square was occupied by a King and a pawn, the correspondent byte in the TB is set to illegal (= broken in Nalimov definition). If you are interested to this subject, see on my site the function calcola(void) in the source of the program gafsdim where this point has been taken in consideration to calculate the number of possible positions more accurately (unfortunately the comments are in Italian).
The successive pieces are allocated keeping into account the squares already occupied by kings and pawns.
The presence of n identical pawns or pieces are treated together to form one sub-index with a reduction of the correspondent space in the TBs by the factor n!.
Example: kqqnpk = 1806 * 48 * (61*60)/2 *59 = 9,359,703,360.
As a consequence of this organization my TBs contain positions marked as illegal (broken) for the following
reasons:
- Opponent King under check.
- A King and a pawn on the same square.
- Positions equivalent for symmetry in respect to the main diagonal (a1- h8). This happens when the King are both on this diagonal.
- Triple, quadruple, etc. ( > 2) check and some type of impossible double check.
- Unreachable positions. This control is made executing one backmove and is optional, but it costs too much in cpu time and is incompatible with the method 3. I don't use normally this option.
Gafs starts using 8 bit per position but when the number of moves becames > 126, automatically the program adds one bit of carry to each position. The carry bits are written at the end of the files. If one bit is not sufficient 2 or more bits are added. I checked this algorithm successfully only for one bit during the generation of kppkp. I don't know if the program will run correctly when 2 or more bits must be added. I'm practically sure that some bugs will exist
.
The algorithms used are conceptually easy and have been described by Nalimov in a thread on CCC and by Aarontay for the method 1, and by Wu and Beal for the method 3 applied to the Chinese chess endgames. In method 2 I use the direct move in looking for losses and the backmove for finding wins.
The problems arise in the practical implementation where captures and promotions must be kept into account and when the dimensions of the TBs are greater than the RAM and therefore it is necessary to use the disk memory for storing the TBs during the generation, swapping data between RAM and disk.
Endgames with pawns are not more difficult than the others for my program, but they are in general bigger and request the previous generation of all the TBs obtained by promotion or capture.
Creating a C program for the different numbering schemes is not easy. The FEG format is not known while for the Nalimov format I remember to have been unable to decipher.
About space occupation of compressed files unfortunately I have no experience; I use compression only to transfer files. In fact the library gafslib for using my TBs uses only uncompressed files, with two possibilities:
- loading the whole file in RAM.
- reading a single result on the disk (randomly obviously!)
I think that if the number of the accesses to the TBs is relatively limited, the second mode could be sufficiently fast, adding that the result obtained by the TBs could be kept in the hash table.
My best wishes
Guido