Further compression
Ok, thanks to your great answers in my other posts I got my tablebases working!
A short description:
- Positions with wtm and btm are stored in one file.
- Duplicate squares are stored (in the case of non-identical pieces), but I think this does not cost much.
- Use of the unique triangle.
- Use of precomputed legal king combinations.
- Use of a "same-piece" algorithm (p1 > p2 > p3 > p4).
- Bitpacking.
- Data compression in blocks of about 8192 bytes (depending on the bits needed per position).
- At the end of the file is the index for the positions of all compressed blocks, just an array of integers (which is data-compressed too).
Now, for example, my KQRK file is 760 KB (the two Nalimov files are about 500 KB).
It has
- 4,620,288 positions
- 284,256 duplicates
- 720,435 invalid positions
I do not want to make the files as small as possible (being a Size Champion), but if more compression is possible in an easy way, I would like to do that.
Is there a smart way to eliminate a large part of invalid positions?
More ideas?
Thanks, Eric
Re: Further compression
One way to avoid storing illegal positions is by storing positions in blocks and then filtering out illegal positions before compression.
When decompressing a block, use the same filtering function to put them back.
Would you publish your code for others to use? (under some free software licence)
Re: Further compression
That seems like a good idea. I will give that a try.
The code needs some cleanup, but I'm certainly planning to put everything online as freeware when it's finished (I guess within half a year).
Re: Further compression
Hmmm... I can't think of anything other than a sort of run-length pre-compression for that idea of yours. Is that the way you would handle it, Koistinen?
Re: Further compression
In the end, the bitpacking was not successful. It "confuses" the bytes, so normal data compression alone should be enough.
My KRQK file is now 663 KB. Coming closer to Nalimov (500).
KQQK is 379 KB, Nalimov's about 320 KB.
I think I need one last space optimization and then I will be satisfied.
Re: Further compression
One more index optimization Nalimov uses is that he doesn't index unblockable checks from the stm.
E.g. KRK, white to move and black king on e4:
instead of having 63 remaining squares for the white rook, he observes that no matter where the other pieces are standing, the rook may never be on d4, f4, e3 or e5.
This saves some space as well, especially for the queen in your example.
Re: Further compression
EricLang wrote: Hmmm... I can't think of anything else than a sort of RunLength pre-compression for that idea of yours. Is that the way you should handle it, Koistinen?
No, not run-length; it is removal of illegal positions:
Code: Select all
int
encode(inbuf, outbuf, firstindex, count)
{
    index = firstindex
    outcount = 0
    for i = 1..count
        if legal_position(index)
            *outbuf++ = *inbuf++
            outcount++
        else
            inbuf++            // drop the byte for an illegal position
        index++
    return outcount
}

decode(inbuf, outbuf, firstindex, count)
{
    index = firstindex
    for i = 1..count
        if legal_position(index)
            *outbuf++ = *inbuf++
        else
            *outbuf++ = ILLEGAL_POSITION   // reinsert a filler value
        index++
}
Re: Further compression
Simple and smart, thanks!
I'll have to check whether the decode procedure isn't too time-consuming; probably not.
Re: Further compression
Okidoki, I got it working! My files are much smaller now than Nalimov's! I'll make a list here soon for comparison
Re: Further compression
If I were trying to write my own generator, what would the easiest/most efficient language be?
Re: Further compression
notnale wrote: If I were trying to write my own generator, what would the easiest/most efficient language be?
For EGTB generators you need to write code that is as fast as possible, and you need good access to memory management.
I would suggest C++
Re: Further compression
My generator is written in Delphi (7). No problems there.
Re: Further compression
Avoiding storing illegal positions looks rather promising; the more pieces, the more useful, is my guess.
I'm experimenting with Erlang (still learning), and for KRRvK I get:
[8325184,16777216,0.4962196350097656] legal for [white_rook,white_rook]
and for KRRvKR:
[508975152,1073741824,0.4740200489759445] legal for [white_rook,white_rook,black_rook]
real 359m12.467s
user 355m20.060s
sys 0m12.925s
and for KRRRvK:
[387623040,1073741824,0.3610020875930786] legal for [white_rook,white_rook,white_rook]
real 471m29.302s
user 469m55.926s
sys 0m0.252s
The last ought to be about the same proportion as for KRRRvKRR.
That means the bitmaps might be stored in about 64 GB * 0.36, which is approximately 23 GB, for endgames like KQRBvKQR.
The multiplier for just filtering adjacent kings and pieces on the same square is 0.64, giving 41 GB bitmaps.
To compute it I iterate over every square for every piece and count the legal positions with no two pieces on the same square.
I imagine computing illegal positions can be done much quicker for blocks of positions.
Code: Select all
%% Count the legal positions for the extra pieces P (both kings are added).
legal_count(P)->
    Pieces = [white_king, black_king] ++ P,
    N = 1 bsl (6*lists:flatlength(Pieces)),            % 6 bits per piece
    Count = legal_count(Pieces, N-1, 0),
    [Count, N, Count/N].

legal_count(_Pieces, -1, Count)->Count;
legal_count(Pieces, Cursor, Count)->
    Board = retro_board(Pieces, Cursor),               % index -> board (defined elsewhere)
    All_pieces = length(Pieces) == length(Board),      % false if two pieces share a square
    Illegal = is_illegal(white, Board),                % defined elsewhere
    if
        Illegal or not All_pieces ->
            legal_count(Pieces, Cursor-1, Count);
        true ->
            legal_count(Pieces, Cursor-1, 1+Count)
    end.
Re: Further compression
I have written some C++ code for computing the illegality/invalidity status of positions.
Using one core, it does this in approximately 15 minutes for a 7-man endgame (simulated by 640*KRRRvK).
There are many optimization opportunities left, so it should be possible to make it quick enough to be useful for compression during computation of the endgame, for the purpose of reducing the amount read from and written to disk.
Code is at http://teo.se/F.zip; it is a work in progress, with lots of room for improvement.