Further compression

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
EricLang
Posts: 33
Joined: Wed Sep 03, 2008 5:31 am

Further compression

Post by EricLang »

Ok, thanks to your great answers in my other posts I got my tablebases working!

A short description:
- Positions with wtm and btm are stored in one file.
- Duplicate squares are stored (in the case of non-identical pieces), but I don't think this costs much.
- Use of the unique triangle.
- Use of precomputed legal king combinations.
- Use of a "same-piece" algorithm (p1 > p2 > p3 > p4).
- Bit packing.
- Data compression in blocks of about 8192 bytes (depending on the number of bits needed per position).
- At the end of the file is an index giving the location of every compressed block, just an array of integers (which is data-compressed too).
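To illustrate the "same-piece" ordering: with two identical pieces, sorting the squares so that p1 > p2 turns every unordered pair into exactly one index. A minimal C++ sketch of that idea, under the assumption that this is the usual combinatorial pair index (the function name and layout are my own, not from the generator):

```cpp
#include <cstdint>

// "Same-piece" indexing for two identical pieces (e.g. two white rooks):
// sort the two (distinct) squares so that s1 > s2, then map the unordered
// pair to a unique index in [0, 64*63/2) = [0, 2016), halving the naive
// 64*64 space. Illustrative sketch, not the actual generator's code.
uint32_t same_piece_index(int s1, int s2) {
    if (s1 < s2) { int t = s1; s1 = s2; s2 = t; }  // enforce s1 > s2
    return static_cast<uint32_t>(s1) * (s1 - 1) / 2 + s2;
}
```

With p identical pieces the same trick generalizes to a binomial-coefficient index, which is where the p1 > p2 > p3 > p4 ordering comes from.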

Now, for example, my KQRK file is 760 KB (the two Nalimov files are about 500 KB).
It has:
- 4,620,288 positions
- 284,256 duplicates
- 720,435 invalid positions

I do not want to make the files as small as possible (I'm not aiming to be a Size Champion), but if more compression is possible in an easy way, I would like to do it.

Is there a smart way to eliminate a large part of the invalid positions?
Any more ideas?

Thanks, Eric
koistinen
Posts: 92
Joined: Fri May 02, 2008 7:59 pm
Sign-up code: 0
Location: Stockholm

Re: Further compression

Post by koistinen »

One way to avoid storing illegal positions is to store positions in blocks and filter out the illegal ones before compression.
When decompressing a block, use the same filtering function to put them back.

Would you publish your code for others to use? (under some free software licence)
EricLang
Posts: 33
Joined: Wed Sep 03, 2008 5:31 am

Re: Further compression

Post by EricLang »

That seems like a good idea. I'll give it a try.

The code needs some cleanup, but I'm certainly planning to put everything online somewhere as freeware when it's finished (I guess within about half a year).
EricLang
Posts: 33
Joined: Wed Sep 03, 2008 5:31 am

Re: Further compression

Post by EricLang »

Hmmm... I can't think of anything other than a sort of run-length pre-compression for that idea of yours. Is that how you would handle it, Koistinen?
EricLang
Posts: 33
Joined: Wed Sep 03, 2008 5:31 am

Re: Further compression

Post by EricLang »

In the end, the bit packing was not successful. It "confuses" the bytes, so plain data compression should be enough.
My KQRK file is now 663 KB, coming closer to Nalimov (500).
KQQK is 379 KB; Nalimov's is about 320 KB.
I think I need one more last space-optimization and then I will be satisfied.
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: Further compression

Post by Codeman »

One more index optimization Nalimov uses is that he doesn't index unblockable checks by the side to move.

E.g. KRK, white to move, black king on e4: instead of leaving 63 squares for the white rook, he observes that no matter where the other pieces stand, the rook may never be on d4, f4, e3 or e5.
This saves some space as well, especially for the queen in your example.
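If I read that right, the excluded squares are exactly the ones orthogonally adjacent to the king, since a rook check from there has no square in between that a third piece could ever occupy. A small C++ sketch of that idea (squares 0..63 = a1..h8; names are illustrative, not Nalimov's code):

```cpp
#include <vector>

// Squares from which a rook gives a check that no piece could ever block:
// the four orthogonally adjacent squares of the king. With such a check
// coming from the side to move, the position is illegal no matter where
// the remaining pieces stand, so these squares need no index slots.
std::vector<int> unblockable_rook_checks(int king_sq) {
    std::vector<int> squares;
    int file = king_sq % 8, rank = king_sq / 8;
    if (file > 0) squares.push_back(king_sq - 1);  // one file to the left
    if (file < 7) squares.push_back(king_sq + 1);  // one file to the right
    if (rank > 0) squares.push_back(king_sq - 8);  // one rank below
    if (rank < 7) squares.push_back(king_sq + 8);  // one rank above
    return squares;
}
```

For a queen the excluded set is larger (all eight adjacent squares), which is presumably why the saving is bigger in the KQRK example.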
koistinen
Posts: 92
Joined: Fri May 02, 2008 7:59 pm
Sign-up code: 0
Location: Stockholm

Re: Further compression

Post by koistinen »

EricLang wrote: Hmmm... I can't think of anything other than a sort of run-length pre-compression for that idea of yours. Is that how you would handle it, Koistinen?
No, not run-length; it is removal of illegal positions:

Code: Select all

int
encode(inbuf, outbuf, firstindex, count)
{
  index = firstindex
  outcount = 0

  for i=1..count
    if legal_position(index)
      *outbuf++ = *inbuf++
      outcount++
    else
      inbuf++
    index++

  return outcount
}

decode(inbuf, outbuf, firstindex, count)
{
  index = firstindex
  for i=1..count
    if legal_position(index)
      *outbuf++ = *inbuf++
    else
      *outbuf++ = ILLEGAL_POSITION
    index++
}
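For anyone wanting to try this, here is a self-contained C++ round-trip of the same scheme, with a stand-in legality test (every third index treated as illegal) in place of a real one:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Filter-before-compress round trip: encode drops the bytes at illegal
// indices, decode re-inserts a filler value at the same spots.
// legal_position() is a stand-in; a real generator would plug in its own
// legality test here.
const uint8_t ILLEGAL_POSITION = 0xFF;

bool legal_position(uint64_t index) { return index % 3 != 0; }  // stand-in

std::vector<uint8_t> encode(const std::vector<uint8_t>& in, uint64_t first) {
    std::vector<uint8_t> out;
    for (size_t i = 0; i < in.size(); ++i)
        if (legal_position(first + i)) out.push_back(in[i]);
    return out;
}

std::vector<uint8_t> decode(const std::vector<uint8_t>& in, uint64_t first,
                            size_t count) {
    std::vector<uint8_t> out;
    size_t j = 0;  // read cursor into the filtered data
    for (size_t i = 0; i < count; ++i)
        out.push_back(legal_position(first + i) ? in[j++] : ILLEGAL_POSITION);
    return out;
}
```

Since both sides derive the skipped slots from the index alone, no extra bookkeeping needs to be stored with the block.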
EricLang
Posts: 33
Joined: Wed Sep 03, 2008 5:31 am

Re: Further compression

Post by EricLang »

Simple and smart, thanks!
I'll have to check whether the decode procedure isn't too time-consuming; probably not.
EricLang
Posts: 33
Joined: Wed Sep 03, 2008 5:31 am

Re: Further compression

Post by EricLang »

Okidoki, I got it working! My files are now much smaller than Nalimov's! I'll post a list here soon for comparison :)
notnale
Posts: 43
Joined: Sun Aug 17, 2008 6:36 am

Re: Further compression

Post by notnale »

If I were trying to write my own generator, what would the easiest/most efficient language be?
Codeman
Posts: 85
Joined: Fri Oct 19, 2007 7:50 pm

Re: Further compression

Post by Codeman »

notnale wrote:If I were trying to write my own generator, what would the easiest/most efficient language be?
For an EGTB generator you need code that runs as fast as possible, and you need good access to memory management.
I would suggest C++.
EricLang
Posts: 33
Joined: Wed Sep 03, 2008 5:31 am

Re: Further compression

Post by EricLang »

My generator is written in Delphi 7. No problems there :)
koistinen
Posts: 92
Joined: Fri May 02, 2008 7:59 pm
Sign-up code: 0
Location: Stockholm

Re: Further compression

Post by koistinen »

Avoiding storing illegal positions looks rather promising; the more pieces, the more useful, is my guess.
I'm experimenting with Erlang (still learning), and for KRRvK I get:
[8325184,16777216,0.4962196350097656] legal for [white_rook,white_rook]
and for KRRvKR:
[508975152,1073741824,0.4740200489759445] legal for [white_rook,white_rook,
black_rook]

real 359m12.467s
user 355m20.060s
sys 0m12.925s
and for KRRRvK:
[387623040,1073741824,0.3610020875930786] legal for [white_rook,white_rook,
white_rook]

real 471m29.302s
user 469m55.926s
sys 0m0.252s

The last ought to be about the same proportion as for KRRRvKRR.
That means the bitmaps might be stored in about 64 GB * 0.36, or approximately 23 GB, for endgames like KQRBvKQR.
The multiplier for just filtering out adjacent kings and two pieces on the same square is 0.64, giving 41 GB bitmaps.

To compute it, I iterate over every square for every piece and count the legal positions with no two pieces on the same square.
I imagine illegal positions can be computed much more quickly for blocks of positions.

Code: Select all

legal_count(P)->
    Pieces = [white_king, black_king] ++ P,
    N = 1 bsl (6*lists:flatlength(Pieces)),
    Count = legal_count(Pieces, N-1, 0),
    [Count, N, Count/N].

legal_count(_Pieces, -1, Count)->Count;
legal_count(Pieces, Cursor, Count)->
    Board = retro_board(Pieces, Cursor),
    All_pieces = length(Pieces) == length(Board),
    Illegal = is_illegal(white, Board),
    if
	Illegal or not All_pieces ->
	    legal_count(Pieces, Cursor-1, Count);
	true ->
	    legal_count(Pieces, Cursor-1, 1+Count)
    end.
koistinen
Posts: 92
Joined: Fri May 02, 2008 7:59 pm
Sign-up code: 0
Location: Stockholm

Re: Further compression

Post by koistinen »

I have written some C++ code for computing the illegality/invalidity status of positions.
Using one core, it does this in approximately 15 minutes for a 7-man endgame (simulated by 640*KRRRvK).
There are many optimization opportunities left, so it should be possible to make it fast enough to be useful for compression during the endgame computation itself, reducing the amount of data read from and written to disk.
The code is at http://teo.se/F.zip; it is a work in progress, with lots of room for improvement.