Further compression
Ok, thanks to your great answers in my other posts I got my tablebases working!
A short description:
- Positions with wtm and btm are stored in one file.
- Duplicate squares are stored (in the case of non-identical pieces), but I think this does not cost much.
- Use of the unique triangle.
- Use of precomputed legal king combinations.
- Use of a "same-piece" algorithm (p1 > p2 > p3 > p4).
- Bitpacking.
- Data compression in blocks of about 8192 bytes (depending on the bits needed per position).
- At the end of the file is the index for the positions of all compressed blocks, just an array of integers (which is data-compressed too).
Now, for example, my KQRK file is 760 KB (the two Nalimov files are about 500 KB).
It has
- 4,620,288 positions
- 284,256 duplicates
- 720,435 invalid positions
I do not want to make the files as small as possible (being a Size Champion), but if more compression is possible in an easy way, I would like to do that.
Is there a smart way to eliminate a large part of invalid positions?
More ideas?
Thanks, Eric
Re: Further compression
One way to avoid storing illegal positions is by storing positions in blocks and then filtering out illegal positions before compression.
When decompressing a block, use the same filtering function to put them back.
Would you publish your code for others to use? (under some free software licence)
Re: Further compression
That seems like a good idea. I will give that a try.
The code needs some cleanup, but I'm certainly planning to put everything online as freeware when it's finished (I guess within half a year).
Re: Further compression
Hmmm... I can't think of anything other than a sort of run-length pre-compression for that idea of yours. Is that the way you would handle it, Koistinen?
Re: Further compression
In the end, the bitpacking was not successful. It "confuses" the bytes, so normal data compression alone should be enough.
My KRQK file is now 663 KB. Coming closer to Nalimov (500).
KQQK is 379 KB, Nalimov's about 320 KB.
I think I need one last space optimization and then I will be satisfied.
Re: Further compression
One more index optimization Nalimov uses is that he doesn't index unblockable checks from the stm.
E.g. KRK, white to move and black king on e4:
instead of having 63 remaining squares for the white rook, he observes that no matter where the other pieces are standing, the rook may never be on d4, f4, e3 or e5.
This saves some space as well, especially for the queen in your example.
Re: Further compression
EricLang wrote: Hmmm... I can't think of anything else than a sort of RunLength pre-compression for that idea of yours. Is that the way you should handle it, Koistinen?
No, not run-length; it is removal of illegal positions:
Code: Select all
int
encode(inbuf, outbuf, firstindex, count)
{
    index = firstindex
    outcount = 0
    for i = 1..count
        if legal_position(index)
            *outbuf++ = *inbuf++
            outcount++
        else
            inbuf++            // drop the byte for an illegal position
        index++
    return outcount
}

decode(inbuf, outbuf, firstindex, count)
{
    index = firstindex
    for i = 1..count
        if legal_position(index)
            *outbuf++ = *inbuf++
        else
            *outbuf++ = ILLEGAL_POSITION   // reinsert a filler value
        index++
}
Re: Further compression
Simple and smart, thanks!
I'll have to check whether the decode procedure isn't too time-consuming; probably not.
Re: Further compression
Okidoki, I got it working! My files are much smaller now than Nalimov's! I'll make a list here soon for comparison
Re: Further compression
If I were trying to write my own generator, what would the easiest/most efficient language be?
Re: Further compression
notnale wrote: If I were trying to write my own generator, what would the easiest/most efficient language be?
For EGTB generators you need to write code that is as fast as possible, and you need good access to memory management.
I would suggest C++
Re: Further compression
My generator is written in Delphi (7). No problems there.
Re: Further compression
Avoiding storing illegal positions looks rather promising; the more pieces, the more useful, is my guess.
I'm experimenting with Erlang (still learning), and for KRRvK I get:
[8325184,16777216,0.4962196350097656] legal for [white_rook,white_rook]
and for KRRvKR:
[508975152,1073741824,0.4740200489759445] legal for [white_rook,white_rook,black_rook]
real 359m12.467s
user 355m20.060s
sys 0m12.925s
and for KRRRvK:
[387623040,1073741824,0.3610020875930786] legal for [white_rook,white_rook,white_rook]
real 471m29.302s
user 469m55.926s
sys 0m0.252s
The last ought to be about the same proportion as for KRRRvKRR.
That means the bitmaps might be stored in about 64 GB * 0.36, which is approximately 23 GB, for endgames like KQRBvKQR.
The multiplier for just filtering adjacent kings and pieces on the same square is 0.64, giving 41 GB bitmaps.
To compute it I iterate over every square for every piece and count the legal positions with no two pieces on the same square.
I imagine computing illegal positions can be done much quicker for blocks of positions.
Code: Select all
%% Count the legal positions for the extra pieces P (both kings are added).
legal_count(P)->
    Pieces = [white_king, black_king] ++ P,
    N = 1 bsl (6*lists:flatlength(Pieces)),            % 6 bits per piece
    Count = legal_count(Pieces, N-1, 0),
    [Count, N, Count/N].

legal_count(_Pieces, -1, Count)->Count;
legal_count(Pieces, Cursor, Count)->
    Board = retro_board(Pieces, Cursor),               % index -> board (defined elsewhere)
    All_pieces = length(Pieces) == length(Board),      % false if two pieces share a square
    Illegal = is_illegal(white, Board),                % defined elsewhere
    if
        Illegal or not All_pieces ->
            legal_count(Pieces, Cursor-1, Count);
        true ->
            legal_count(Pieces, Cursor-1, 1+Count)
    end.
Re: Further compression
I have written some C++ code for computing the illegality/invalidity status of positions.
Using one core, it does this in approximately 15 minutes for a 7-man endgame (simulated by 640*KRRRvK).
There are many optimization opportunities left, so it should be possible to make it quick enough to be useful for compression during computation of the endgame, for the purpose of reducing the amount read from and written to disk.
Code is at http://teo.se/F.zip; it is a work in progress, with lots of room for improvement.