bzip2 vs. emd compression of 3,4,5 tables: results

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
Archimerged
Posts: 1
Joined: Sat Dec 23, 2006 4:28 am
Sign-up code: 0


Post by Archimerged

I downloaded the 3, 4, and 5 piece Nalimov tablebases, checked all of the md5 checksums, and uncompressed them using a program I got from the google cache of cap.connx.com/chess-engines/crafty/tbdecode.h.

(I found it after doing a google search for DATACOMP Andrew Kadatch, and I had to add code to write the uncompressed files to disk since the test driver in that file just uncompresses the files and discards them).

Of the 290 files in the tablebases, 57 compress more than 1.5 times better with bzip2 (emd/bz2 above 1.5), while only 32 compress worse with bzip2 than with emd (emd/bz2 below 1).

What is the difference between the files which compress very well with bzip2 and those that don't? Surely a compression scheme which takes the nature of the tables into account could do better. When thinking of 7 man tablebases, doing better is essential. Where can I find source code for tablebase generation and use?


 uncomp: 30,612,186,969 bytes
 emd:     7,580,348,624 bytes = 24.76% of uncomp
 bz2:     6,750,613,883 bytes = 22.05% of uncomp
 best:    6,696,098,654 bytes = 21.87% of uncomp
Overall, bzip2 saves 2.7% of the uncompressed size.
Using bzip2 only when better than emd saves 2.9%.
      file     uncomp        emd        bz2      emd/u      bz2/u    emd/bz2
  krkp.nbw    5072736    1208181    1431503    0.23817    0.28220    0.84399
  krkn.nbw    1649196     324996     384846    0.19706    0.23335    0.84448
  krkb.nbw    1649196     216606     249344    0.13134    0.15119    0.86870
  kqkr.nbb    1649196     641119     726889    0.38875    0.44075    0.88200
 krpkp.nbb  231758952   72855656   79039209    0.31436    0.34104    0.92177
 kqrkq.nbw   90038460   21727039   23407721    0.24131    0.25997    0.92820
 kqpkq.nbw  272015040   79799668   85252431    0.29336    0.31341    0.93604
 kqpkr.nbw  272015040   78626001   83783582    0.28905    0.30801    0.93844
 krpkn.nbb  295914240  111050373  117813411    0.37528    0.39813    0.94260
  krkr.nbb    1649196     189818     201332    0.11510    0.12208    0.94281
  krkr.nbw    1649196     189818     201332    0.11510    0.12208    0.94281
 krbkp.nbb  298890240   75650198   79835507    0.25310    0.26711    0.94758
 knpkq.nbw  278914860  118407339  124596297    0.42453    0.44672    0.95033
 krbkq.nbw   95673600   25786838   27089277    0.26953    0.28314    0.95192
 krppk.nbw  105758666   16009191   16798381    0.15137    0.15884    0.95302
  kqkr.nbw    1563735     562344     588224    0.35962    0.37617    0.95600
 krpkq.nbw  286777440  129050046  134984734    0.45000    0.47070    0.95603
  krbk.nbw    1594560     278694     291147    0.17478    0.18259    0.95723
 kqnkq.nbw   87576960   16380152   16918492    0.18704    0.19318    0.96818
 kbpkp.nbb  231758952   73646877   75908764    0.31777    0.32753    0.97020
 krnkp.nbb  298890240   80747000   83120629    0.27016    0.27810    0.97144
   krk.nbb      28644       7051       7240    0.24616    0.25276    0.97390
 kppkr.nbw  108400260   37539950   38514648    0.34631    0.35530    0.97469
 kqbkq.nbw   90750420   17033442   17460090    0.18770    0.19240    0.97556
 kbnkq.nbw   93037200   47794729   48874871    0.51372    0.52533    0.97790
 kqnkr.nbw   87576960   25097634   25637768    0.28658    0.29275    0.97893
 krnkq.nbw   92308740   31478179   32139055    0.34101    0.34817    0.97944
 kbppk.nbw  106602156   24338976   24811602    0.22832    0.23275    0.98095
 krbkn.nbb   96192120   33647571   34258926    0.34980    0.35615    0.98215
 kqrkr.nbw   90038460   14061624   14296148    0.15617    0.15878    0.98360
 krbkb.nbw   95673600   30559120   30739841    0.31941    0.32130    0.99412
 kppkq.nbw  108400260   34333714   34391932    0.31673    0.31727    0.99831
  kqkn.nbw    1563735     413756     413579    0.26459    0.26448    1.00043
 knpkp.nbb  231758952   78684979   78434435    0.33951    0.33843    1.00319
 krpkb.nbb  306720000  104586860  103912815    0.34098    0.33879    1.00649
 knpkn.nbb  295914240   47309160   46928655    0.15987    0.15859    1.00811
 kbnkp.nbw  290989584  124234784  122963366    0.42694    0.42257    1.01034
 kbpkq.nbw  289027680  113061113  111784979    0.39118    0.38676    1.01142
 krnkb.nbw   92308740   34210147   33770549    0.37061    0.36584    1.01302
 kqbkr.nbw   90750420   23267363   22920419    0.25639    0.25257    1.01514
 krrkq.nbw   46658340   10262274   10102634    0.21995    0.21652    1.01580
 krpkp.nbw  226121876   56050698   55145946    0.24788    0.24388    1.01641
 krnkp.nbw  288692928   75495896   74270734    0.26151    0.25727    1.01650
 kppkp.nbw   84219361   28762929   28268282    0.34152    0.33565    1.01750
 krpkr.nbw  286777440   81247218   79627712    0.28331    0.27766    1.02034
 kqpkb.nbw  272015040   60321420   59073333    0.22176    0.21717    1.02113
 krrkr.nbw   46658340   11267500   11034262    0.24149    0.23649    1.02114
 kqrkb.nbw   90038460   12636089   12367703    0.14034    0.13736    1.02170
  kqkb.nbw    1563735     401622     392865    0.25684    0.25124    1.02229
 krbkn.nbw   95673600   31221151   30519347    0.32633    0.31899    1.02300
  krkp.nbb    4981504    1479273    1445256    0.29695    0.29012    1.02354
 kqpkn.nbw  272015040   58735454   57313080    0.21593    0.21070    1.02482
 krbkp.nbw  299203200   64224273   62401478    0.21465    0.20856    1.02921
 krrkb.nbw   46658340   11435408   11099855    0.24509    0.23790    1.03023
 kppkb.nbw  108400260   27759341   26932085    0.25608    0.24845    1.03072
 krnkn.nbw   92308740   33798053   32745376    0.36614    0.35474    1.03215
   kqk.nbb      28644       5961       5773    0.20811    0.20154    1.03257
 kbnnk.nbw   43406294   13843718   13373873    0.31893    0.30811    1.03513
  kpkp.nbw    3863492    1116299    1074134    0.28894    0.27802    1.03925
 kqrkn.nbw   90038460   11004504   10573036    0.12222    0.11743    1.04081
 kbnkp.nbb  298890240  117875922  113013072    0.39438    0.37811    1.04303
 knnkp.nbw  137991648   42962550   41103948    0.31134    0.29787    1.04522
   kpk.nbw      81664      17654      16869    0.21618    0.20657    1.04654
 kqpkp.nbb  231758952   58816493   56112103    0.25378    0.24211    1.04820
 krpkb.nbw  286777440  115490988  110178849    0.40272    0.38420    1.04821
 knpkp.nbw  219921779   85955930   81982799    0.39085    0.37278    1.04846
 kbpkn.nbb  295914240   55592010   53007870    0.18787    0.17913    1.04875
  kpkp.nbb    3863492    1111988    1059925    0.28782    0.27434    1.04912
  krbk.nbb    1747284     357372     340465    0.20453    0.19485    1.04966
 kqpkp.nbw  214481388   35125215   33351598    0.16377    0.15550    1.05318
 krnkn.nbb   96192120   32899217   31228033    0.34202    0.32464    1.05352
 kqppk.nbw  100347220   14354995   13618422    0.14305    0.13571    1.05409
 kbnkn.nbw   93037200   21542326   20352825    0.23155    0.21876    1.05844
 kppkn.nbw  108400260   33306012   31446424    0.30725    0.29010    1.05914
 kqnkb.nbw   87576960   19689196   18574599    0.22482    0.21209    1.06001
  kqkp.nbb    4981504    1353920    1272562    0.27179    0.25546    1.06393
  kqkq.nbb    1563735     257632     241730    0.16475    0.15459    1.06578
  kqkq.nbw    1563735     257632     241730    0.16475    0.15459    1.06578
 kqrkr.nbb   98951760   28475167   26675089    0.28777    0.26958    1.06748
 krpkn.nbw  286777440  109771304  102828339    0.38278    0.35856    1.06752
 krbpk.nbb  307483920   63711657   59670553    0.20720    0.19406    1.06772
 knpkr.nbw  278914860   61661742   57686560    0.22108    0.20682    1.06891
 kbbkq.nbw   47393100   20108086   18777821    0.42428    0.39621    1.07084
 kqpkn.nbb  295914240   85568988   79727151    0.28917    0.26943    1.07327
 kqnkn.nbw   87576960   19520374   18181310    0.22289    0.20760    1.07365
 kqbkb.nbw   90750420   17575640   16346035    0.19367    0.18012    1.07522
 kqrkn.nbb   96192120   26897277   24987225    0.27962    0.25976    1.07644
 krbkb.nbb   99709380   31240204   28874101    0.31331    0.28958    1.08195
 kqbkn.nbb   96192120   27201430   25099313    0.28278    0.26093    1.08375
 knpkb.nbb  306720000   28720700   26496807    0.09364    0.08639    1.08393
  krnk.nbb    1747284     367838     339236    0.21052    0.19415    1.08431
 kqbkr.nbb   98951760   29120713   26847819    0.29429    0.27132    1.08466
 krnkb.nbb   99709380   30043333   27660563    0.30131    0.27741    1.08614
  kqkn.nbb    1603202     582758     535498    0.36350    0.33402    1.08825
 krrkn.nbw   46658340   10153345    9327054    0.21761    0.19990    1.08859
 knnkq.nbw   44118240   20041961   18399172    0.45428    0.41704    1.08929
  krpk.nbb    5124732    1167451    1071495    0.22781    0.20908    1.08955
 kbbkn.nbw   47393100   13167012   12032523    0.27783    0.25389    1.09429
  kbnk.nbb    1747284     479960     438192    0.27469    0.25078    1.09532
 kbnpk.nbb  307483920   82243808   75081273    0.26747    0.24418    1.09540
 kbnkq.nbb   93824100   37093045   33828755    0.39535    0.36056    1.09649
  kqkb.nbb    1661823     505366     459761    0.30410    0.27666    1.09919
  kqpk.nbb    5124732     913558     831055    0.17826    0.16217    1.09928
  krnk.nbw    1538479     308291     280272    0.20039    0.18217    1.09997
 kqbkn.nbw   90750420   18096496   16442092    0.19941    0.18118    1.10062
 krnpk.nbb  307483920   61164088   55505228    0.19892    0.18051    1.10195
 kqbkp.nbb  298890240   73030292   66128947    0.24434    0.22125    1.10436
 kbnkr.nbw   93037200   16928642   15327113    0.18196    0.16474    1.10449
 kqnkp.nbw  273904512   49029418   44200480    0.17900    0.16137    1.10925
 kqqkq.nbw   41944320    8259179    7392385    0.19691    0.17624    1.11725
 kqrkb.nbb   99709380   26994427   24152775    0.27073    0.24223    1.11765
 kqpkr.nbb  304369920  107266797   95843095    0.35242    0.31489    1.11919
  kqkp.nbw    4810080    1103045     985543    0.22932    0.20489    1.11923
 kbpkr.nbw  289027680   63223899   56376970    0.21875    0.19506    1.12145
 krbkr.nbw   95673600   17189887   15318326    0.17967    0.16011    1.12218
 kbpkq.nbb  288610560   77733361   69226833    0.26934    0.23986    1.12288
 kqpkb.nbb  306720000   82117476   73055109    0.26773    0.23818    1.12405
  knkp.nbw    4931904     618587     549954    0.12543    0.11151    1.12480
 kbpkb.nbb  306720000   31010452   27531134    0.10110    0.08976    1.12638
 kqrkp.nbw  281568240   25761384   22838240    0.09149    0.08111    1.12799
 kqrbk.nbw   88557959    7891933    6995896    0.08912    0.07900    1.12808
 kqbkp.nbw  283818240   43326441   38363510    0.15266    0.13517    1.12937
 krnnk.nbw   43056198    8129370    7179438    0.18881    0.16675    1.13231
 kbnkb.nbw   93037200   16563176   14599222    0.17803    0.15692    1.13452
 krpkr.nbb  304369920   68946403   60627781    0.22652    0.19919    1.13721
 kbpkn.nbw  289027680   94442758   83032448    0.32676    0.28728    1.13742
 kppkr.nbb  119209296   36818260   32360899    0.30885    0.27146    1.13774
 kqnkp.nbb  298890240   73087786   64212584    0.24453    0.21484    1.13822
 knpkn.nbw  278914860   85719133   75286500    0.30733    0.26993    1.13857
 kqnkn.nbb   96192120   25311422   22216283    0.26313    0.23096    1.13932
 kqnkr.nbb   98951760   29336593   25680846    0.29647    0.25953    1.14235
 knpkb.nbw  278914860   71334136   62356347    0.25576    0.22357    1.14398
  kbkp.nbw    5112000     369787     322267    0.07234    0.06304    1.14746
 krpkq.nbb  288610560  105984844   92278948    0.36722    0.31974    1.14853
 krbnk.nbb  104837040   24627236   21411299    0.23491    0.20423    1.15020
 kqbkb.nbb   99709380   25553026   22211345    0.25628    0.22276    1.15045
 kqrkp.nbb  298890240   67590215   58714185    0.22614    0.19644    1.15117
 knpkq.nbb  288610560   81700305   70604859    0.28308    0.24464    1.15715
  kqbk.nbb    1747284     329501     284563    0.18858    0.16286    1.15792
 krbkq.nbb   93824100   35515289   30669128    0.37853    0.32688    1.15801
 kbpkp.nbw  227896016   81204398   69330055    0.35632    0.30422    1.17127
 krnkq.nbb   93824100   36162524   30743527    0.38543    0.32767    1.17626
 krnkr.nbw   92308740   16970339   14409712    0.18384    0.15610    1.17770
  kqrk.nbb    1747284     296182     251223    0.16951    0.14378    1.17896
 kqrbk.nbb  104837040   15608322   13196641    0.14888    0.12588    1.18275
   krk.nbw      27030       7059       5959    0.26115    0.22046    1.18459
 krrkp.nbw  145901232   21762225   18268474    0.14916    0.12521    1.19124
 kqnkb.nbb   99709380   23937895   19999778    0.24008    0.20058    1.19691
 kppkp.nbb   89391280   34118774   28469456    0.38168    0.31848    1.19843
  knpk.nbw    4648581    1547889    1286669    0.33298    0.27679    1.20302
  kqnk.nbb    1747284     302140     250774    0.17292    0.14352    1.20483
 kbbkp.nbw  148223520   31876185   26344185    0.21505    0.17773    1.20999
   kpk.nbb      84012      16589      13692    0.19746    0.16298    1.21158
 kbpkb.nbw  289027680   72384007   59704073    0.25044    0.20657    1.21238
 kbbkq.nbb   46912050   19728310   16250102    0.42054    0.34640    1.21404
 kqbpk.nbb  307483920   62632517   51363375    0.20369    0.16704    1.21940
  kbnk.nbw    1550620     555601     454373    0.35831    0.29303    1.22279
 kqqkr.nbb   49475880   20270606   16476435    0.40971    0.33302    1.23028
 kqbnk.nbb  104837040   20822831   16923590    0.19862    0.16143    1.23040
 knnkq.nbb   46912050   20875024   16947702    0.44498    0.36127    1.23173
 knppk.nbw  102898651   23979485   19445192    0.23304    0.18897    1.23318
 kqrkq.nbb   93824100   37812136   30656677    0.40301    0.32675    1.23341
  kbpk.nbb    5124732    1290857    1044616    0.25189    0.20384    1.23572
 knnpk.nbw  130135501   41955433   33793509    0.32240    0.25968    1.24152
 kbbnk.nbb   52418520   18504646   14804247    0.35302    0.28242    1.24996
 krrkn.nbb   48096060   21039767   16794698    0.43745    0.34919    1.25276
  kqbk.nbw    1512507     273890     218099    0.18108    0.14420    1.25581
  kbpk.nbw    4817128    1511181    1190268    0.31371    0.24709    1.26961
 kqrnk.nbb  104837040   14366109   11290052    0.13703    0.10769    1.27246
 kqnpk.nbw  258294639   34825874   27356613    0.13483    0.10591    1.27303
 kbnpk.nbw  274352939   81372124   63898641    0.29660    0.23291    1.27346
  krkn.nbb    1603202     179806     140366    0.11215    0.08755    1.28098
 kqnpk.nbb  307483920   55981684   43652681    0.18206    0.14197    1.28243
 kbbkn.nbb   48096060   17795185   13830444    0.36999    0.28756    1.28667
  krrk.nbb     873642     201764     156782    0.23095    0.17946    1.28691
 kppkn.nbb  115899744   28882490   22426797    0.24920    0.19350    1.28786
 kbbkr.nbw   47393100    5392436    4166925    0.11378    0.08792    1.29410
 kqqkr.nbw   41944320    5735822    4424616    0.13675    0.10549    1.29634
 kbppk.nbb  114742320   31774906   24426890    0.27692    0.21288    1.30082
 krnpk.nbw  272153675   53332329   40824849    0.19596    0.15001    1.30637
 kqrpk.nbw  265421907   23193001   17687957    0.08738    0.06664    1.31123
 kqrnk.nbw   85470603    8105203    6176854    0.09483    0.07227    1.31219
 krbbk.nbw   46242089    7771724    5922093    0.16807    0.12807    1.31233
 krrkq.nbb   46912050   19828409   15054091    0.42267    0.32090    1.31714
 kqqkb.nbw   41944320    4869794    3696862    0.11610    0.08814    1.31728
 kqpkq.nbb  288610560   65106951   49374702    0.22559    0.17108    1.31863
  knpk.nbb    5124732    1227423     928549    0.23951    0.18119    1.32187
 kbbpk.nbb  153741960   52235062   39260998    0.33976    0.25537    1.33046
 krrkp.nbb  149445120   47021394   35338611    0.31464    0.23647    1.33060
 kbnkn.nbb   96192120    3897100    2925990    0.04051    0.03042    1.33189
 kqrpk.nbb  307483920   50349142   37740104    0.16375    0.12274    1.33410
 kqbpk.nbw  267576632   36547485   27345055    0.13659    0.10220    1.33653
  kppk.nbw    1806671     384915     286847    0.21305    0.15877    1.34188
 kbnnk.nbb   52418520   20844682   15521506    0.39766    0.29611    1.34295
  knkp.nbb    4981504    1089865     810863    0.21878    0.16277    1.34408
 knpkr.nbb  304369920   54269849   39988495    0.17830    0.13138    1.35714
 krrkb.nbb   49854690   21390664   15755255    0.42906    0.31602    1.35768
 knnnk.nbw   13486227    5127367    3771842    0.38019    0.27968    1.35938
 kqqkn.nbb   48096060   17244425   12677250    0.35854    0.26358    1.36027
 krrbk.nbw   45873720    5228371    3821859    0.11397    0.08331    1.36802
  kqqk.nbb     873642     181425     132511    0.20767    0.15168    1.36913
 krbkr.nbb   98951760    4338256    3166036    0.04384    0.03200    1.37025
 krnnk.nbb   52418520   14240132   10389139    0.27166    0.19820    1.37067
 krrkr.nbb   49475880   17169632   12498320    0.34703    0.25261    1.37376
 knnkp.nbb  149445120   43337084   31440093    0.28999    0.21038    1.37840
 knnpk.nbb  153741960   45938083   33262588    0.29880    0.21635    1.38107
 kqqkn.nbw   41944320    4378753    3168712    0.10439    0.07555    1.38187
 krbpk.nbw  281991360   53020366   38335670    0.18802    0.13595    1.38306
   kqk.nbw      25629       7605       5496    0.29673    0.21444    1.38373
 kqnnk.nbw   40873646    8083355    5841718    0.19776    0.14292    1.38373
  krpk.nbw    4779530    1197199     863791    0.25048    0.18073    1.38598
 knppk.nbb  114742320   29929581   21590873    0.26084    0.18817    1.38621
  kbkp.nbb    4981504     775143     555973    0.15560    0.11161    1.39421
 krbbk.nbb   52418520   15666016   11235788    0.29886    0.21435    1.39430
 kbbkb.nbw   47393100    5176068    3699563    0.10922    0.07806    1.39910
 krbnk.nbw   90787358   19084104   13610617    0.21021    0.14992    1.40215
 kqbnk.nbw   86166717   13299001    9453123    0.15434    0.10971    1.40684
 krrpk.nbw  137491197   15169175   10723356    0.11033    0.07799    1.41459
 kqqkq.nbb   46912050   18641297   13171580    0.39737    0.28077    1.41527
 kqqkp.nbw  131170128   11376740    8005676    0.08673    0.06103    1.42108
 kqbbk.nbw   43879679    7050500    4947177    0.16068    0.11274    1.42516
 kbpkr.nbb  304369920   44405740   31156913    0.14589    0.10237    1.42523
 kbbkp.nbb  149445120   51409885   35802265    0.34401    0.23957    1.43594
  kppk.nbb    1912372     610687     425203    0.31933    0.22234    1.43622
 kbbpk.nbw  139715040   39266109   27326042    0.28104    0.19558    1.43695
 krrbk.nbb   52418520   10501479    7131401    0.20034    0.13605    1.47257
 kqqkp.nbb  149445120   38155021   25880145    0.25531    0.17317    1.47430
 krrpk.nbb  153741960   31457218   21301805    0.20461    0.13856    1.47674
 kbbnk.nbw   44983618   14067396    9485137    0.31272    0.21086    1.48310
 kppkq.nbb  113036880   31779825   21410112    0.28115    0.18941    1.48434
 krppk.nbb  114742320   29506318   19856181    0.25715    0.17305    1.48600
 kqqbk.nbw   41270973    3721405    2499269    0.09017    0.06056    1.48900
 krrnk.nbw   44265261    4899253    3274523    0.11068    0.07398    1.49617
 kqbkq.nbb   93824100   16557260   10919479    0.17647    0.11638    1.51630
  kqnk.nbw    1459616     284437     186951    0.19487    0.12808    1.52145
 knnkr.nbb   49475880    3411881    2232996    0.06896    0.04513    1.52794
 kqqkb.nbb   49854690   20033926   13080502    0.40185    0.26237    1.53159
 kqbbk.nbb   52418520   13701952    8933934    0.26140    0.17043    1.53370
 krnkr.nbb   98951760    5584441    3640178    0.05644    0.03679    1.53411
 kqnnk.nbb   52418520   12198851    7902043    0.23272    0.15075    1.54376
  kbbk.nbb     873642     205549     132238    0.23528    0.15136    1.55439
 krrnk.nbb   52418520    9738576    6238234    0.18579    0.11901    1.56111
 knnnk.nbb   17472840    6831165    4351367    0.39096    0.24904    1.56989
  kbbk.nbw     789885     249216     158701    0.31551    0.20092    1.57035
 kppkb.nbb  120132000   21826306   13862726    0.18169    0.11540    1.57446
 kbnkr.nbb   98951760    5471709    3404520    0.05530    0.03441    1.60719
 kqppk.nbb  114742320   24950559   15172322    0.21745    0.13223    1.64448
 kpppk.nbw   26061704    4453919    2686692    0.17090    0.10309    1.65777
 kqnkq.nbb   93824100   16600758    9740703    0.17693    0.10382    1.70427
 kqqpk.nbw  123688859   11214503    6470610    0.09067    0.05231    1.73314
 kqqrk.nbw   40916820    2904111    1667307    0.07098    0.04075    1.74180
 kqqpk.nbb  153741960   29021928   16562188    0.18877    0.10773    1.75230
 knnkr.nbw   44118240     329537     186125    0.00747    0.00422    1.77051
 kqrrk.nbw   43157690    4453279    2487530    0.10319    0.05764    1.79024
 kqqbk.nbb   52418520    9212863    5122975    0.17576    0.09773    1.79834
 kqqnk.nbw   39840787    3884319    2156670    0.09750    0.05413    1.80107
  kqpk.nbw    4533490     886731     491905    0.19560    0.10850    1.80265
 kpppk.nbb   28388716    9432986    5216593    0.33228    0.18376    1.80827
  kqrk.nbw    1500276     211121     116067    0.14072    0.07736    1.81896
   kbk.nbw      27243       1503        826    0.05517    0.03032    1.81961
 kqqrk.nbb   52418520    6969265    3804944    0.13295    0.07259    1.83163
 kbnkb.nbb   99709380    1402947     764569    0.01407    0.00767    1.83495
  krrk.nbw     777300     135331      72701    0.17410    0.09353    1.86147
  krkb.nbb    1661823      84137      44964    0.05063    0.02706    1.87121
 kbbbk.nbw   15010230    4599013    2338972    0.30639    0.15583    1.96625
 kqrrk.nbb   52418520    9389763    4683110    0.17913    0.08934    2.00503
 kqqnk.nbb   52418520    8184819    4070928    0.15614    0.07766    2.01055
 kbbbk.nbb   17472840    6450288    3043630    0.36916    0.17419    2.11927
  kqqk.nbw     698739     141855      66904    0.20302    0.09575    2.12028
 kbbkr.nbb   49475880    2376524    1101716    0.04803    0.02227    2.15711
 kbbkb.nbb   49854690    1236129     461789    0.02479    0.00926    2.67683
 kqqqk.nbw   12479974    1620714     511645    0.12987    0.04100    3.16765
 krrrk.nbw   14644690    1942079     611407    0.13261    0.04175    3.17641
 krrrk.nbb   17472840    4665085    1253779    0.26699    0.07176    3.72082
 kqqqk.nbb   17472840    4302377    1095380    0.24623    0.06269    3.92775
   kbk.nbb      28644        187         46    0.00653    0.00161    4.06522
   knk.nbb      28644        187         46    0.00653    0.00161    4.06522
   knk.nbw      26282        186         44    0.00708    0.00167    4.22727
  kbkb.nbb    1661823      13989       2823    0.00842    0.00170    4.95537
  kbkb.nbw    1661823      13989       2823    0.00842    0.00170    4.95537
  kbkn.nbw    1661823      14063       2741    0.00846    0.00165    5.13061
  knnk.nbw     735304       1358        192    0.00185    0.00026    7.07292
 knnkn.nbw   44118240      89336       8564    0.00202    0.00019   10.43157
  knnk.nbb     873642       1441         98    0.00165    0.00011   14.70408
 knnkb.nbw   44118240      70028       4015    0.00159    0.00009   17.44159
 knnkb.nbb   49854690     256831      14697    0.00515    0.00029   17.47506
 knnkn.nbb   48096060      75659       3284    0.00157    0.00007   23.03867
  kbkn.nbb    1603202       2451         85    0.00153    0.00005   28.83529
  knkn.nbb    1603202       2449         84    0.00153    0.00005   29.15476
  knkn.nbw    1603202       2449         84    0.00153    0.00005   29.15476
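For anyone who wants to re-derive the totals and counts above, here is a minimal Python sketch. It assumes the table has been pasted in as plain text in the column layout shown; `parse_rows`, `summarize` and the field names are hypothetical helpers, not part of any tablebase tool. "best" is taken as the per-file minimum of the emd and bzip2 sizes, which is what the "best" line above reports.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    uncomp: int  # uncompressed size in bytes
    emd: int     # datacomp (.emd) size in bytes
    bz2: int     # bzip2 size in bytes


def parse_rows(text: str) -> list[Row]:
    """Pull (file, uncomp, emd, bz2) out of the fixed-width table text."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data lines have 7 fields with a plain integer in the 2nd; skip headers.
        if len(parts) == 7 and parts[1].isdigit():
            rows.append(Row(parts[0], int(parts[1]), int(parts[2]), int(parts[3])))
    return rows


def summarize(rows: list[Row]) -> dict:
    uncomp = sum(r.uncomp for r in rows)
    emd = sum(r.emd for r in rows)
    bz2 = sum(r.bz2 for r in rows)
    best = sum(min(r.emd, r.bz2) for r in rows)  # smaller of the two, per file
    return {
        "files": len(rows),
        "emd_pct": 100 * emd / uncomp,
        "bz2_pct": 100 * bz2 / uncomp,
        "best_pct": 100 * best / uncomp,
        # emd/bz2 > 1.5: bzip2 output is more than 1.5x smaller than emd output
        "bz2_much_better": sum(1 for r in rows if r.emd / r.bz2 > 1.5),
        # emd/bz2 < 1: bzip2 output is larger than emd output
        "bz2_worse": sum(1 for r in rows if r.emd < r.bz2),
    }


SAMPLE = """\
      file     uncomp        emd        bz2      emd/u      bz2/u    emd/bz2
  krkp.nbw    5072736    1208181    1431503    0.23817    0.28220    0.84399
  kqkn.nbw    1563735     413756     413579    0.26459    0.26448    1.00043
  knkn.nbw    1603202       2449         84    0.00153    0.00005   29.15476
"""

print(summarize(parse_rows(SAMPLE)))
```

Pasting the whole table into `parse_rows` instead of `SAMPLE` lets you check the counts and percentages quoted at the top of the post.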
Arpad Rusz
Posts: 93
Joined: Mon Mar 27, 2006 5:33 pm
Sign-up code: 0
Location: Romania/Hungary

Post by Arpad Rusz

Welcome to our forum!
I am sharing the source code for the 6-man TB generator on eMule (tbgen.zip), but please first read the old post in this forum titled "6 Man TB Generator".
There is a compiled version too (RunTbGen.rar). But it is much simpler to download the tablebases from eMule than to generate them.
guyhaw
Posts: 489
Joined: Sat Jan 21, 2006 10:43 am
Sign-up code: 10159
Location: Reading, UK

emd compression efficacy

Post by guyhaw

The compression achieved by emd is a function of the block size: the greater the block size, the better the compression ratio, but the more has to be fetched per random access.
Bob Hyatt did extensive comparisons of EGT access efficiency for various block sizes and recommended the current size, 8 KB (?).
So, if you are comparing compression techniques, it's best to compare them for the same blocksize.
g
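To illustrate guyhaw's point, here is a minimal sketch that compresses the same data in independent blocks of several sizes and adds an index of one offset per block. It uses Python's zlib as a stand-in for datacomp (which is a different coder), synthetic run-heavy data rather than a real tablebase, and an assumed 4 bytes per index entry, so only the trend is meaningful, not the numbers.

```python
import random
import zlib


def blocked_size(data: bytes, block_size: int, offset_bytes: int = 4) -> int:
    """Bytes needed when every block is compressed on its own (zlib here, as a
    stand-in for datacomp) plus an index holding one offset per block.
    offset_bytes=4 is an assumed index entry size."""
    compressed = sum(
        len(zlib.compress(data[i:i + block_size], 9))
        for i in range(0, len(data), block_size)
    )
    n_blocks = -(-len(data) // block_size)  # ceiling division
    return compressed + n_blocks * offset_bytes


# Synthetic stand-in for a tablebase: 1 MiB of runs drawn from a few byte values.
rng = random.Random(1)
buf = bytearray()
while len(buf) < 1 << 20:
    buf += bytes([rng.choice((0, 1, 2, 255))]) * rng.randint(1, 200)
data = bytes(buf[: 1 << 20])

for bs in (256, 1024, 8192, 65536):
    # bs is also how many bytes a single random probe has to decompress
    print(f"{bs:6d}-byte blocks: {blocked_size(data, bs):8d} bytes total")
```

Larger blocks amortize the per-block header and index overhead and give the coder more context, but each probe then has to read and decompress more bytes, which is the trade-off described above.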
clocks
Posts: 102
Joined: Thu Nov 23, 2006 9:27 am
Sign-up code: 0

Post by clocks

Based on what I remember from generating the 5-man set and compressing it myself (this was at least 5-6 years ago), the standard block size used with datacomp was 8192. However, I found out that the 3-4-5 piece files were actually compressed with a block size of 10000. I am not sure whether the two are compatible, but to achieve the same file size and match the MD5s I had to use a block size of 10000.

I would assume that the 6-man files are compressed the same way, but I don't know for sure. Probably of little interest, but it's one thing I did know offhand.

Derek
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Post by jkominek

Clocks - no, I think your memory is mistaken. The files are compressed with an 8 kB block size, whereas the program default is 64 kB. There is another, more subtle possible cause of discrepancy: the order of files given to datacomp makes a difference. If more than one file is given on the command line, only the first one is used to build the "statistical model" that is applied to all the files listed. Nalimov compressed each file separately, so the compression is tuned to each tablebase individually. This normally gives better results.

By the way, bigger blocks compress better, up to a point. To give one example, when using 64 kB blocks the 5-man tablebase krrrk is 83.5% of the official version. One day I'll perform a complete comparison. Maybe we'd save 100-150 MB (though all existing software would break).

Archimerged - Sadly, your numbers are meaningless. That's because bzip2 compresses and decompresses a file as a whole. While it is true that both datacomp and bzip2 are block coders (and bzip2 places CRC checks within blocks and synchronization marks between them), bzip2 does not support random access. That's the critical missing feature. When you type a FEN into Wilhelm (or whatever), the string gets converted to a position index. With the uncompressed tablebases this index can be used to read the DTM result directly. When working with the compressed tablebases, the position index is converted to a block index, the 8 kB block is read and decompressed in memory, and the result is found from there. With the bzip2 format you'd have to read the entire file from the beginning until you found what you wanted, and that clearly won't do.
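A minimal sketch of that lookup path: position index to block index, read only that block, decompress it in memory, then pick out the entry. It assumes one byte per position, uses zlib per block as a stand-in for datacomp's coder, and keeps the offset table in memory; `compress_blocked` and `probe` are hypothetical names, not datacomp's API.

```python
import zlib

BLOCK = 8192  # block size quoted in this thread for the Nalimov files


def compress_blocked(raw: bytes, block: int = BLOCK) -> tuple[bytes, list[int]]:
    """Compress each block independently; offsets[k] is where block k starts."""
    pieces = [zlib.compress(raw[i:i + block]) for i in range(0, len(raw), block)]
    offsets = [0]
    for p in pieces:
        offsets.append(offsets[-1] + len(p))
    return b"".join(pieces), offsets


def probe(stream: bytes, offsets: list[int], index: int, block: int = BLOCK) -> int:
    """Look up one position (assumed 1 byte each) by decoding only its block."""
    k = index // block                            # position index -> block index
    chunk = stream[offsets[k]:offsets[k + 1]]     # read just that compressed block
    return zlib.decompress(chunk)[index % block]  # decompress in memory, pick entry


raw = bytes((i * 7) % 251 for i in range(100_000))  # toy "tablebase"
stream, offsets = compress_blocked(raw)
print(probe(stream, offsets, 54321) == raw[54321])
```

The offset table is what makes the random access possible; a plain .bz2 file carries no such table, which is the point being made about bzip2.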

However, your larger point is well taken: we can do better. For the forthcoming 7-man tablebases I believe the issue of compression should be revisited. This includes the block size (and the consequent size of the index table), as well as the algorithm itself. Plus, the datacomp code is a wreck! Completely unmaintainable. :(

john
clocks
Posts: 102
Joined: Thu Nov 23, 2006 9:27 am
Sign-up code: 0

Post by clocks

Well, I ran a little test here. I downloaded kbnkq.nbw.emd, uncompressed it with datacomp, and copied the uncompressed file into 2 directories.

Then I recompressed it, once with a 10000 block size and once with an 8192 block size.

Both reproduce the EXACT same file, with the MD5s matching on both.

I tried another file, with the same results.

As a further test, I compressed kbnkq.nbw with a 32768 block size. However, that MD5 did not match.

So I'm not wrong that I compressed it with a 10000 block size; however, it seems I was wrong that the 8192 block size didn't line up, when in fact it did.

Anyone care to explain why 10000 is giving me the same file as 8192? I could do this with 10-15 files and see the results, but I'm bored enough after two.

Here's exactly what I did:

C:\tb>datacomp
datacomp -- multidimentional data compression utility
copyright (c) 1991--1998 by Andrew Kadatach
usage: datacomp command file1 file2 file3 ...
commands:
e[:NNN] -- encode files using block of NNN bytes (default is 65536)
(all files will be encoded using first file statistics)
suffix ".emd" will be added to each name of file with encoded data
d -- decode original files out of ".emd"-files
last suffix will be removed from each name
t -- test integrity of compressed ".emd"-files
v -- test integrity of compressed ".emd"-files and decompression speed

C:\tb>datacomp d kbnkq.nbw.emd
kbnkq.nbw.emd (93037200 bytes, 1.516 sec, 61370185 bytes/sec, CRC32 checking is
ON)

C:\tb>cd 8192

C:\tb\8192>datacomp e:8192 kbnkq.nbw
analyzing "kbnkq.nbw" pass 1: 93037200 bytes parsed
analyzing "kbnkq.nbw" pass 2: 93037200 bytes parsed
-120, -60, -420, -7, -240, -360, -300, -1,
-480, -8, -9, -2, -540, -180, -6, -15,
-14, -4, -5, -16, -3, -13, -11, -10
pass 1: 93037200 bytes parsed
pass 2: 93037200 bytes read, 47794729 bytes written (1.947 : 1)

C:\tb\8192>cd..

C:\tb>cd 10000

C:\tb\10000>datacomp e:10000 kbnkq.nbw
analyzing "kbnkq.nbw" pass 1: 93037200 bytes parsed
analyzing "kbnkq.nbw" pass 2: 93037200 bytes parsed
-120, -60, -420, -7, -240, -360, -300, -1,
-480, -8, -9, -2, -540, -180, -6, -15,
-14, -4, -5, -16, -3, -13, -11, -10
pass 1: 93037200 bytes parsed
pass 2: 93037200 bytes read, 47794729 bytes written (1.947 : 1)

C:\tb\10000>md5sum kbnkq.nbw.emd
12d9988b0cb7037db538606e2a2a713f *kbnkq.nbw.emd

C:\tb\10000>md5sum c:\tb\kbnkq.nbw.emd
12d9988b0cb7037db538606e2a2a713f *kbnkq.nbw.emd

C:\tb\10000>md5sum c:\tb\8192\kbnkq.nbw.emd
12d9988b0cb7037db538606e2a2a713f *kbnkq.nbw.emd
What do you think? :) So I'm not entirely mistaken, only half, and I'll only half-argue it with you. :)
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Post by jkominek

Hi Derek,
clocks wrote: Anyone care to explain why 10000 is giving me the same file as 8192?
Yes, I know exactly why. datacomp only works with sizes of blocks that are a power of 2, within the range of 256 bytes to 64 kB. So when you enter 10000, the programmer had two choices: either complain and exit, or round to the nearest power of 2. Andrew Kadatch took the second option.

Now I bet it makes sense. The nearest power of 2 to 10000 is 8192, and by convention all Nalimov tablebases are compressed with this block size.

john
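The rule described above can be written down as a small function. This is a sketch under assumptions: the 256-byte to 64 kB range comes from the post above, and "nearest" is taken in the linear sense with ties rounding down. datacomp may instead simply round down; for 10000 both rules give 8192, and 32768 is already a power of 2 and so is used as-is, which fits the different MD5 seen for that run. `effective_block_size` is a hypothetical name, not part of datacomp.

```python
def effective_block_size(requested: int, lo: int = 256, hi: int = 65536) -> int:
    """Clamp to [lo, hi], then round to the nearest power of 2 (ties round down).
    lo and hi are assumed to be powers of 2."""
    n = min(max(requested, lo), hi)
    below = 1 << (n.bit_length() - 1)            # largest power of 2 <= n
    above = below if below == n else below << 1  # smallest power of 2 >= n
    return below if n - below <= above - n else above


for size in (10000, 8192, 32768, 13000):
    print(size, "->", effective_block_size(size))
```

Under this assumed rule, 10000 and 8192 map to the same block size, while a request such as 13000 would round up to 16384.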
clocks
Posts: 102
Joined: Thu Nov 23, 2006 9:27 am
Sign-up code: 0

Post by clocks

Makes sense entirely. :)

I was in 10th grade at the time, or maybe 9th. Must have read into it all wrong :)
ath
Posts: 11
Joined: Sat Sep 15, 2007 6:56 am

Re:

Post by ath

jkominek wrote: For the forthcoming 7-man tablebases I believe the issue of compression should be revisited. This includes the block size (and consequent size of the index table), as well as the algorithm itself.
A factor here is the uncompressed files: if they don't contain redundancies in a form that can be exploited by the coding engine, it may be worth ensuring that they do. The Nalimov tables already contain a certain amount of 'compression' in their uncompressed state, as a number of illegal positions simply aren't represented. This means that fixed blocks probably won't 'fit' the data well, and so won't compress as well as they could.

Some of the Edwards databases could be compressed extremely well, because they contained 64-byte structures that were repeated and so could be encoded compactly. Tables like KNNNK (I think it was) could be compressed almost to nothing (admittedly, mainly because it's a fairly simple endgame). It's around 6 MB in the Nalimov version, but the compressed Edwards table ended up at around 63 kB.

/ATh
jshriver
Posts: 298
Joined: Tue Jan 31, 2006 5:59 am
Sign-up code: 0
Location: Toledo, OH, USA

Re: bzip2 vs. emd compression of 3,4,5 tables: results

Post by jshriver

I brought up this argument a couple of years back. Dr. Hyatt pretty much set me straight when he said (more or less) that the compression algorithm used for the Nalimov dataset was specifically designed for EGTBs, so whether it compresses better or worse than bzip or gzip isn't as important: those methods don't offer the same capability as datacomp, mostly because quickly searching the index and reading the data at random points isn't possible with bzip/gzip.