For information, our testing conditions are as follows:
CCRL 40/40 Testing Conditions
Time control
Our time control is equivalent to 40 moves in 40 minutes on an AMD X2 4600+ at 2.4 GHz.
We use Crafty 19.17 BH as a benchmark to determine the equivalent time control for a particular machine.
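As a rough illustration of how a benchmark result could translate into a time control, the sketch below assumes simple linear scaling: a machine that finishes the benchmark in less time than the reference machine gets proportionally less clock time. The function name, the linear-scaling assumption and the sample benchmark timings are hypothetical and are not taken from the CCRL procedure itself.

```python
def equivalent_minutes(reference_bench_seconds, local_bench_seconds,
                       reference_minutes=40.0):
    """Scale the reference clock time by the ratio of benchmark run times.

    Assumption (not from CCRL): the time needed for equal playing effort
    scales linearly with the time the benchmark takes on each machine.
    """
    if reference_bench_seconds <= 0 or local_bench_seconds <= 0:
        raise ValueError("benchmark times must be positive")
    return reference_minutes * local_bench_seconds / reference_bench_seconds


# Hypothetical timings: the reference machine needs 50 s for the benchmark,
# the local machine needs 40 s, so the local machine is faster.
minutes = equivalent_minutes(50.0, 40.0)
print(f"Equivalent time control: 40 moves in {minutes:.1f} minutes")
```

With these made-up timings the faster machine would play at 40 moves in 32 minutes instead of 40.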
CCRL Testing Conditions
Endgame tablebases: 4 or 5 piece tablebases.
Pondering: OFF
Tournament type: Any. Match, Tournament, Gauntlet etc.
Matches or Gauntlets are usually limited to 60 games maximum per engine pair. Our standard match length in 40/40 is 30 games.
Hash size: Set to the same value of either 128 or 256 MB for all engines in a match or tourney. There are three exceptions:
1) "deep" engines running on 2 CPUs should have double the hash size of single engines in the same tourney.
2) "deep" engines running on 4 CPUs should have four times the hash size.
3) If an engine has problems with a particular hash size, it may play with a smaller hash.
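The hash rule and its three exceptions can be sketched as a small helper. This is only an illustration: the function name and the max_supported_mb parameter (standing in for "an engine has problems with a particular hash size") are hypothetical.

```python
def hash_size_mb(base_mb, cpus=1, max_supported_mb=None):
    """Return the hash size in MB for one engine in a match or tourney.

    base_mb: the value shared by all single-CPU engines (128 or 256).
    cpus: 1 for a single engine, 2 or 4 for a "deep" engine.
    max_supported_mb: hypothetical cap for an engine that misbehaves
        with larger hash sizes (exception 3).
    """
    if base_mb not in (128, 256):
        raise ValueError("base hash must be 128 or 256 MB")
    if cpus not in (1, 2, 4):
        raise ValueError("only 1, 2 or 4 CPUs are covered by the rules")
    size = base_mb * cpus  # exceptions 1 and 2: scale with CPU count
    if max_supported_mb is not None:
        size = min(size, max_supported_mb)  # exception 3: smaller hash allowed
    return size


print(hash_size_mb(128))                       # single engine
print(hash_size_mb(128, cpus=2))               # "deep" engine on 2 CPUs
print(hash_size_mb(256, cpus=4))               # "deep" engine on 4 CPUs
print(hash_size_mb(256, max_supported_mb=64))  # engine limited to 64 MB
```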
EGTB hash: 32 MB
Tournament Interface: Any. Examples: Winboard, Arena, Shredder, Chessbase, Chess Partner.
Opening book: Any generic book. Examples: remis.ctg, draw.ctg, 5moves.ctg, perfect.ctg etc. Book line length is limited to a maximum of 12 moves per side, and book learning is off or the book is set as read-only. The same books are used for all engines in the match, tournament or gauntlet.
Engines with their own books have them disabled (deleted or switched off in parameters). Engines which can't disable their own book can't participate in CCRL testing.
Book learning: Off for all engines.
Position learning: Off for all engines. Alternatively, learning files are set to read-only or deleted after each match.
Choosing engines to test
Engine choice is up to individual testers, with just a few limitations:
1. We test only stable versions of chess engines. We generally test only publicly available engines, although private versions and betas are tested solely at our discretion on a case-by-case basis.
Note: If we test a beta version that is later released as a stable version, and the engine author says there is no difference in playing strength between the release and the beta, then the games of the beta version are renamed to the release version.
2. We try to avoid over-testing any particular pair of engines. The reason is to avoid rating distortions and have more reliable statistics. Normally we test with 30 games for each pair, and with 60 games for interesting pairs, but anything up to 100 is fine.
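One way a tester might keep an eye on the per-pair limits is to count games for each unordered pair of engines. The sketch below is an assumption-laden example: the function name, the engine names and the game list are made up, and only the 100-game ceiling comes from the text above.

```python
from collections import Counter


def over_tested_pairs(games, limit=100):
    """Return {pair: count} for engine pairs with more than `limit` games.

    games: iterable of (white, black) engine-name tuples.
    A pair is unordered, so colours are ignored when counting.
    """
    counts = Counter(frozenset(game) for game in games)
    return {tuple(sorted(pair)): n for pair, n in counts.items() if n > limit}


# Hypothetical data: 101 games between two engines, 30 between another pair.
games = [("Engine A", "Engine B")] * 51 + [("Engine B", "Engine A")] * 50
games += [("Engine A", "Engine C")] * 30
print(over_tested_pairs(games))
```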
Tournament format is entirely up to individual testers. We run matches, round-robin tournaments, gauntlets, Swiss or knockout events.
New or old engines can be selected. Studying the current rating list will show which engines are currently being tested and which engines may need additional games. Pairings with an estimated rating difference of around ±400 points should be avoided.
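To see why very lopsided pairings add little information, the sketch below evaluates the standard logistic Elo expected-score formula. This is the textbook Elo model, used here only as an approximation; the rating list's own calculation may differ, and the function name is ours.

```python
def expected_score(rating_diff):
    """Expected score of the stronger side under the logistic Elo model.

    rating_diff: rating of the engine minus rating of its opponent.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


for diff in (0, 100, 200, 400):
    print(f"{diff:>4} points: expected score {expected_score(diff):.3f}")
```

Under this model a 400-point favourite is expected to score about 91%, so nearly every game has a predictable result.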
