KCEC
(Kirr's Chess Engine Comparison)
A tournament of original free chess engines
June 22, 2010
Testing summary:
Total: 85,760 games
played by 148 programs
894 CPU days (X2 4600+)

White wins: 34,666 (40.4%)
Black wins: 29,495 (34.4%)
Draws: 21,599 (25.2%)
White score: 53.0%

Ponder hit

Ponder hit analysis measures the similarity between two engines as the percentage of expected and unexpected moves in their games. The idea is that similar engines predict each other's moves more often.

Closest and most different pairs

Method

It looks simple: just count the predicted moves and divide by the number of all moves. Still, there are a few things to take care of. First, there are opening moves, where engines don't think. We don't count such moves in this experiment. Second, there are forced moves, where there is no other choice. Such moves should not be counted either. We detect them by the time spent on them, so all moves made in 0:00 seconds are excluded from this analysis.

Then, there are tablebase moves and mating lines. Such lines are characterized by many forced moves, but they also contain many situations where it does not matter what to play. As a result, ponder hit statistics are not very meaningful in such lines. They are much more interesting in middlegame positions, where the choice of move actually shows an engine's playing style and understanding. To limit this experiment to the middlegame, we exclude all moves made with an evaluation of ±9 pawns or more.

Finally, there are boring 50-move lines where engines don't know what to do but are still trying to avoid a draw. In those lines engines play shuffle chess, and any ponder hit analysis is meaningless. What's worse, right on the 50th move they will push a pawn to avoid the draw, and the shuffle chess continues for another 50 moves. Such cases are difficult to detect automatically, so all drawn games are ignored in this analysis, and only decided games are used.
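The filtering rules above can be sketched in a few lines of Python. This is only an illustration, not the script used for KCEC: the `Move` record, its field names, the result strings and the choice to skip moves with no recorded prediction are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Move:
    """Hypothetical per-move record; field names are assumptions."""
    played: str               # move actually played, e.g. "Nf3"
    predicted: Optional[str]  # move the opponent expected, None if unknown
    seconds: int              # whole seconds spent on the move
    eval_pawns: float         # evaluation reported with the move, in pawns
    from_book: bool           # True for opening-book moves


def is_countable(move: Move, eval_limit: float = 9.0) -> bool:
    """Apply the exclusions described in the text to a single move."""
    if move.from_book:
        return False          # opening moves: the engine did not think
    if move.seconds == 0:
        return False          # played in 0:00, treated as forced
    if abs(move.eval_pawns) >= eval_limit:
        return False          # decided position, tablebase or mating line
    # Assumption: a move without a recorded prediction cannot be judged.
    return move.predicted is not None


def ponder_hit_rate(games: Iterable[Tuple[str, List[Move]]]) -> Optional[float]:
    """Fraction of countable moves that were predicted, over decided games."""
    hits = 0
    total = 0
    for result, moves in games:
        if result == "1/2-1/2":
            continue          # drawn games are ignored entirely
        for move in moves:
            if not is_countable(move):
                continue
            total += 1
            if move.played == move.predicted:
                hits += 1
    return hits / total if total else None
```

Each game is passed as a `(result, moves)` pair. The function returns `None` when no move survives the filters, so a pair of engines with too little data is not reported as 0%.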

Ponder hit cross-tables


Evaluation difference

Here you can see a comparison of the position evaluations reported by different engines. Each engine reports its evaluation of the position when it makes a move. Then the opponent thinks, makes its move and reports its own evaluation. By comparing those evaluations we can see how similar the thinking of the two engines is.

Closest and most different pairs

Method

It is easy to find the average evaluation difference for two engines: it is just the mean of all differences between the evaluations before and after a move, computed over all moves in their games. For example, engine A moves e4 with an evaluation of +0.15, then engine B moves c5, evaluating the position as +0.08, then engine A moves Nf3 with an evaluation of +0.25. For this sequence of three moves the average evaluation difference can be computed as ((0.15-0.08) + (0.25-0.08))/2 = 0.12 (in pawns).
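The arithmetic in this example can be reproduced with a short Python sketch. The function name is hypothetical, and taking the absolute value of each difference is an assumption: the example above only contains positive differences, so the text does not settle how the sign is handled.

```python
from typing import Optional, Sequence


def average_eval_difference(evals: Sequence[float]) -> Optional[float]:
    """Mean difference between consecutive reported evaluations, in pawns.

    `evals` holds the evaluations in the order the moves were played,
    all from the same side's point of view.
    """
    # Assumption: differences are taken as absolute values.
    diffs = [abs(after - before) for before, after in zip(evals, evals[1:])]
    return sum(diffs) / len(diffs) if diffs else None


# The three moves from the text: e4 (+0.15), c5 (+0.08), Nf3 (+0.25).
example = average_eval_difference([0.15, 0.08, 0.25])
```

For the three evaluations from the text this gives (0.07 + 0.17)/2, which is 0.12 pawns up to floating-point rounding.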

So far so good, but of course we should not simply use all moves. We don't use opening moves or forced moves, and we don't use moves where either side reports an evaluation beyond ±9 pawns. We also don't use drawn games at all, because of the 50-move sequences. In other words, we limit this study to the same set of moves we use for ponder hit analysis.

There is one more issue to consider here. Ideally we should compare how two engines evaluate exactly the same position. This is not possible when we use the game database as our input data, because each engine evaluates the position on its own turn, after the opponent has already moved. But what happens if an unexpected move is played? Suppose engine A moves and reports its evaluation. A expects a certain move from engine B, so A's evaluation is based on the assumption that B will make that move. If B makes a different move, a different position occurs on the board, not the one that A was expecting. So it does not seem right to compare the evaluations of A and B in that case, because they were thinking about different lines. Because of this we use only expected moves in this experiment. This is about half the number of moves used for the ponder hit statistics.
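A minimal sketch of this restriction, under the same assumptions as before: the `Ply` record and its field names are hypothetical, and absolute differences are used. A pair of consecutive evaluations is counted only when the reply actually played matches the reply the previous mover expected.

```python
from typing import NamedTuple, Optional, Sequence


class Ply(NamedTuple):
    """Hypothetical record of one move; field names are assumptions."""
    played: str                    # move actually played
    expected_reply: Optional[str]  # reply the mover predicted, if recorded
    eval_pawns: float              # evaluation reported with the move


def expected_only_eval_difference(plies: Sequence[Ply]) -> Optional[float]:
    """Mean evaluation difference over pairs where the reply was expected."""
    diffs = []
    for prev, cur in zip(plies, plies[1:]):
        # Skip the pair if the reply was not the one the previous mover
        # based its evaluation on: the engines looked at different lines.
        if prev.expected_reply is None or cur.played != prev.expected_reply:
            continue
        diffs.append(abs(cur.eval_pawns - prev.eval_pawns))
    return sum(diffs) / len(diffs) if diffs else None
```

With this filter, a sequence in which B plays the expected c5 but A then plays Nc3 instead of the expected Nf3 contributes only the first pair of evaluations.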

Evaluation difference cross-tables


Created in 2005-2010 by Kirill Kryukov
Last games added on June 22, 2010