Tiebreaker for "best engine" on primary 40/15 list?

Questions and comments related to CCRL testing study
BFG
Posts: 73
Joined: Mon Jul 07, 2014 3:31 am

Tiebreaker for "best engine" on primary 40/15 list?

Post by BFG »

I wasn't able to find this in a cursory search. What is used for a tiebreaker to determine "best engine" of a series for the 40/15 (and other) lists?

Currently, Stockfish 14 and 15 are tied (on the complete 40/15 list) at 3538 ELO, but 14 is listed as "best engine" for the series.
14 has an ELO range of +/- 18 and 15 has +/- 16. This is likely a function of 15 having played several hundred more games.
15 has a higher score (67.7% vs 66.9%), fewer draws (64.2% vs 65.0%), and the LOS advantage (51.5%).
The only places, then, where 14 appears to have a superior secondary stat are average opponent ELO (-101.1 vs 15's -105.4) and maximum ELO range.

So, I conclude that one of three things may be happening:
(1) ELO is calculated to a decimal but only displayed as integer, so they're not actually tied.
(2) 14's +18 maximum is causing it to be listed above 15's +16.
(3) Average opponent strength is used as the tiebreaker.

Regardless of which is happening, I would posit that either Score or LOS should be used as the first tiebreaker. (1) is disingenuous and too fine-grained, (2) is modulated by the number of games played, so it can disadvantage engines that have played more, and (3) is a confounding factor when the tied engines have played each other. That said, I'm curious to hear arguments to the contrary.
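
To make possibility (1) concrete, here is a minimal Python sketch. The unrounded ratings are hypothetical (the list only shows 3538 for both engines), and nothing here is confirmed to be how the list is actually generated. It only shows that sorting on hidden decimals can produce a definite order between two entries that display the same integer.

```python
# Hypothetical unrounded ratings; the list only shows 3538 for both engines.
ratings = {"Stockfish 14": 3538.3, "Stockfish 15": 3537.9}


def displayed(elo: float) -> int:
    """Rating as it would appear if only an integer is shown."""
    return round(elo)


# Sort on the hidden full-precision value, highest first.
ranked = sorted(ratings, key=lambda name: ratings[name], reverse=True)

for name in ranked:
    print(name, displayed(ratings[name]))
```

Both entries print as 3538, yet the order between them is fixed by a difference that is far smaller than either error margin.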
Ray
Posts: 22611
Joined: Sun Dec 18, 2005 6:33 pm
Location: NZ

Re: Tiebreaker for "best engine" on primary 40/15 list?

Post by Ray »

The ratings and the rankings are always subject to the statistical error margins, so even when engines are separated by 10, 20, or 30 Elo you really don't know which is stronger unless you play many thousands of games, which isn't possible at this time control with limited resources.
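
As a rough illustration of the error-margin point, the sketch below builds the rating intervals from the figures quoted earlier in the thread (3538 with +/- 18 and +/- 16) and checks whether they overlap. The helper names are made up for this example, and an overlap check is only a simplification, not the statistical test the rating software performs.

```python
def interval(elo: int, margin: int) -> tuple[int, int]:
    """Lower and upper bound of a rating given a symmetric error margin."""
    return (elo - margin, elo + margin)


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


sf14 = interval(3538, 18)  # Stockfish 14: 3520 to 3556
sf15 = interval(3538, 16)  # Stockfish 15: 3522 to 3554

print(sf14, sf15, overlaps(sf14, sf15))
```

Here one interval sits entirely inside the other, so the margins alone cannot separate the two engines.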
BFG
Posts: 73
Joined: Mon Jul 07, 2014 3:31 am

Re: Tiebreaker for "best engine" on primary 40/15 list?

Post by BFG »

I think I was misleading when I labeled this a "tiebreaker". In reality, I am interested in what determines sort order among statistically tied engines. Agreed that Stockfish 14 and 15 are statistically tied at rank 1, and it should remain that way. Also agreed that all the stats (except number of games played etc.) are subject to, I presume, a 95% confidence or prediction interval. However, Stockfish 14 is listed above 15 on the Complete list, and that list appears to control which of the two appears on the Main list. I would suggest that one of the secondary stats - probably LOS - be used to determine sort order within the rank when two engines have exactly the same ELO.

I do not think the statistical ranges (+/- 18 for Stockfish 14, +/- 16 for Stockfish 15) should be used to determine sort order, and in fact it's unclear to me how sort order IS determined here. Yes, it is possible that Stockfish 14's actual ELO is 3556, which is above Stockfish 15's maximum. But it's equally likely to be 3520, which is below Stockfish 15's minimum. That's why I would suggest using LOS to determine sort order here. You could even set up tertiary, quaternary, quinary, and further sort order determinants if needed - though it seems pretty unlikely ELO would be tied AND LOS exactly 50%.
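
A minimal Python sketch of the suggested ordering, assuming displayed ELO is the primary key, LOS the secondary, and score the tertiary. The `Entry` class and `sort_key` function are hypothetical names for illustration. The 48.5% LOS for Stockfish 14 is also an assumption, taken as the complement of the 51.5% quoted for Stockfish 15.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    elo: int        # displayed rating
    los: float      # likelihood of superiority vs. the tied rival, in percent
    score: float    # score percentage


entries = [
    Entry("Stockfish 14", 3538, 48.5, 66.9),  # LOS assumed as 100 - 51.5
    Entry("Stockfish 15", 3538, 51.5, 67.7),
]


def sort_key(e: Entry) -> tuple[float, float, float]:
    # Negate each field so that higher values sort first;
    # later fields only matter when all earlier ones are equal.
    return (-e.elo, -e.los, -e.score)


ranked = sorted(entries, key=sort_key)
print([e.name for e in ranked])
```

With these inputs the ELO values tie, so LOS decides and Stockfish 15 is listed first. Further keys can be appended to the tuple in the same way.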