jkominek wrote: ↑Wed Dec 21, 2022 3:35 pm
I have a question about relating the benchmark calibration to running conditions. The answer is perhaps obvious, but better to ask since I have not seen it addressed in this forum thread.
i) How many simultaneous games do CCRL testers typically run?
ii) If it is more than one, is benchmark calibration performed while the machine is running a CPU load in line with what is expected during the tournament?
In Kirill's post he suggests rebooting to get a clean slate before benchmarking. But that was back in the years of single-CPU computers.
To be specific to my situation, I have a 48-CPU test computer and use cutechess-cli. I set "-concurrency 48" when running engines single-threaded, or "-concurrency 12" with engines configured to 4 threads. The benchmark result under heavy load is about double the time when run while the machine is idle. That makes a big difference to time control settings if I want a good match to CCRL testing conditions.
I have 8 cores and use a concurrency of 7; I always leave 1 core for the system. It is safe to say that this works well.
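The concurrency figures mentioned above (48 single-threaded games or 12 four-thread games on 48 CPUs, 7 games on 8 cores with one core left for the system) follow a simple rule of thumb. The sketch below is only an illustration of that rule; the function name and parameters are hypothetical, and it assumes pondering is off so that each game keeps only one engine busy at a time.

```python
def max_concurrency(cores: int, threads_per_engine: int = 1, reserved_cores: int = 0) -> int:
    """Largest number of simultaneous games that keeps engine threads within the core count.

    Assumes ponder is off, so each game loads `threads_per_engine` cores at a time.
    """
    if cores < 1 or threads_per_engine < 1 or reserved_cores < 0:
        raise ValueError("cores and threads_per_engine must be >= 1, reserved_cores >= 0")
    usable = cores - reserved_cores
    # Integer division: partial games are not possible.
    return max(usable // threads_per_engine, 0)


if __name__ == "__main__":
    print(max_concurrency(8, reserved_cores=1))        # 8 cores, 1 left for the system
    print(max_concurrency(48))                         # 48 CPUs, single-threaded engines
    print(max_concurrency(48, threads_per_engine=4))   # 48 CPUs, 4 threads per engine
```

The result would be passed to cutechess-cli as the value of "-concurrency".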
CCRL 40/15 Testing Conditions (previously 40/40)
Re: CCRL 40/15 Testing Conditions (previously 40/40)
CCRL Testing Group
Re: CCRL 40/15 Testing Conditions (previously 40/40)
jkominek wrote: ↑Wed Dec 21, 2022 3:35 pm
i) How many simultaneous games do CCRL testers typically run?
ii) If it is more than one, is benchmark calibration performed while the machine is running a CPU load in line with what is expected during the tournament?
In Kirill's post he suggests rebooting to get a clean slate before benchmarking. But that was back in the years of single-CPU computers.
To be specific to my situation, I have a 48-CPU test computer and use cutechess-cli. I set "-concurrency 48" when running engines single-threaded, or "-concurrency 12" with engines configured to 4 threads. The benchmark result under heavy load is about double the time when run while the machine is idle. That makes a big difference to time control settings if I want a good match to CCRL testing conditions.
Very good point. Clock speeds these days vary depending on the load on the CPU. For example, on my i9 10900, if I run the benchmark on its own after a fresh boot, the clock will boost as high as 5.2 GHz, but usually 5.0 GHz. If I'm running 10 threads, it will be running at more like 4.0 GHz.
If I'm going to be running concurrency 10, what I do is start a process which takes 9 threads, and then I run the Stockfish bench and use the figure returned under those conditions to calculate the adjusted time control.
I also do not exceed the core count. The i9 10900 has 10 cores and 20 threads. I never run a concurrency of more than 10.
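As a rough illustration of the last step (turning a benchmark figure measured under load into an adjusted time control), here is a minimal sketch. It assumes the time control scales linearly with the ratio of the measured benchmark time to a reference benchmark time. The function name is hypothetical, and the reference value used in the example is a placeholder, not the actual CCRL reference figure.

```python
def adjusted_time_control(base_minutes: float,
                          reference_bench_seconds: float,
                          measured_bench_seconds: float) -> float:
    """Scale a base time control by how much slower (or faster) this machine is.

    Assumption: linear scaling. A machine that needs twice as long for the
    benchmark gets twice the time on the clock.
    """
    if base_minutes <= 0 or reference_bench_seconds <= 0 or measured_bench_seconds <= 0:
        raise ValueError("all arguments must be positive")
    return base_minutes * measured_bench_seconds / reference_bench_seconds


if __name__ == "__main__":
    REFERENCE_SECONDS = 10.0  # placeholder, NOT the real CCRL reference time
    idle = adjusted_time_control(15, REFERENCE_SECONDS, 8.0)     # bench run on an idle machine
    loaded = adjusted_time_control(15, REFERENCE_SECONDS, 16.0)  # bench run under tournament load
    print(f"idle: {idle} min, loaded: {loaded} min")
```

With these made-up numbers, calibrating on an idle machine would give half the time control that calibrating under load does, which is the size of discrepancy described in the question.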
- Gabor Szots
Re: CCRL 40/15 Testing Conditions (previously 40/40)
Ray wrote: ↑Wed Dec 21, 2022 4:55 pm
Clock speeds these days vary depending on the load on the CPU. For example on my i9 10900 if I run the benchmark on a fresh boot on its own, the clock will boost as high as 5.2 GHz but usually 5.0 GHz. If I'm running 10 threads, it will be running at more like 4.0 GHz.
That surprises me. Is there no way to force a given clock speed any more? I have an i5-4690K and in the BIOS I selected a scheme which forces all cores to run at 3.5 GHz whatever the load.
Re: CCRL 40/15 Testing Conditions (previously 40/40)
Could be, but when I'm not running chess that boost is worth having.
Re: CCRL 40/15 Testing Conditions (previously 40/40)
Gabor Szots wrote: ↑Wed Dec 21, 2022 5:07 pm
Ray wrote: ↑Wed Dec 21, 2022 4:55 pm
Clock speeds these days vary depending on the load on the CPU. For example on my i9 10900 if I run the benchmark on a fresh boot on its own, the clock will boost as high as 5.2 GHz but usually 5.0 GHz. If I'm running 10 threads, it will be running at more like 4.0 GHz.
That surprises me. Is there no way to force a given clock speed any more? I have an i5-4690K and in the BIOS I selected a scheme which forces all cores to run at 3.5 GHz whatever the load.
I've spent multiple happy(?) days in a fiddle-with-BIOS-settings/reboot loop to optimize clock settings, using a program called y-cruncher (http://www.numberworld.org/y-cruncher/records.html) to apply compute load. I was surprised to find, on my computer anyway, that lowering the target clock speed resulted in faster execution times. I believe it is no longer possible to force a flat clock rate with modern CPUs, despite how the BIOS settings tool describes it. The controller reserves the option of lowering the clock as needed (to save power, for one, but also to prevent overheating), and there is nothing you can do to prevent that. I am sure the designers reason that backing off on speed is better than frying a circuit. In the opposite direction, "turbo boosting" the clock of one or two previously idle CPUs when under light load is a clever way of making the computer more responsive to human interaction.
I imagine most CCRL participants are Windows users. But for those on Linux I'll mention the program "i7z". It is handy for monitoring clock rates live. Below is a snapshot of its report with 46/48 CPUs running chess (to save space only showing CPU socket 0).
Code:
Cpu speed from cpuinfo 2999.00Mhz
True Frequency (without accounting Turbo) 2999 MHz
Socket [0] - [physical cores=24, logical cores=48, max online cores ever=24]
CPU Multiplier 30x || Bus clock frequency (BCLK) 99.97 MHz
TURBO ENABLED on 24 Cores, Hyper Threading ON
Max Frequency without considering Turbo 3098.97 MHz (99.97 x [31])
Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is 40x/38x/37x/37x/37x/37x
Real Current Frequency 3272.63 MHz (Max of below)
Core [core-id] :Actual Freq (Mult.) C0% Halt(C1)% C3 % C6 % Temp VCore
Core 1 [0]: 3097.92 (30.99x) 7.05 92.7 0 0 64 0.8853
Core 2 [1]: 3098.30 (30.99x) 24.4 73.8 0 1 65 0.8931
Core 3 [2]: 3101.40 (31.02x) 100 0 0 0 64 0.9009
Core 4 [3]: 3120.47 (31.22x) 1 100 0 0 66 0.9001
Core 5 [4]: 3091.99 (30.93x) 1 99.9 0 0 65 0.9152
Core 6 [5]: 3098.15 (30.99x) 99.9 0 0 0 69 0.9148
Core 7 [6]: 3096.52 (30.98x) 1 99.9 0 0 66 0.9301
Core 8 [7]: 3099.79 (31.01x) 78.2 0 0 20 62 0.9005
Core 9 [8]: 3097.93 (30.99x) 100 0 0 0 63 0.8931
Core 10 [9]: 3098.11 (30.99x) 75.5 0 0 24 67 0.8999
Core 11 [10]: 3100.48 (31.02x) 100 0 0 0 66 0.9146
Core 12 [11]: 3272.63 (32.74x) 1 99.8 0 0 65 0.9154
Core 13 [12]: 3121.15 (31.22x) 22.4 14.3 0 62.4 59 0.8795
Core 14 [13]: 3098.17 (30.99x) 100 0 0 0 64 0.8860
Core 15 [14]: 3175.22 (31.76x) 1 100 0 0 64 0.8934
Core 16 [15]: 3099.10 (31.00x) 1 99.2 0 0 67 0.9001
Core 17 [16]: 3097.93 (30.99x) 100 0 0 0 64 0.9154
Core 18 [17]: 3098.28 (30.99x) 100 0 0 0 64 0.9154
Core 19 [18]: 3097.99 (30.99x) 100 0 0 0 65 0.9154
Core 20 [19]: 3161.58 (31.63x) 1 99.9 0 0 63 0.8931
Core 21 [20]: 3167.64 (31.69x) 1 100 0 0 66 0.9005
Core 22 [21]: 3097.93 (30.99x) 100 0 0 0 66 0.9146
Core 23 [22]: 3105.50 (31.07x) 1 99.9 0 0 67 0.9146
Core 24 [23]: 3097.94 (30.99x) 100 0 0 0 65 0.9301
C6 = Everything in C3 + core state saved to last level cache
Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo
Code:
Socket [0] - [physical cores=24, logical cores=48, max online cores ever=24]
CPU Multiplier 30x || Bus clock frequency (BCLK) 99.97 MHz
TURBO ENABLED on 24 Cores, Hyper Threading ON
Max Frequency without considering Turbo 3098.97 MHz (99.97 x [31])
Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is 40x/38x/37x/37x/37x/37x
Real Current Frequency 3659.91 MHz (Max of below)
Core [core-id] :Actual Freq (Mult.) C0% Halt(C1)% C3 % C6 % Temp VCore
Core 1 [0]: 3593.56 (35.95x) 3.85 31.6 0 63.8 58 0.9832
Core 2 [1]: 3369.81 (33.71x) 1 100 0 0 63 0.8940
Core 3 [2]: 3544.41 (35.46x) 1 0.687 0 99.3 57 0.8726
Core 4 [3]: 3607.31 (36.09x) 1 1.95 0 97.7 56 0.8726
Core 5 [4]: 3287.66 (32.89x) 0 0.624 0 99.4 58 0.8799
Core 6 [5]: 3486.87 (34.88x) 99.8 0 0 0 66 0.9154
Core 7 [6]: 3524.83 (35.26x) 1 5.28 0 94.5 58 0.9021
Core 8 [7]: 3359.14 (33.60x) 1 100 0 0 65 0.9005
Core 9 [8]: 3489.47 (34.91x) 82.1 0 0 12.5 62 0.8934
Core 10 [9]: 3475.40 (34.77x) 18.1 5.86 0 73.2 58 0.8722
Core 11 [10]: 3492.63 (34.94x) 24.3 0 0 72.7 60 0.8792
Core 12 [11]: 3659.91 (36.61x) 1 25.7 0 73.6 60 0.8789
Core 13 [12]: 3522.23 (35.23x) 99.7 0 0 0 64 0.9160
Core 14 [13]: 3486.85 (34.88x) 99.8 0 0 0 64 0.8860
Core 15 [14]: 3575.79 (35.77x) 1 100 0 0 66 0.9978
Core 16 [15]: 3437.66 (34.39x) 1 5.6 0 94.2 58 0.9979
Core 17 [16]: 3486.86 (34.88x) 99.8 0 0 0 65 0.9156
Core 18 [17]: 3618.64 (36.20x) 1 4.05 0 95.2 57 1.0055
Core 19 [18]: 3488.67 (34.90x) 99.8 0 0 0 65 0.9156
Core 20 [19]: 3472.98 (34.74x) 1 99.8 0 0 62 0.8940
Core 21 [20]: 3213.99 (32.15x) 1 3.74 0 96.2 57 0.9983
Core 22 [21]: 3486.89 (34.88x) 99.8 0 0 0 65 0.9156
Core 23 [22]: 3486.88 (34.88x) 99.8 0 0 0 66 0.9152
Core 24 [23]: 3534.81 (35.36x) 99.4 0 0 0 67 1.0420
It's not just clock rate that affects benchmark results when running under full or near-full load. The flock of simultaneous engines increases memory contention too. Memory contention might be the largest contributor to the observed slowdown.
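To compare snapshots like the two above without eyeballing every row, a small script can average the actual frequency of the cores that are busy. This is only a sketch: the regular expression assumes the per-core row layout shown in the i7z reports above, and the function name and the 90% C0 threshold are arbitrary choices of mine.

```python
import re
from statistics import mean
from typing import Optional

# Matches rows such as: "Core 3 [2]: 3101.40 (31.02x) 100 0 0 0 64 0.9009"
# Groups: core-id, actual frequency in MHz, multiplier, C0 percentage.
CORE_LINE = re.compile(r"Core\s+\d+\s+\[(\d+)\]:\s+([\d.]+)\s+\(([\d.]+)x\)\s+([\d.]+)")


def busy_core_frequency(report: str, c0_threshold: float = 90.0) -> Optional[float]:
    """Mean actual frequency (MHz) of cores whose C0% is at or above the threshold."""
    freqs = []
    for line in report.splitlines():
        match = CORE_LINE.search(line)
        if match is None:
            continue  # header or footer line, not a per-core row
        freq_mhz = float(match.group(2))
        c0_percent = float(match.group(4))
        if c0_percent >= c0_threshold:
            freqs.append(freq_mhz)
    return mean(freqs) if freqs else None


if __name__ == "__main__":
    # Three rows copied from the first snapshot above.
    sample = """\
Core 3 [2]: 3101.40 (31.02x) 100 0 0 0 64 0.9009
Core 4 [3]: 3120.47 (31.22x) 1 100 0 0 66 0.9001
Core 6 [5]: 3098.15 (30.99x) 99.9 0 0 0 69 0.9148
"""
    print(busy_core_frequency(sample))
```

Note that this only summarizes clock rate; it says nothing about the memory contention mentioned above.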
Re: CCRL 40/15 Testing Conditions (previously 40/40)
I have a couple extra questions that I have not been able to find answers for, and which I think wouldn't hurt to put on public record.
i) When running BayesElo, what values are used for anchoring the rating lists?
ii) Was anchoring to FIDE rating lists performed, and if so, what historical notes do you have on that procedure?
In the talkchess thread "a direct comparison of FIDE and CCRL rating systems" a person with the handle drj4759 asserts that "Shredder 12 x64 wtih ELO 2800 was used as the anchor in all the different rating list presentation." Is that the case? That's not an unreasonable anchor point, but I cannot find confirmation, and I don't get the impression that drj4759 is one of the CCRL principals. Graham Banks participated in the thread but only weighed in on the CCRL machine calibration procedure.
Re: CCRL 40/15 Testing Conditions (previously 40/40)
jkominek wrote: ↑Thu Jan 05, 2023 9:11 pm
I have a couple extra questions that I have not been able to find answers for, and which I think wouldn't hurt to put on public record.
i) When running BayesElo, what values are used for anchoring the rating lists?
ii) Was anchoring to FIDE rating lists performed, and if so, what historical notes do you have on that procedure?
In the talkchess thread "a direct comparison of FIDE and CCRL rating systems" a person with the handle drj4759 asserts that "Shredder 12 x64 wtih ELO 2800 was used as the anchor in all the different rating list presentation." Is that the case? That's not an unreasonable anchor point, but I cannot find confirmation, and I don't get the impression that drj4759 is one of the CCRL principals. Graham Banks participated in the thread but only weighed in on the CCRL machine calibration procedure.
Categorically no, our rating lists are not anchored to Shredder 12. And categorically no, they are not directly anchored to any FIDE ratings.
Back in late 2006 we chose a basket of 14 engines from the SSDF ratings list dated 24th November 2006 and took the average of their ratings. We run bayeselo on our database, compare that figure with the average of those 12 engines under the default (zero-based?) bayeselo calculation, and increment the ratings of all engines by that fixed difference (for the 40/15 and blitz lists).
Some years later we took the view that the ratings looked high, and reduced them by 100 Elo.
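For readers who want to see the arithmetic, the anchoring described above amounts to adding one fixed offset to every raw rating. The following is a minimal sketch of that idea, not the actual CCRL tooling: the function name, the engine names, and all rating values are made up for illustration.

```python
from statistics import mean
from typing import Dict


def anchor_ratings(raw: Dict[str, float],
                   reference: Dict[str, float],
                   adjustment: float = 0.0) -> Dict[str, float]:
    """Shift all raw ratings so the basket average matches the reference average.

    raw:        ratings as computed by the rating tool (e.g. zero-based)
    reference:  ratings of the basket engines on the reference list
    adjustment: extra fixed shift applied afterwards (e.g. -100)
    """
    basket = [name for name in reference if name in raw]
    if not basket:
        raise ValueError("no basket engine found in the raw ratings")
    offset = mean(reference[n] for n in basket) - mean(raw[n] for n in basket)
    return {name: rating + offset + adjustment for name, rating in raw.items()}


if __name__ == "__main__":
    # Hypothetical engines and values, for illustration only.
    raw = {"EngineA": 120.0, "EngineB": -20.0, "EngineC": -100.0}
    reference = {"EngineA": 2900.0, "EngineB": 2800.0}
    print(anchor_ratings(raw, reference, adjustment=-100.0))
```

Because the same offset is added to every engine, rating differences between engines are unchanged; only the absolute level of the list moves.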
It has been said that SSDF back then was supposedly reasonably representative of human ratings, so indirectly our lists *might* have some correlation to human ratings, but that is a big stretch and I definitely would not be making that statement.
The other complication is that bayeselo and Ordo give very different ratings from the same database with the same anchor.