7-man EGTB Bounty Reborn - General Discussion

Endgame analysis using tablebases, EGTB generation, exchange, sharing, discussions, etc.
Kirill Kryukov
Site Admin
Posts: 7399
Joined: Sun Dec 18, 2005 9:58 am
Sign-up code: 0
Location: Mishima, Japan

7-man EGTB Bounty Reborn - General Discussion

Post by Kirill Kryukov

I decided to resurrect the bounty idea, and to make it happen this time. This thread is for general discussion; there will be separate threads for details, including project structure, metrics, funding, infrastructure, etc.

Please check the previous discussion from 2008. Very few things have happened in this field in the last 3 years, so most of that discussion is still very relevant.

In the next few days I will draft a detailed proposal for this project.

EDIT: It's now official. The Bounty project is set up at FOSS Factory, and I have already donated my hard-earned 10 bucks to see that it works.

EDIT: The Summary thread will contain the up-to-date version of the proposal and links to other resources and discussions.
koistinen
Posts: 92
Joined: Fri May 02, 2008 7:59 pm
Sign-up code: 0
Location: Stockholm

Re: 7-men EGTB Bounty Reborn - General Discussion

Post by koistinen

It might be good to specify a benchmark computer exactly: which processor, motherboard, memory and disks are connected.
Because 4 GB (gigabyte) memory modules are now reasonably priced, requiring 24 GB at 10-20 GB/s for the generating computer should be OK. (I would not need more than 8 GB anyway.)
Six 2 TB disks would also be good. Lots of disks will be needed to store the results anyway. I would value sustained transfer rate here. Allow at most 1 TB to be used for generation, leaving most of the space for storing generated tables.
As for 6 MB of on-chip cache, I don't know; it does not matter much to me.
The same goes for the number of cores allowed: 1 or 12 does not matter much to me.
Set the allowed computation time at some multiple (2-10?) of what my or HGM's algorithm would require if implemented well.
At 10x, it would fill a 2 TB disk with DTZ50 data in roughly a month (assuming generation takes 12 hours per 7-man tablebase and that, with only modest compression, each tablebase takes 300 GB).
This also assumes disks that are cheap in terms of cost per byte.
...
Profit 10 $)
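The "roughly a month" estimate above can be checked with a few lines of arithmetic. This is a minimal sketch using only the figures from the post (a 2 TB disk taken as 2000 GB, about 300 GB per DTZ50 tablebase, 12 hours per tablebase for a good implementation, and a 10x allowance); the function name is illustrative, and it assumes tables are generated one after another on a single machine.

```python
def days_to_fill_disk(disk_gb: float, table_gb: float,
                      optimal_hours: float, slowdown: float) -> float:
    """Days needed to fill one disk when tablebases are generated one after another."""
    hours_per_table = optimal_hours * slowdown  # allowed generation time per tablebase
    tables_per_disk = disk_gb / table_gb        # how many tablebases fit on the disk
    return tables_per_disk * hours_per_table / 24


# Figures from the post: 2 TB disk, ~300 GB per DTZ50 tablebase,
# 12 hours per tablebase if implemented well, and a 10x allowance.
days = days_to_fill_disk(disk_gb=2000, table_gb=300, optimal_hours=12, slowdown=10)
print(f"{days:.1f} days")
```

That comes to about 33 days, which matches "roughly a month".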
koistinen

Re: 7-men EGTB Bounty Reborn - General Discussion

Post by koistinen

A comment on 7-men EGTB Bounty Reborn - Summary.

While requiring that probing code be available in C is sensible, having programming language requirements for the generator might be harmful. Allow any language as long as there are no dependencies on non-free code.
Readability requirements are fine, but it might make sense to write the generator in another language, say Python, perhaps with some generated C code for performance.

In summary: when possible, avoid requiring a particular programming language. Allow developers to choose the best language for the job.
Kirill Kryukov

Re: 7-men EGTB Bounty Reborn - General Discussion

Post by Kirill Kryukov

koistinen wrote: ... having programming language requirements for the generator might be harmful. Allow any language as long as there are no dependencies on non-free code.
Readability requirements are fine, but it might make sense to write the generator in another language, say Python, perhaps with some generated C code for performance.

In summary: when possible, avoid requiring a particular programming language. Allow developers to choose the best language for the job.
Well, I don't see much likelihood of anyone seriously designing a 7-piece generator in anything other than C/C++. Yakov Konoval's generator was reportedly written in assembly, but IMHO that should be rejected as non-portable. Perhaps some exotic language could make sense (D, Go, whatever). Here is why I believe C/C++ should be required:

1. The generator code should be maintainable by the community. After-release maintenance is a necessity for any complex piece of software. Who will maintain a generator written in an exotic language once the original developer loses interest? (Of course, using C/C++ by itself does not guarantee maintainability; just look at Nalimov's code, for example.)

2. Portability. Personally, I'd argue for using just C and not C++, as the ultimately portable solution. But C++ is probably close enough to be portable. Why portability is important: the distributed mass-computation concept means that the community will run the generator on all available hardware. PCs, Macs and Linux machines must be fully supported. BSDs, Solaris and other systems might be running on some bigger iron, clusters, etc. It would be a shame to waste all that computing power.

3. Suppose you choose a good-looking language (like Java), and the next year its parent company is bought by Oracle and its specs are closed. With a long-term project, we don't want anything like this to happen.

4. You're right that combining compiled and interpreted code may provide some convenience. My current understanding is that the generator should be a pure number-crunching part, and that all support functionality belongs to the "infrastructure" subproject. Infrastructure is a huge task that will have its own specs, probably consisting of many smaller sub-projects. At the same time, the bare generator can be specified and developed independently.

5. If prospective developers believe that using some other language for the generator makes good sense, they are most welcome to step forward and initiate the discussion. If the community is convinced, the specs will be modified, allowing such a generator to comply and win the bounty.

Any of these points is of course open to discussion; it's just my current understanding of the issue.
Kirill Kryukov

Re: 7-men EGTB Bounty Reborn - General Discussion

Post by Kirill Kryukov

koistinen wrote: It might be good to specify a benchmark computer exactly: which processor, motherboard, memory and disks are connected.
Because 4 GB (gigabyte) memory modules are now reasonably priced, requiring 24 GB at 10-20 GB/s for the generating computer should be OK. (I would not need more than 8 GB anyway.)
Six 2 TB disks would also be good. Lots of disks will be needed to store the results anyway. I would value sustained transfer rate here. Allow at most 1 TB to be used for generation, leaving most of the space for storing generated tables.
As for 6 MB of on-chip cache, I don't know; it does not matter much to me.
The same goes for the number of cores allowed: 1 or 12 does not matter much to me.
Set the allowed computation time at some multiple (2-10?) of what my or HGM's algorithm would require if implemented well.
At 10x, it would fill a 2 TB disk with DTZ50 data in roughly a month (assuming generation takes 12 hours per 7-man tablebase and that, with only modest compression, each tablebase takes 300 GB).
This also assumes disks that are cheap in terms of cost per byte.
...
Profit 10 $)
I agree that the performance specs should be unambiguous. It should be clear who will perform the verification, how long it will take, what machines will be used, and which tables will be computed. I'm still not sure how exactly the specs should describe performance. This is one area where prospective developers should provide insight. I/O will probably be the bottleneck. We can have terabytes of storage and 4 or more cores, but RAM will always be limited. I'd argue that the generator should not require an SSD.

Probably two minimum machine requirements should be given: one for WDL/WDL50, another for distance-based metrics. The first one should ideally be as cheap as possible, as initial mass generation will use WDL/WDL50. The other one should at least be verifiable at the time of submission. At the moment I only have 16 GB of RAM, for example.
koistinen

Re: 7-man EGTB Bounty Reborn - General Discussion

Post by koistinen

I think it would be nice if the speed requirement were lax, as it is much easier to implement something when you are allowed to be half as fast as the optimum, and more laxity makes it easier still.
In my scheme, there is one 46.4 GiB table that needs to be read twice and written once every iteration. With duplicate pieces etc. (as is common for 7-man), it could be halved or more, and if it fit in RAM, disk I/O could be greatly reduced (even more so with HGM's scheme, I believe).
So, is there some generator speed that is "good enough"? Say, generating all 7-man tables for: DTZ50 in 6k days, DTZ in 20k days, DTC in 21k days, DTM in 50k days, DTM50 in 1M days. (Divide the time by the number of today's stock PCs, with 4 disks per computer, that you are allowed to use; divide by 100 for 6-man, etc.)
(WDL50 and WDL cost about as much to compute as DTZ50 and DTZ respectively, and the info is part of the result anyway.)
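The budget numbers above translate into I/O volume and wall-clock time with simple arithmetic. A minimal sketch, using the post's figures (a 46.4 GiB table read twice and written once per iteration, the per-metric budgets in single-PC days, and "divide by 100 for 6-man"); it assumes the work splits perfectly across PCs, and the 100-PC cluster in the usage lines is only an example, not a figure from the post.

```python
# Proposed single-PC budgets (in days) for generating all 7-man tables, from the post.
BUDGET_DAYS = {"DTZ50": 6_000, "DTZ": 20_000, "DTC": 21_000, "DTM": 50_000, "DTM50": 1_000_000}


def io_per_iteration_gib(table_gib: float, reads: int = 2, writes: int = 1) -> float:
    """Disk traffic for one iteration: the table is read `reads` times and written `writes` times."""
    return table_gib * (reads + writes)


def wall_clock_days(metric: str, n_pcs: int, six_man: bool = False) -> float:
    """Budget in days when the work is split evenly across n_pcs stock PCs (4 disks each).

    Per the post, the 6-man budget is the 7-man budget divided by 100.
    """
    days = BUDGET_DAYS[metric] / n_pcs
    return days / 100 if six_man else days


print(f"{io_per_iteration_gib(46.4):.1f} GiB of disk traffic per iteration")
print(wall_clock_days("DTZ50", n_pcs=100), "days for all 7-man DTZ50 on 100 PCs")
print(wall_clock_days("DTM", n_pcs=1, six_man=True), "days for all 6-man DTM on one PC")
```

Halving the table through duplicate-piece symmetry, as suggested above, would halve the per-iteration traffic in the same way.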
tralala
Posts: 3
Joined: Mon Apr 25, 2011 10:55 pm
Sign-up code: 10159

Re: 7-man EGTB Bounty Reborn - General Discussion

Post by tralala

I like the idea and have already sponsored the project with a symbolic amount.

I see two challenges:

1) You have to attract a programmer who likes the challenge.
2) You have to have the hardware necessary to produce the tables.
3) Distribution (which I won't discuss here, because once the tables exist they will get distributed somehow)

The first challenge seems trickier than the second, because I think once the code is there (and bugfixed) you'll find people who are willing to let their powerful hardware do the rest. In order to intrigue a programmer, one should give him all the design flexibility he wants and start with a realistic goal (which probably means WDL or bitbases). One can encourage a programmer with some money, but let's be realistic: the community won't raise much money on this. So why not reverse the process and ask the programmers under which conditions they would be interested in writing such a generator?
Kirill Kryukov

Re: 7-man EGTB Bounty Reborn - General Discussion

Post by Kirill Kryukov

koistinen wrote: I think it would be nice if the speed requirement were lax, as it is much easier to implement something when you are allowed to be half as fast as the optimum, and more laxity makes it easier still.
I think the question is: with a project that will run for years (generating the tables), do we want a generator that is only half as fast as the optimum? Perhaps we could say "yes, because now we don't have any". Or maybe we could instead aim for the fastest possible. The problem is that we have no idea what is or is not possible when developers don't share their generator specs. So the idea is that some prospective developers will come and share their specs. (I know that for that, the generator should be largely complete.) Then, if the community is satisfied and there are no stronger claims, the bounty specs will be updated and this generator can win.

Alternatively, we could put a slower performance target into the bounty specs, and then the first solution would just win automatically. I'm not sure we should prefer this.
koistinen wrote: In my scheme, there is one 46.4 GiB table that needs to be read twice and written once every iteration. With duplicate pieces etc. (as is common for 7-man), it could be halved or more, and if it fit in RAM, disk I/O could be greatly reduced (even more so with HGM's scheme, I believe).
So, is there some generator speed that is "good enough"? Say, generating all 7-man tables for: DTZ50 in 6k days, DTZ in 20k days, DTC in 21k days, DTM in 50k days, DTM50 in 1M days. (Divide the time by the number of today's stock PCs, with 4 disks per computer, that you are allowed to use; divide by 100 for 6-man, etc.)
(WDL50 and WDL cost about as much to compute as DTZ50 and DTZ respectively, and the info is part of the result anyway.)
This is a good question. I did not reply sooner because I was thinking about it, and I'm still not sure. I'll probably be satisfied with this performance, and if there is no interest from other developers, I'd be willing to put this speed into the bounty specs (relaxed a bit for safety), assuming there is no other choice. If there is another choice, then I'd try to aim for a faster-performing generator.
Kirill Kryukov

Re: 7-man EGTB Bounty Reborn - General Discussion

Post by Kirill Kryukov

tralala wrote: I like the idea and have already sponsored the project with a symbolic amount.

I see two challenges:

1) You have to attract a programmer who likes the challenge.
2) You have to have the hardware necessary to produce the tables.
3) Distribution (which I won't discuss here, because once the tables exist they will get distributed somehow)

The first challenge seems trickier than the second, because I think once the code is there (and bugfixed) you'll find people who are willing to let their powerful hardware do the rest.
Great, thanks for joining!

I agree that the first challenge (attracting a programmer) is the hardest. I am very optimistic about generating and distributing the tables: I think the community has enough manpower to solve these problems nicely.
tralala wrote: In order to intrigue a programmer, one should give him all the design flexibility he wants and start with a realistic goal (which probably means WDL or bitbases). One can encourage a programmer with some money, but let's be realistic: the community won't raise much money on this. So why not reverse the process and ask the programmers under which conditions they would be interested in writing such a generator?
The programmers are always welcome to join the discussion and bargain with the community. At least the way I see it, any input from the developers would be a very important contribution to the direction of this project.

Thinking about the possibility of requesting just WDL (and WDL50) in the specs, I don't think it's exciting enough to attract many people to support this project, particularly when we already know that it's possible to solve 7-piece endgames in DT* metrics.
koistinen

Re: 7-man EGTB Bounty Reborn - General Discussion

Post by koistinen

Kirill Kryukov wrote: I'd be willing to put this speed into the bounty specs (relaxed a bit for safety), assuming there is no other choice. If there is another choice, then I'd try to aim for a faster-performing generator.
What is the performance of the currently available 6-man generators on current hardware when the table fits in RAM?

Update: The Gaviota site says it takes about 30 h to generate all 5-man tables "in faster quads". That is for DTM. Divide by 185 and multiply by 1876*64*64 to get about 50k days.
I find it funny how well this figure matches my previously suggested requirement, even though I computed that one by estimating how fast I could do the generation (if I ever manage to implement my algorithm) and multiplying by 10 to relax the requirement (to make it more likely that someone implements a generator).
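The Gaviota extrapolation is a one-line calculation. The sketch below just reproduces it with the factors given in the post (about 30 hours for all 5-man DTM tables, divided by 185, multiplied by 1876*64*64, then converted from hours to days); the post does not say exactly what 185 and 1876 count, so the sketch treats them simply as given factors, and the function name is illustrative.

```python
def extrapolate_days(measured_hours: float, divisor: float, multiplier: float) -> float:
    """Scale a measured generation time (divide, then multiply) and convert hours to days."""
    return measured_hours / divisor * multiplier / 24


# Figures from the post: ~30 h for all 5-man DTM tables ("in faster quads"),
# divided by 185 and multiplied by 1876*64*64.
days = extrapolate_days(30, 185, 1876 * 64 * 64)
print(f"{days:,.0f} days")
```

The result is roughly 52k days, in line with the "about 50k days" quoted above and with the DTM budget proposed earlier in the thread.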
Arpad Rusz
Posts: 93
Joined: Mon Mar 27, 2006 5:33 pm
Sign-up code: 0
Location: Romania/Hungary

Re: 7-man EGTB Bounty Reborn - General Discussion

Post by Arpad Rusz

XY has generated a 100 GB tablebase (60 GB WTM + 40 GB BTM) in 20 days and wants to share it with you. What should you do?
a. Download the whole 100 GB = 10 days
b. Download only the 40 GB BTM file (4 days) + generate the WTM file yourself (12 days) = 16 days (but less internet traffic than with a.)
c. Download a 10 GB bitbase (1 day) + generate the 100 GB tablebase yourself (18 days) = 19 days
d. Download a 5 GB special file prepared by XY that can be seeded into the generator (1/2 day) + generate the 100 GB tablebase yourself (9 days) = 9.5 days!
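Each option above is just download time plus local generation time, so the comparison can be written out directly. A minimal sketch with the scenario's figures (the implied link speed is about 10 GB per day, since 100 GB takes 10 days); the class and field names are illustrative, and the seed file in option d is as hypothetical here as in the scenario.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    download_gb: float    # data transferred over the internet
    download_days: float  # time to download that data
    generate_days: float  # local generation time after the download

    @property
    def total_days(self) -> float:
        return self.download_days + self.generate_days


# Figures from the scenario: 100 GB tablebase (60 GB WTM + 40 GB BTM).
OPTIONS = [
    Option("a. whole tablebase", 100, 10, 0),
    Option("b. BTM file only", 40, 4, 12),
    Option("c. bitbase", 10, 1, 18),
    Option("d. seed file", 5, 0.5, 9),
]

for o in OPTIONS:
    print(f"{o.name}: {o.total_days} days, {o.download_gb} GB transferred")

fastest = min(OPTIONS, key=lambda o: o.total_days)
print("fastest:", fastest.name)
```

Option d wins on both total time and internet traffic, which is the point of the example: a small seed file can beat downloading the full result.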
jshriver
Posts: 298
Joined: Tue Jan 31, 2006 5:59 am
Sign-up code: 0
Location: Toledo, OH, USA

Re: 7-man EGTB Bounty Reborn - General Discussion

Post by jshriver

I agree the generator and example probing code should be in C. It's robust, fast, and you can pretty much write a wrapper in any other language.

If I may suggest, I think the file format should be one of the first things to be tackled. That way, even if we only get a hodgepodge generator running that conforms to that format, we can spend months generating at least something while fine-tuning or improving the generator.

I honestly believe that if we wait until we have a perfect generator, it'll never get done. I vote for giving the target developer some leeway, as long as we as a community can describe at least some of the core goals. I agree 100% with Kirill's suggestions so far.
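To make the "file format first" idea concrete, here is a minimal sketch of a fixed, versioned file header that a generator could write and a probing library could validate before reading any data. Every field, the magic bytes and the metric codes are hypothetical and not part of any agreed format; the sketch only uses Python's standard `struct` module.

```python
import struct

# Hypothetical little-endian header: magic (4 bytes), format version (u16),
# metric code (u8), piece count (u8), number of stored entries (u64).
HEADER = struct.Struct("<4sHBBQ")
MAGIC = b"EGTB"  # hypothetical magic bytes
METRICS = ["WDL", "WDL50", "DTZ", "DTZ50", "DTC", "DTM", "DTM50"]  # hypothetical codes 0..6


def pack_header(version: int, metric: str, pieces: int, entries: int) -> bytes:
    """Serialize the header; a generator would write this at the start of each file."""
    return HEADER.pack(MAGIC, version, METRICS.index(metric), pieces, entries)


def unpack_header(data: bytes) -> tuple:
    """Parse and validate the header; a prober would call this before reading table data."""
    magic, version, metric_code, pieces, entries = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("not a tablebase file (bad magic)")
    if metric_code >= len(METRICS):
        raise ValueError(f"unknown metric code {metric_code}")
    return version, METRICS[metric_code], pieces, entries


raw = pack_header(version=1, metric="DTZ50", pieces=7, entries=123_456_789)
print(HEADER.size, unpack_header(raw))
```

Because the header carries a version number, a quick first generator and a later optimized one could write the same format, which is what makes it possible to start generating early and improve the generator afterwards.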