To some degree it is true that not all engines are equal in the endgame, but this does not dramatically harm my proposal. Many of the 6-man endings are so imbalanced that most 1200-Elo players can avoid botching them if they can recognise a stalemate before it occurs. Consider that in the 5-1 tables it is a lone king versus five men. Any chess engine that cannot win that is profoundly weak, unless the position arose from a forcing series of moves leading immediately to stalemate, and even that should be easily seen by most chess engines.
It is very easy to get burned by probability. There are so many possible endgames that oddball ones are bound to crop up. Any particular one may be rare, but collectively the probability that some rare ending is reached becomes high. Thus if you collect strictly by probability of occurrence and are limited to 200 or 300 GB, you could still get burned a lot.
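The point about rare endings adding up can be shown with a quick back-of-the-envelope calculation. The counts and probabilities below are made-up illustrative numbers, not measurements from any game database:

```python
# Illustration with made-up numbers: suppose 400 "rare" 6-man endings
# were left out of the collection, each with only a 0.05% chance of
# appearing in any given game.
p_each = 0.0005
n_types = 400

# Chance that a single game reaches at least one of the omitted endings.
p_any = 1 - (1 - p_each) ** n_types
print(f"per-game chance of hitting some omitted ending: {p_any:.1%}")

# Over a 100-game match the exposure compounds further.
p_match = 1 - (1 - p_any) ** 100
print(f"chance of at least one occurrence in 100 games: {p_match:.1%}")
```

Individually each ending is a 1-in-2,000 event, yet under these assumed numbers roughly one game in five hits one of them, and over a long match it is nearly certain.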
On the other hand, if you collected on the basis of botch percentage, the ones you left out would be less likely to cost you points.
I looked for such a list a few months ago when I decided to start downloading 6-man tables, but I only found probability-of-occurrence lists. In the absence of what I wanted, I considered each material balance myself and made an educated guess as to what each side's plan would be and how much resistance is possible. I tried to err on the side of caution by including more tables. I am only 1900 Elo or so and spent only about ten seconds considering each scenario, so no doubt a better list is possible, but this is the list I am using for now. The tables are in no particular order of usefulness; it is just a list of tables that appeared on the surface to be useful, the ones not listed being of doubtful utility. My starting point was the "all" lists on the Tablebase Sharing site
http://kd.lab.nig.ac.jp/chess/tablebases-online/
An International Master or Grandmaster could likely make improvements, and I would welcome them. I will attach the list (the Breaker list) at the bottom.
A list made by humans could still have errors, though realistically not many, but it would be poor at ranking by utility. Ranking matters because not everyone has the same amount of space: if you do not have room for my listed tablebases (the Breaker list), which ones do you omit? A ranking by utility is the most reasonable way to decide.
I do not think a direct multiplication of chance of occurrence by chance of botching is the way to go, because the costs of exclusion at the bottom of the two lists are not comparable. I would rather have every table that has any chance of being botched than leave one out just because it is highly improbable. Even with a set of several million games we do not get an accurate picture of relative occurrence at the bottom of the list. Statistically, we cannot say with much certainty that an ending that appeared two or three times is really any more probable than one that occurred once or not at all. To make such a judgment you would need at least 30 occurrences of every type, which might take a database of several hundred million or a billion games to reach.
Accurate ordering by botching risk is much easier to achieve. If 1,000 positions do not yield 30 botches, one can simply try 10,000 positions. If there are still none, the ending is probably botch-proof for that engine.
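The claim that counts of one, two, or three occurrences cannot reliably be ordered can be checked with the Poisson distribution, which models how often a rare ending shows up in a large game database. A small sketch, with the expected counts chosen purely for illustration:

```python
import math

def poisson_pmf(k, lam):
    """Probability of seeing exactly k occurrences when lam are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Two hypothetical endings: one truly twice as common as the other,
# with expected counts in the database of 1 and 2 respectively.
for k in range(4):
    p_rare = poisson_pmf(k, 1.0)
    p_common = poisson_pmf(k, 2.0)
    print(f"count={k}: P(rarer)={p_rare:.3f}  P(commoner)={p_common:.3f}")
```

The overlap is large: the rarer ending shows up twice with probability about 0.18, while the commoner one shows up only once with probability about 0.27, so observing 2 versus 1 tells us almost nothing about which is truly more frequent. Only when expected counts climb to 30 or so do the distributions separate cleanly.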
The method I propose for calculating the botch risk is as follows:
A chess engine is given 1,000 positions at random and one second to reach a move choice for each. The choice is then compared with the table. Each position has a true evaluation from the table, say mate in 15 with four moves that achieve it. If the engine suggests one of those four, it gets a check mark and we go to the next position. If it gets it wrong, the move is either suboptimal or a botch (it changes the outcome assuming perfect play). If it is a botch, it is recorded as a botch and we move on. If it is suboptimal, the number of moves given away is recorded under the category for the original mate length.
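The scoring loop above could be sketched as follows. The `tb_probe` and `engine_move` callables are hypothetical stand-ins: a real harness would probe an actual tablebase and query an engine over UCI with a one-second time limit. The stub rates at the bottom (90% optimal, 8% slips, 2% botches) are made-up numbers purely to make the sketch runnable:

```python
import random
from collections import defaultdict

def score_engine(positions, tb_probe, engine_move, seconds=1.0):
    """Tally outright botches and suboptimal slips over a position sample.

    tb_probe(pos)             -> (dtm, optimal_moves): depth to mate and
                                 the set of moves that preserve it.
    engine_move(pos, seconds) -> (move, new_dtm): the engine's choice and
                                 the resulting depth to mate, or None for
                                 new_dtm if the win was thrown away.
    """
    botches = 0
    slip_by_dtm = defaultdict(list)   # original dtm -> list of moves given away
    for pos in positions:
        dtm, optimal = tb_probe(pos)
        move, new_dtm = engine_move(pos, seconds)
        if move in optimal:
            continue                  # optimal: check mark, next position
        if new_dtm is None:
            botches += 1              # botch: outcome changed
        else:
            slip_by_dtm[dtm].append(new_dtm - dtm)  # suboptimal slip
    return botches, slip_by_dtm

# Toy demonstration with stubs in place of a real tablebase and engine.
random.seed(7)
def stub_probe(pos):
    return 15, {"best"}
def stub_engine(pos, seconds):
    r = random.random()
    if r < 0.90:
        return "best", 15             # optimal
    if r < 0.98:
        return "slip", 17             # mate in 15 becomes mate in 17
    return "blunder", None            # botch
botches, slips = score_engine(range(1000), stub_probe, stub_engine)
print(f"botches per 1000: {botches}")
```

The output of one run is exactly the per-tablebase record the proposal calls for: a botch count per thousand plus a slip distribution keyed by the original mate length.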
When we have all the data, we can estimate the botch probability. Suboptimal moves can also eventually lead to a botch; to determine the chance of this, a series of pseudo games can be run. This is the sort of thing a computer can do very fast.
The pseudo games would abide by the 50-move rule. One of the thousand positions would be taken at random and its recorded outcome applied: if it was a mate in 35 that the engine turned into a mate in 43, it is replaced with a random mate-in-43 position from among the thousand already given the one-second engine evaluation. This continues with the table playing one side and the 1,000-position engine results playing the other. Positions are not actually played out, because of the random replacement, but the results should simulate play well and generate data fast.
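A pseudo game of this kind reduces to tracking one number, the current depth to mate, as the engine's measured move distribution acts on it. A minimal sketch, under the simplifying assumption that no capture or pawn move resets the 50-move counter (conservative for the pawn-light endings in question); the drift rates in the toy model at the bottom are invented for illustration:

```python
import random

def pseudo_game(start_dtm, move_outcome, max_moves=50):
    """Play one pseudo game under the 50-move rule.

    move_outcome(dtm) simulates the engine side's move in a mate-in-dtm
    position: it returns the new dtm (dtm - 1 if optimal, something
    larger if suboptimal) or None for an outright botch into a draw.
    Returns True if the win is converted, False if it slips to a draw.
    """
    dtm = start_dtm
    moves = 0
    while dtm > 0:
        if moves >= max_moves:
            return False          # 50-move rule claims the draw
        dtm = move_outcome(dtm)
        if dtm is None:
            return False          # outright botch
        moves += 1
    return True

# Toy drift model (made-up rates): optimal 85%, +4-move slip 13%, botch 2%.
random.seed(3)
def toy_outcome(dtm):
    r = random.random()
    if r < 0.85:
        return dtm - 1
    if r < 0.98:
        return dtm + 4            # e.g. a mate in 35 becomes a mate in 39
    return None
wins = sum(pseudo_game(35, toy_outcome) for _ in range(10_000))
print(f"converted {wins} of 10000 pseudo games from mate-in-35")
```

Because each pseudo game is just arithmetic on a counter, millions of them run in seconds, which is exactly why this estimate is cheap to generate once the 1,000-position data exists.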
The list of results should record, for each 6-man tablebase: the engine used, botches per thousand, double botches per thousand (turning a win not just into a draw but into a loss), the time per move (I suggest one second), and the size of the base set (1,000, though a larger number would be nice).
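Such a chart is easy to share as a flat CSV file with one row per tablebase-engine pair. A sketch of the record layout; the material balance, engine name, and numbers in the sample row are placeholders, not real measurements:

```python
import csv
import io
from collections import namedtuple

# One row per (tablebase, engine) pair, fields as proposed above.
BotchRecord = namedtuple(
    "BotchRecord",
    ["tablebase", "engine", "botch_per_1000", "double_botch_per_1000",
     "seconds_per_move", "positions_sampled"],
)

rows = [
    # Illustrative placeholder values only.
    BotchRecord("KRBvKNN", "ExampleEngine 1.0", 12, 1, 1.0, 1000),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(BotchRecord._fields)
writer.writerows(rows)
print(buf.getvalue())
```

Sorting such a file by `botch_per_1000` descending gives exactly the download-priority order argued for above.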
I would not disregard probability of occurrence entirely. The best plan is to include the top 20 or 30 tables by probability and then add whatever one has room for from the botch chart, starting with those most easily botched. Until such a chart is generated, go with a list like the one I have made.
A savvy engine programmer might have his engine steer toward endgames that are low-probability yet easily botched, taking advantage of engines that are not equipped to handle them. It is hard to see how to exploit an engine equipped with all the tables of the positions it botches.