Rather than doing something practical I have done some more theorethical optimizing.

When computing black to move (BTM) positions from white to move (WTM) positions you need the following info:

White wins with distance to zeroing (DTZ) = N-1, white wins with DTZ = N-2, white wins with DTZ < N.

Given that info it is possible to compute new black losses for DTZ = N and DTZ = N-1.

To compute new WTM wins from BTM positions you need the recently computed black losses + previous white wins.

Given that info it is possible to compute new white wins with DTZ = N and DTZ = N+1.

Increase N by 2 for next iteration.

If you index using the kings (KK) positions as most signifigant index and then the rest of the pieces have 64^5 possibilities. (Actually less, due to illegal positions and sometimes due to mirroring)

This means that for each KK position there are 1G positions of the other pieces.

When computing the new values for one side, one king can be constant and so used to partition the table into smaller parts. Fix it in one of the squares, a1, b1, c1, d1, b2, c2, d2, c3, d3, d4.

The king that is moving can move at most one step, first load a1..h1, a2..h2 and a3..c3. This is 19*1G positions.Then compute a1..h1 and a2..b2.

After that a1 is not needed and can be replaced by d3. Then c2 can be computed and so on for all squares.

Without compression 3 bits per position is needed when computing BTM positions and 2 bits per position when computing WTM positions (because you only need the the WTM for the positions currently being computed).

The WTM positions can be compressed because there are only 4 possible cases:

win at DTZ < N

win at DTZ = N

win at DTZ = N+1

no win

If you store that with XY as 00 for no win, 10 for DTZ = N+1, 01 for DTZ = N and 11 for DTZ < N, you can compute the 3 win cases by simple logic, X&Y, Y&~X and X&~Y.

That way you need 2 bits per position and so 2*19*1G bits = 38 Gbit < 5 GB leaving a fair amount for result, I/O buffers and operating system if you have 8G.

What do you think, will this simplify computing 7-man tablebases?