On Sep 28, 2011, at 10:04 AM, George Bosilca wrote:
>> Why not use pre-posted non-blocking receives and MPI_WAIT_ANY?
> That's not very scalable either.
> Might work for 256 processes, but that's about it.
Just get a machine with oodles of RAM and you'll be fine.
I actually was thinking of his specific 256-process case. I agree that it doesn't scale arbitrarily.
Another approach would potentially be to break your 256 processes up into N sub-communicators of M each (where N * M = 256, obviously), post a non-blocking receive with ANY_SOURCE on each sub-communicator, and then do a WAIT_ANY on all of those.
The code gets a bit more complex, but it hypothetically extends your scalability.
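As a rough sketch of the bookkeeping behind that N * M split: the plain C helper below (no MPI calls, so it compiles anywhere) maps a world rank to a group number and a position inside that group. The names `group_slot` and `slot_for_rank` are made up for illustration, and the layout assumes contiguous blocks of M ranks per group, which is only one possible choice. In a real MPI program the two values would typically be passed as the color and key arguments of MPI_Comm_split.

```c
#include <assert.h>

/* Hypothetical helper type: where a world rank lands in the N x M layout. */
typedef struct {
    int color;  /* which sub-communicator the rank belongs to (0 .. N-1) */
    int key;    /* the rank's position inside that sub-communicator (0 .. M-1) */
} group_slot;

/*
 * Map a world rank to its group, assuming (for illustration only) that
 * ranks are assigned to groups in contiguous blocks of group_size.
 */
group_slot slot_for_rank(int world_rank, int group_size)
{
    assert(world_rank >= 0 && group_size > 0);

    group_slot s;
    s.color = world_rank / group_size;  /* block index = sub-communicator */
    s.key   = world_rank % group_size;  /* offset within the block */
    return s;
}
```

With 256 processes and M = 16, rank 17 lands in group 1 at position 1, and rank 255 lands in group 15 at position 15. The master then posts one ANY_SOURCE receive per group, 16 in total, rather than one receive per process.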
Or better yet, have your job mimic this idea -- a tree-based gathering system. Have not just 1 master, but N sub-masters. Individual compute processes report up to their sub-master, and the sub-master does whatever combinatorial work it can before reporting it to the ultimate master, etc.
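To make the two-level idea concrete, here is a small serial simulation in plain C. It is not MPI code: the array stands in for the values the compute processes would report, and a simple sum stands in for whatever "combinatorial work" your application can actually delegate. The function names `combine_group` and `tree_gather` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sub-master step: combine the values reported by one group of workers.
 * A sum is used here purely as a stand-in for application-specific work.
 */
long combine_group(const long *worker_values, size_t group_size)
{
    long partial = 0;
    for (size_t i = 0; i < group_size; i++)
        partial += worker_values[i];
    return partial;
}

/*
 * Master step: receive one partial result per sub-master and combine them.
 * values holds n_groups * group_size entries, laid out group after group.
 */
long tree_gather(const long *values, size_t n_groups, size_t group_size)
{
    assert(values != NULL || n_groups * group_size == 0);

    long total = 0;
    for (size_t g = 0; g < n_groups; g++)
        total += combine_group(values + g * group_size, group_size);
    return total;
}
```

With 16 groups of 16, the ultimate master handles 16 partial results instead of 256 individual messages. Whether that helps depends on how much of the combining can really be pushed down to the sub-masters.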
It depends on your code and how much delegation is possible, how much data you're transferring over the network, how much fairness you want to guarantee, etc. My point is that there are a bunch of different options you can pursue outside of the "everyone sends to 1 master" model.