MOSIX works as a sandbox, wrapping the executed process. Suppose I run
with "-n 3": three processes will be launched via MOSIX on nodes A, B
and C. MOSIX can choose to "migrate" process #2 from B to D - this will
not restart the process, nor will the process know about its current
location unless it "asks", for example by reading /proc/mosix/mosip. The
process will run on D (and consume CPU and memory on D), but it'll think
it's still on B and most system calls will still be executed on B. This
is, of course, better for CPU-intensive apps than I/O-intensive ones...
Since MPI would qualify as "communication-intensive", I've prepared a
special BTL component for it. You don't have to use the BTL to run with
MOSIX - ODLS alone is enough, but without the BTL you'll get reduced
communication performance. MPI runs as usual (with that slight
performance penalty) - no processes are added or removed, so no
re-wiring is needed...
I'll be happy to elaborate if you're interested.
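
As a rough illustration of the "asking" step described above, here is a
minimal sketch of a process querying its own location. The path
/proc/mosix/mosip is the one mentioned above; the format of its contents
is an assumption (the sketch just returns whatever text is there), and
the function name current_mosix_location is hypothetical.

```python
# Path taken from the message above. What the file actually contains is
# an assumption; this sketch only returns its text unchanged (stripped).
MOSIX_LOCATION_FILE = "/proc/mosix/mosip"


def current_mosix_location(path=MOSIX_LOCATION_FILE):
    """Return the stripped contents of the location file, or None.

    None means the file is missing, unreadable, or empty, which is what
    you would expect when the process is not running under MOSIX.
    """
    try:
        with open(path, "r") as f:
            return f.read().strip() or None
    except OSError:
        return None


if __name__ == "__main__":
    where = current_mosix_location()
    if where is None:
        print("no MOSIX location available (not running under MOSIX?)")
    else:
        print("MOSIX reports current location:", where)
```

A process that never reads this file keeps behaving as if it were still
on its home node, which is why a migration needs no re-wiring.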
On 03/31/2012 10:29 PM, Ralph Castain wrote:
> I can't speak to the BTL itself, but I do have questions as to how this can work. If MOSIX migrates a process, or starts new processes on another node during the course of a job, there is no way for MPI to handle the wireup and so it will fail. We need ALL the procs started at the beginning of time, and for them to remain in their initial location throughout the job. There are people working on how to handle proc movement, but mostly from a fault recovery perspective - i.e., the process is already known and wired, but fails and restarts at a new location, so we can try to re-wire it.
> I've looked at MOSIX before for other folks (easy enough to fork/exec a proc), but could find no real way to support the way MOSIX wants to manage resources without the constraint that MOSIX only operate at a job level - i.e., it start all specified procs at the beginning of time, and it not migrate them. Kinda defeated the intent of MOSIX.
> On Mar 31, 2012, at 10:04 AM, Alex Margolin wrote:
>> I think I'm close to finishing an initial version of the MOSIX support for Open MPI. A preliminary draft is attached.
>> The support consists of two modules: an ODLS module for launching processes under MOSIX, and a BTL module for efficient communication between processes.
>> I'm not quite there yet - I'm sure the BTL module needs more work... first because it fails (see error output below) and second because I'm not sure I got all the function output right. I've written some documentation inside the code, which is pretty short at the moment. The ODLS component is working fine.
>> Could someone take a look at my code to see if I'm headed in the right direction? I would like to submit my code to the repository eventually... I know of quite a few Open MPI users interested in MOSIX support (they know I'm working on it), and I was hoping to publish some benchmark results for it at the upcoming EuroMPI.