Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI as a batch system
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-12-08 14:21:56

On Tue, Dec 8, 2009 at 12:01 PM, Ross Boylan <ross_at_[hidden]> wrote:

> What is the difference between running a set of programs with
> independent invocations of mpirun vs specifying --app? The programs do
> not need to talk to each other.
> I think that if one job fails it will take the others down if I use
> --app. Is that correct? This is the main reason I'm considering
> alternatives.

Yes - everything listed in an app file is launched as a single job, so if one
proc fails, the whole job is terminated.

> On the other hand, if my app file is something like
> -np 1 prog1
> -np 1 prog2
> ...
> I believe I will avoid oversubscription. But, if I do
> mpirun -np 1 prog1
> mpirun -np 1 prog2
> ....
> do I end up oversubscribing the first node?

Yes - each invocation of mpirun has no idea what the other one is doing. So
they will both load their procs beginning with the first available node.
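
If you stay with separate mpirun invocations so that one failure cannot take down the others, one way to keep them from piling onto the first node is to tell each invocation where to run with --host. A minimal sketch, assuming hypothetical node names (node1 to node3) and program names (prog1 to prog4). It only prints the commands, so the placement can be checked before anything is launched:

```shell
#!/bin/sh
# Hypothetical host and program names; replace with your own.
HOSTS="node1 node2 node3"
PROGS="prog1 prog2 prog3 prog4"

# Print one mpirun command per program, cycling through HOSTS so that
# consecutive jobs are placed on different nodes.
plan_jobs() {
    set -- $HOSTS            # hosts become $1, $2, ... inside this function
    nhosts=$#
    i=0
    for prog in $PROGS; do
        idx=$(( i % nhosts + 1 ))    # 1-based position of the host to use
        eval "host=\${$idx}"
        echo "mpirun -np 1 --host $host $prog"
        i=$(( i + 1 ))
    done
}

plan_jobs
```

Once the plan looks right, each printed line can be run in the background (append ` &`), followed by a final `wait`. The round-robin assignment is only an illustration; it knows nothing about what else is running on those nodes.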

> It would also be nice if OMPI automatically picked the least loaded node
> (the load might come from other programs), but it does not appear it
> takes this into account. Is that right? The FAQ mentions load leveler,
> but we don't seem to have it installed.

Can you update to 1.3.4? If so, you can balance the load by using
--loadbalance on the command line and OMPI will map your procs accordingly.
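
As a rough sketch of what that could look like, assuming 1.3.4 or later and a hostfile you write yourself: the node names, slot counts, hostfile path, and prog1 below are all placeholders. The script writes the hostfile and prints the mpirun command rather than running it. Note that the option evens out how many procs each node in the hostfile gets within this one job.

```shell
#!/bin/sh
# Hypothetical node names and slot counts; adjust for the real cluster.
HOSTFILE="${TMPDIR:-/tmp}/ompi_hosts.$$"
cat > "$HOSTFILE" <<EOF
node1 slots=4
node2 slots=4
EOF

# Print (rather than run) the command so it can be reviewed first.
# prog1 is a placeholder program name.
LB_CMD="mpirun --loadbalance --hostfile $HOSTFILE -np 4 prog1"
echo "$LB_CMD"
```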

> Context: we have a cluster without a batch system or scheduler, and want
> to run multiple independent jobs at once. The cluster is running Debian
> Lenny -> OMPI 1.2.7rc2.

We have a subproject called Open Resilient Cluster Manager that will allow
the job to continue when individual procs die. Not released yet, but you can
see the project at

I have used those techniques to modify mpirun to support process
continuation (to be committed to the devel trunk soon, for release later),
but the MPI connection restoration is still being worked on. So it works fine
for non-MPI applications, but won't help for MPI apps right now.

I will probably modify mpirun at the same time to allow independent jobs to
continue running if one job fails. This will require a flag to mpirun,
though, as otherwise it would be very hard for me to know that the jobs are
in fact independent - the runtime layer doesn't know what MPI connections
are being made.


> Thanks for any help you can offer.
> Ross Boylan