We may well have a problem in our LSF support - none of us has a way
of testing it, so we are programming somewhat blind here.
From the error message, it looks like there is a mismatch between how
many slots were allocated and how many procs were mapped to a specific
host. I don't see your cmd line here - could you pass it along too?
My initial guess is that mpirun is running on node0023, and that we
then mapped procs local to mpirun such that we exceeded LSF's slot
allocation on that node. We don't account for mpirun taking a process
slot in our mapping, and LSF does - hence the error. I think...
You could test this by adding --nolocal to your cmd line. This will
force mpirun to map all procs onto other nodes. If my analysis is
correct, the job should then run.
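To make the guess above concrete, here is a toy model of the slot
accounting I have in mind. It is only a sketch: it is not Open MPI's
mapper or LSF's bookkeeping, the helper names (map_procs,
oversubscribed) are made up for illustration, and node0024 and all
slot counts are assumptions, since I haven't seen your allocation.

```python
# Toy model only: NOT Open MPI's mapper and NOT LSF's accounting.
# node0024 and every slot count below are hypothetical.

def map_procs(slots, nprocs, mpirun_node, nolocal=False):
    """Fill nodes in order, one proc per slot, ignoring mpirun itself."""
    placement = {}
    remaining = nprocs
    for node, nslots in slots.items():
        if nolocal and node == mpirun_node:
            continue  # --nolocal: place no application procs beside mpirun
        take = min(nslots, remaining)
        if take:
            placement[node] = take
        remaining -= take
    if remaining:
        raise ValueError("not enough slots for %d procs" % nprocs)
    return placement


def oversubscribed(slots, placement, mpirun_node):
    """Nodes that exceed their allocation once mpirun is counted as a slot."""
    over = []
    for node, nslots in slots.items():
        used = placement.get(node, 0) + (1 if node == mpirun_node else 0)
        if used > nslots:
            over.append(node)
    return over


slots = {"node0023": 4, "node0024": 4}  # hypothetical allocation
default = map_procs(slots, 4, "node0023")
with_nolocal = map_procs(slots, 4, "node0023", nolocal=True)

print(oversubscribed(slots, default, "node0023"))       # ['node0023']
print(oversubscribed(slots, with_nolocal, "node0023"))  # []
```

In this model the default mapping fills node0023 and, once mpirun is
counted as well, the node goes over its allocation. With nolocal the
procs land on the other node and nothing is exceeded. If the real
failure has a different cause, this sketch won't reflect it.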
On Feb 20, 2009, at 6:46 AM, Gabriele Fatigati wrote:
> Dear Open MPI developers,
> I'm running my MPI code compiled with OpenMPI 1.3 over Infiniband and
> the LSF scheduler, but I got the attached error. I suppose that process
> spawning doesn't work well. The same program works well under
> OpenMPI 1.2.5. Could you help me?
> Thanks in advance.
> Ing. Gabriele Fatigati
> Parallel programmer
> CINECA Systems & Tecnologies Department
> Supercomputing Group
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> www.cineca.it Tel: +39 051 6171722
> g.fatigati [AT] cineca.it