Open MPI User's Mailing List Archives

Subject: [OMPI users] mpirun interaction with pbsdsh
From: Brock Palen (brockp_at_[hidden])
Date: 2009-04-01 10:49:31


OK, this is weird, and the correct answer is probably "don't do that".
Anyway:

A user wants to run many, many small jobs, faster than our scheduler
+ Torque can start them, so he uses pbsdsh to start them in parallel, under tm.

pbsdsh bash -c 'cd $PBS_O_WORKDIR/$PBS_VNODENUM; mpirun -np 1 application'

This is kinda silly because the code, while MPI based, does not require
mpirun to start when run on a single rank, and runs just fine if
you leave off mpirun.

If you do leave it on, though (this is with ompi-1.2.x),
you get errors like:

[nyx428.engin.umich.edu:01929] pls:tm: failed to poll for a spawned
proc, return status = 17002
[nyx428.engin.umich.edu:01929] [0,0,0] ORTE_ERROR_LOG: In errno in
file rmgr_urm.c at line 462

Kinda makes sense: pbsdsh has already started 'mpirun' under tm, and
now mpirun is trying to start a process under tm as well. In fact, with
older versions (1.2.0) the above works fine only for the first
TMNODE; any second node hangs in 'poll()' if you strace it.

So we can solve the above by not using mpirun to start single
processes under tm that were spawned by tm in the first place. Just
thought you would like to know.
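
For reference, here is a rough sketch of the mpirun-free launch as a small
bash helper. The function name run_task is made up for illustration and is
not part of Torque or Open MPI. It assumes PBS_O_WORKDIR is set by the job
and PBS_VNODENUM is set by pbsdsh for each task, as in the command above.

```shell
#!/bin/bash
# Hypothetical helper (name is an assumption, not an existing tool):
# run one single-rank task in its own per-task directory, starting the
# application directly instead of going through mpirun.
run_task() {
    # Both variables are expected from the Torque/pbsdsh environment.
    if [ -z "$PBS_O_WORKDIR" ] || [ -z "$PBS_VNODENUM" ]; then
        echo "run_task: PBS_O_WORKDIR and PBS_VNODENUM must be set" >&2
        return 1
    fi
    # Each task works in $PBS_O_WORKDIR/<its virtual node number>.
    cd "$PBS_O_WORKDIR/$PBS_VNODENUM" || return 1
    # "$@" is the application and its arguments; no mpirun in front.
    "$@"
}
```

Inside a job, pbsdsh would run this once per task with the application as
the argument, which is equivalent to the one-liner above minus mpirun.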

Is there a way to have mpirun spawn all the processes like pbsdsh?
Problem is the code is MPI based, so if you say 'run 4' it's going to
do the normal COMM_SIZE=4, only read the first input, etc. Also we have
to change the CWD of each rank. So, can you make mpirun act as a task farm?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp_at_[hidden]
(734)936-1985