Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi/pbsdsh/Torque problem
From: Laurence Marks (L-marks_at_[hidden])
Date: 2011-04-03 18:37:30


On Sun, Apr 3, 2011 at 5:08 PM, Reuti <reuti_at_[hidden]> wrote:
> Am 03.04.2011 um 23:59 schrieb David Singleton:
>
>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>
>>> What I still don't understand is why you are trying to do it this way. Why not just run
>>>
>>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>>>
>>> where machineN contains the names of the nodes where you want the MPI apps to execute? mpirun will only execute apps on those nodes, so this accomplishes the same thing as your script - only with a lot less pain.
>>>
>>> Your script would just contain a sequence of these commands, each with its number of procs and machinefile as required.
>>>
>>
>> Maybe I missed why this suggestion of forgetting about the ssh/pbsdsh altogether
>> was not feasible?  Just use mpirun (with its great tm support!) to distribute
>> MPI jobs.
>
> Wien2k has a two stage startup, e.g. for 16 slots:
>
> a) start `ssh` 4 times in the background to reach some of the granted nodes
> b) on each of those nodes, use `mpirun` to start processes on the remaining nodes, 3 per call
>
> Problems:
>
> 1) control `ssh` under Torque
> 2) provide a partial hostlist to `mpirun`, maybe by disabling the default tight integration
>
> -- Reuti
>

1) The MPI tasks can be started on only one node (Reuti: "setenv
MPI_REMOTE 0" in parallel_options, which was introduced for other
reasons in release 9.3 and later). That seems to be safe, and it may be
the only viable method with OMPI, since pbsdsh appears to be unable to
launch MPI tasks correctly (or needs some environment variables that
I don't know about).
2) This is already done (Reuti: these are the .machine0, .machine1,
etc. files. If you need information about setting up the Wien2k file
under qsub in general, contact me offline or look for Machines2W on the
mailing list; it may be part of the next release, but I'm not sure and
I don't make those decisions).
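
The .machine0/.machine1 splitting above could be sketched roughly as
follows (a hedged illustration, not Wien2k's actual Machines2W script;
the function name, group counts, and file names are mine):

```shell
#!/bin/sh
# Sketch: split the hosts Torque granted (normally $PBS_NODEFILE) into
# one machinefile per parallel job (.machine1, .machine2, ...), so each
# mpirun/ssh in the two-stage startup only sees its own subset of nodes.

# split_nodefile <nodefile> <ngroups> <slots-per-group>
split_nodefile() {
    nodefile=$1 ngroups=$2 per=$3
    i=1
    while [ "$i" -le "$ngroups" ]; do
        # take lines ((i-1)*per + 1) .. (i*per) of the nodefile
        head -n $((i * per)) "$nodefile" | tail -n "$per" > ".machine$i"
        i=$((i + 1))
    done
}

# Demo with a fake 8-slot nodefile; inside a Torque job you would pass
# "$PBS_NODEFILE" instead.
printf 'n01\nn01\nn01\nn01\nn02\nn02\nn02\nn02\n' > nodefile.demo
split_nodefile nodefile.demo 2 4
cat .machine2    # the four n02 entries
```

Each background `ssh` (or `mpirun -machinefile .machineN`) then works
from its own file, which is essentially what David's head/tail example
below does by hand.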

However, there is another layer that Reuti did not mention for this
code: some processes also need to be launched remotely to ensure that
the correct scratch directories are used (i.e. local storage, which is
faster than NFS or similar). Maybe pbsdsh can be used for this; I am
still testing and am not sure. It may be enough to create a script with
all the important environment variables exported (as they may not all
be in .bashrc or .cshrc), although there might be issues making this
fully portable. Since there are > 1000 licenses of Wien2k, it has to be
able to cope with different OS's, and not just OMPI.
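
One way such a script could look (a hedged sketch under my own
assumptions, not something from Wien2k: the wrapper name and the
/scratch path are illustrative): since pbsdsh starts the remote process
with almost no environment and sources no dot files, the job script
could generate a wrapper that bakes in the needed variables at job
start, and pbsdsh would then launch that wrapper instead of the bare
command.

```shell
#!/bin/sh
# Generate a self-contained wrapper in the job's working directory.
# Unescaped $PATH / $LD_LIBRARY_PATH expand NOW (on the submit node),
# so their values are baked into the wrapper; the escaped \$USER, \$@
# are resolved later on the remote node.
cat > ./job_env_wrapper.sh <<EOF
#!/bin/sh
export PATH="$PATH"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH"
export SCRATCH="/scratch/\$USER"   # node-local scratch, resolved remotely
cd "\$SCRATCH" || exit 1
exec "\$@"
EOF
chmod +x ./job_env_wrapper.sh

# Then, per target node (illustrative invocation):
#   pbsdsh -h <nodename> $PWD/job_env_wrapper.sh <command> <args>
```

Whether this is portable across all the shells and OS's Wien2k runs on
is exactly the open question above; the sketch only shows the shape of
the idea.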

>
>> A simple example:
>>
>> vayu1:~/MPI > qsub -lncpus=24,vmem=24gb,walltime=10:00 -wd -I
>> qsub: waiting for job 574900.vu-pbs to start
>> qsub: job 574900.vu-pbs ready
>>
>> [dbs900_at_v250 ~/MPI]$ wc -l $PBS_NODEFILE
>> 24
>> [dbs900_at_v250 ~/MPI]$ head -12 $PBS_NODEFILE > m1
>> [dbs900_at_v250 ~/MPI]$ tail -12 $PBS_NODEFILE > m2
>> [dbs900_at_v250 ~/MPI]$ mpirun --machinefile m1 ./a2a143 120000 30 & mpirun --machinefile m2 ./pp143
>>
>>
>> Check how the processes are distributed ...
>>
>> vayu1:~ > qps 574900.vu-pbs
>> Node 0: v250:
>>  PID S   RSS    VSZ %MEM     TIME %CPU COMMAND
>> 11420 S  2104  10396  0.0 00:00:00  0.0 -tcsh
>> 11421 S   620  10552  0.0 00:00:00  0.0 pbs_demux
>> 12471 S  2208  49324  0.0 00:00:00  0.9 /apps/openmpi/1.4.3/bin/mpirun --machinefile m1 ./a2a143 120000 30
>> 12472 S  2116  49312  0.0 00:00:00  0.0 /apps/openmpi/1.4.3/bin/mpirun --machinefile m2 ./pp143
>> 12535 R 270160 565668  1.0 00:00:02 82.4 ./a2a143 120000 30
>> 12536 R 270032 565536  1.0 00:00:02 81.4 ./a2a143 120000 30
>> 12537 R 270012 565528  1.0 00:00:02 87.3 ./a2a143 120000 30
>> 12538 R 269992 565532  1.0 00:00:02 93.3 ./a2a143 120000 30
>> 12539 R 269980 565516  1.0 00:00:02 81.4 ./a2a143 120000 30
>> 12540 R 270008 565516  1.0 00:00:02 86.3 ./a2a143 120000 30
>> 12541 R 270008 565516  1.0 00:00:02 96.3 ./a2a143 120000 30
>> 12542 R 272064 567568  1.0 00:00:02 91.3 ./a2a143 120000 30
>> Node 1: v251:
>>  PID S   RSS    VSZ %MEM     TIME %CPU COMMAND
>> 10367 S  1872  40648  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444413440 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1444413440.0;tcp://10.1.3.58:37339"
>> 10368 S  1868  40648  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>> 10372 R 271112 567556  1.0 00:00:04 74.5 ./a2a143 120000 30
>> 10373 R 271036 567564  1.0 00:00:04 71.5 ./a2a143 120000 30
>> 10374 R 271032 567560  1.0 00:00:04 66.5 ./a2a143 120000 30
>> 10375 R 273112 569612  1.1 00:00:04 68.5 ./a2a143 120000 30
>> 10378 R 552280 840712  2.2 00:00:04 100 ./pp143
>> 10379 R 552280 840708  2.2 00:00:04 100 ./pp143
>> 10380 R 552328 841576  2.2 00:00:04 100 ./pp143
>> 10381 R 552788 841216  2.2 00:00:04 99.3 ./pp143
>> Node 2: v252:
>>  PID S   RSS    VSZ %MEM     TIME %CPU COMMAND
>> 10152 S  1908  40780  0.0 00:00:00  0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>> 10156 R 552384 840200  2.2 00:00:07 99.3 ./pp143
>> 10157 R 551868 839692  2.2 00:00:06 99.3 ./pp143
>> 10158 R 551400 839184  2.2 00:00:07 100 ./pp143
>> 10159 R 551436 839184  2.2 00:00:06 98.3 ./pp143
>> 10160 R 551760 839692  2.2 00:00:07 100 ./pp143
>> 10161 R 551788 839824  2.2 00:00:07 97.3 ./pp143
>> 10162 R 552256 840332  2.2 00:00:07 100 ./pp143
>> 10163 R 552216 840340  2.2 00:00:07 99.3 ./pp143
>>
>>
>> You would have to do something smarter to get correct process binding etc.
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Laurence Marks
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Northwestern University
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Web: www.numis.northwestern.edu
Chair, Commission on Electron Crystallography of IUCR
www.numis.northwestern.edu/
Research is to see what everybody else has seen, and to think what
nobody else has thought
Albert Szent-Gyorgi