
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] openmpi/pbsdsh/Torque problem
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-04-04 10:50:39


Hmmm...yes, I guess we did get off-track then. This solution is exactly what I proposed in the first response to your thread, and it was repeated by others later on. :-/

So long as mpirun is executed on the node where the "sister mom" is located, and as long as your script "B" does -not- include an "mpirun" command, this will work fine.

On Apr 4, 2011, at 8:38 AM, Laurence Marks wrote:

> Thanks, I think we may have had a miscommunication here; I assume
> that on the computer where they have disabled rsh and ssh there is
> "something" else to communicate with, so we don't need to use pbsdsh.
> If there isn't, there is not much a lowly user like me can do.
>
> I think we can close this, since like many things the answer is
> "simple" once you find it, and I think I have. Forget pbsdsh, which
> seems to be a bit flaky and probably is not being maintained much.
> Instead, use mpirun to replace ssh. In other words, replace
>
> ssh A B
>
> (which executes command B on node A) with
>
> mpirun -np 1 --host A bash -c " B "
>
> (with variables appropriately substituted, or with csh instead of
> bash). Then -x (in OMPI) can be used to export whatever is needed in
> the environment etc., which pbsdsh lacks, and there should be similar
> environment exporting in other MPI flavors. With whatever minor
> changes are needed for other flavors of MPI, I believe this should be
> 99% robust and portable. It passes the simple test of B = "sleep 600":
> terminating the process where mpirun was launched also kills the sleep
> on the remote node (unlike ssh on some, but not all, computers).
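The ssh-replacement idiom described above can be sketched as a small helper (the name remote_run and the exported variables are illustrative; MPIRUN is overridable so the generated command line can be inspected without a cluster):

```shell
# remote_run: a sketch of the "mpirun instead of ssh" idiom above.
# Runs command B as a single process on host A, with -x exporting
# environment variables that pbsdsh would not forward.
# MPIRUN can be overridden (e.g. MPIRUN=echo) for a dry run.
remote_run() {
    node="$1"; shift
    ${MPIRUN:-mpirun} -np 1 --host "$node" \
        -x PATH -x LD_LIBRARY_PATH \
        bash -c "$*"
}

# usage: remote_run nodeA "sleep 600"
```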
>
> On Mon, Apr 4, 2011 at 6:35 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>> I apologize - I realized late last night that I had a typo in my recommended command. It should read:
>>
>> mpirun -mca plm rsh -mca plm_rsh_agent pbsdsh -mca ras ^tm --machinefile m1....
>> ^^^^^^^^^^^^^^^^^^^
>>
>> Also, if you know that #procs <= #cores on your nodes, you can greatly improve performance by adding "--bind-to-core".
>>
>>
>>
>> On Apr 3, 2011, at 5:28 PM, Laurence Marks wrote:
>>
>>> And, before someone wonders: while Wien2k is a commercial code, it is
>>> about 500 EUR for a lifetime licence, so this is not the same as VASP or
>>> Gaussian, which cost $$$$$. And I have no financial interest in the
>>> code, but like many others I help make it better (semi-GNU).
>>>
>>> On Sun, Apr 3, 2011 at 6:25 PM, Laurence Marks <L-marks_at_[hidden]> wrote:
>>>> Thanks. I will test this tomorrow.
>>>>
>>>> Many people run Wien2k with Open MPI, as you say; I only became aware
>>>> of the issue of Wien2k (and perhaps other codes) leaving orphaned
>>>> processes running a few days ago. I also know someone who wants
>>>> to run Wien2k on a system where both rsh and ssh are banned.
>>>> Personally, as I don't want to be banned from the supercomputers I use,
>>>> I want to find an adequate patch for myself --- and then try to
>>>> persuade the developers to adopt it.
>>>>
>>>> On Sun, Apr 3, 2011 at 6:13 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>
>>>>> On Apr 3, 2011, at 4:37 PM, Laurence Marks wrote:
>>>>>
>>>>>> On Sun, Apr 3, 2011 at 5:08 PM, Reuti <reuti_at_[hidden]> wrote:
>>>>>>> Am 03.04.2011 um 23:59 schrieb David Singleton:
>>>>>>>
>>>>>>>> On 04/04/2011 12:56 AM, Ralph Castain wrote:
>>>>>>>>>
>>>>>>>>> What I still don't understand is why you are trying to do it this way. Why not just run
>>>>>>>>>
>>>>>>>>> time mpirun -v -x LD_LIBRARY_PATH -x PATH -np 2 -machinefile .machineN /home/lma712/src/Virgin_10.1/lapw1Q_mpi lapw1Q_1.def
>>>>>>>>>
>>>>>>>>> where machineN contains the names of the nodes where you want the MPI apps to execute? mpirun will only execute apps on those nodes, so this accomplishes the same thing as your script - only with a lot less pain.
>>>>>>>>>
>>>>>>>>> Your script would just contain a sequence of these commands, each with its number of procs and machinefile as required.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Maybe I missed why this suggestion of forgetting about the ssh/pbsdsh altogether
>>>>>>>> was not feasible? Just use mpirun (with its great tm support!) to distribute
>>>>>>>> MPI jobs.
>>>>>>>
>>>>>>> Wien2k has a two-stage startup, e.g. for 16 slots:
>>>>>>>
>>>>>>> a) start `ssh` 4 times in the background to reach some of the granted nodes
>>>>>>> b) on each of those nodes, use `mpirun` to start processes on the remaining nodes, 3 per call
>>>>>>>
>>>>>>> Problems:
>>>>>>>
>>>>>>> 1) controlling `ssh` under Torque
>>>>>>> 2) providing a partial hostlist to `mpirun`, maybe by disabling the default tight integration
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
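Reuti's two-stage startup can be roughly sketched as follows (host names, machinefile names, and the binary are hypothetical; RSH is overridable so the loop can be dry-run without a cluster):

```shell
# Sketch of the Wien2k two-stage startup for 16 slots:
# stage (a) reaches each of 4 "head" nodes in the background;
# stage (b) has each head node run mpirun over its own 4-slot
# .machine_* file.  RSH can be overridden (e.g. RSH=echo) to dry-run.
two_stage() {
    for head in "$@"; do
        ${RSH:-ssh} "$head" \
            "mpirun -np 4 --machinefile .machine_$head ./lapw1_mpi" &
    done
    wait
}

# usage: two_stage n01 n05 n09 n13
```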
>>>>>>
>>>>>> 1) The MPI tasks can be started on only one node (Reuti: "setenv
>>>>>> MPI_REMOTE 0" in parallel_options, which was introduced for other
>>>>>> reasons in 9.3 and later releases). That seems to be safe, and maybe
>>>>>> the only viable method with OMPI, as pbsdsh appears to be unable to
>>>>>> launch MPI tasks correctly (or needs some environment variables that
>>>>>> I don't know about).
>>>>>> 2) This is already done (Reuti: this is .machine0, .machine1, etc. If
>>>>>> you need information about setting up the Wien2k files under qsub in
>>>>>> general, contact me offline or look for Machines2W on the mailing list,
>>>>>> which may be part of the next release; I'm not sure, and I don't make
>>>>>> those decisions).
>>>>>>
>>>>>> However, there is another layer that Reuti did not mention for this
>>>>>> code, which is that some processes also need to be remotely launched
>>>>>> to ensure that the correct scratch directories are used (i.e. local
>>>>>> storage, which is faster than NFS or similar). Maybe pbsdsh can
>>>>>> be used for this; I am still testing and I am not sure. It may be
>>>>>> enough to create a script with all the important environment variables
>>>>>> exported (as they may not all be in .bashrc or .cshrc), although there
>>>>>> might be issues making this fully portable. Since there are > 1000
>>>>>> licenses of Wien2k, it has to be able to cope with different OSes,
>>>>>> and not just with OMPI.
>>>>>>
>>>>>
>>>>> Here is what I would do, based on my knowledge of OMPI's internals (and I wrote the launchers :-)):
>>>>>
>>>>> 1. do not use your script - you don't want all those PBS envars to confuse OMPI
>>>>>
>>>>> 2. mpirun -mca plm rsh -launch-agent pbsdsh -mca ras ^tm --machinefile m1....
>>>>>
>>>>> This command line tells mpirun to use the "rsh/ssh" launcher, but to substitute "pbsdsh" for "ssh". It also tells it to ignore the PBS_NODEFILE and just use the machinefile for the nodes to be used for that job.
>>>>>
>>>>> I can't swear this will work as I have never verified that pbsdsh and ssh have the same syntax, but I -think- that was true. If so, then this might do what you are attempting.
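Put together, the recipe might look like the following sketch (file and binary names are hypothetical; it uses the corrected plm_rsh_agent spelling from Ralph's follow-up at the top of the thread, and MPIRUN is overridable so the command line can be checked without Torque):

```shell
# run_step: sketch of the recipe above -- use the rsh launcher with
# pbsdsh substituted for ssh, ignore the PBS_NODEFILE (ras ^tm), and
# pass an explicit machinefile naming this step's nodes.
# MPIRUN can be overridden (e.g. MPIRUN=echo) for a dry run.
run_step() {
    machinefile="$1"; shift
    ${MPIRUN:-mpirun} -mca plm rsh -mca plm_rsh_agent pbsdsh \
        -mca ras ^tm --machinefile "$machinefile" "$@"
}

# usage: run_step m1 ./lapw1_mpi lapw1_1.def
```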
>>>>>
>>>>>
>>>>> I know people have run Wien2k with OMPI before - but I have never heard of the problems you are reporting.
>>>>>
>>>>>
>>>>>>>
>>>>>>>> A simple example:
>>>>>>>>
>>>>>>>> vayu1:~/MPI > qsub -lncpus=24,vmem=24gb,walltime=10:00 -wd -I
>>>>>>>> qsub: waiting for job 574900.vu-pbs to start
>>>>>>>> qsub: job 574900.vu-pbs ready
>>>>>>>>
>>>>>>>> [dbs900_at_v250 ~/MPI]$ wc -l $PBS_NODEFILE
>>>>>>>> 24
>>>>>>>> [dbs900_at_v250 ~/MPI]$ head -12 $PBS_NODEFILE > m1
>>>>>>>> [dbs900_at_v250 ~/MPI]$ tail -12 $PBS_NODEFILE > m2
>>>>>>>> [dbs900_at_v250 ~/MPI]$ mpirun --machinefile m1 ./a2a143 120000 30 & mpirun --machinefile m2 ./pp143
>>>>>>>>
>>>>>>>>
>>>>>>>> Check how the processes are distributed ...
>>>>>>>>
>>>>>>>> vayu1:~ > qps 574900.vu-pbs
>>>>>>>> Node 0: v250:
>>>>>>>> PID S RSS VSZ %MEM TIME %CPU COMMAND
>>>>>>>> 11420 S 2104 10396 0.0 00:00:00 0.0 -tcsh
>>>>>>>> 11421 S 620 10552 0.0 00:00:00 0.0 pbs_demux
>>>>>>>> 12471 S 2208 49324 0.0 00:00:00 0.9 /apps/openmpi/1.4.3/bin/mpirun --machinefile m1 ./a2a143 120000 30
>>>>>>>> 12472 S 2116 49312 0.0 00:00:00 0.0 /apps/openmpi/1.4.3/bin/mpirun --machinefile m2 ./pp143
>>>>>>>> 12535 R 270160 565668 1.0 00:00:02 82.4 ./a2a143 120000 30
>>>>>>>> 12536 R 270032 565536 1.0 00:00:02 81.4 ./a2a143 120000 30
>>>>>>>> 12537 R 270012 565528 1.0 00:00:02 87.3 ./a2a143 120000 30
>>>>>>>> 12538 R 269992 565532 1.0 00:00:02 93.3 ./a2a143 120000 30
>>>>>>>> 12539 R 269980 565516 1.0 00:00:02 81.4 ./a2a143 120000 30
>>>>>>>> 12540 R 270008 565516 1.0 00:00:02 86.3 ./a2a143 120000 30
>>>>>>>> 12541 R 270008 565516 1.0 00:00:02 96.3 ./a2a143 120000 30
>>>>>>>> 12542 R 272064 567568 1.0 00:00:02 91.3 ./a2a143 120000 30
>>>>>>>> Node 1: v251:
>>>>>>>> PID S RSS VSZ %MEM TIME %CPU COMMAND
>>>>>>>> 10367 S 1872 40648 0.0 00:00:00 0.0 orted -mca ess env -mca orte_ess_jobid 1444413440 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 --hnp-uri "1444413440.0;tcp://10.1.3.58:37339"
>>>>>>>> 10368 S 1868 40648 0.0 00:00:00 0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>>>>>>>> 10372 R 271112 567556 1.0 00:00:04 74.5 ./a2a143 120000 30
>>>>>>>> 10373 R 271036 567564 1.0 00:00:04 71.5 ./a2a143 120000 30
>>>>>>>> 10374 R 271032 567560 1.0 00:00:04 66.5 ./a2a143 120000 30
>>>>>>>> 10375 R 273112 569612 1.1 00:00:04 68.5 ./a2a143 120000 30
>>>>>>>> 10378 R 552280 840712 2.2 00:00:04 100 ./pp143
>>>>>>>> 10379 R 552280 840708 2.2 00:00:04 100 ./pp143
>>>>>>>> 10380 R 552328 841576 2.2 00:00:04 100 ./pp143
>>>>>>>> 10381 R 552788 841216 2.2 00:00:04 99.3 ./pp143
>>>>>>>> Node 2: v252:
>>>>>>>> PID S RSS VSZ %MEM TIME %CPU COMMAND
>>>>>>>> 10152 S 1908 40780 0.0 00:00:00 0.0 orted -mca ess env -mca orte_ess_jobid 1444347904 -mca orte_ess_vpid 2 -mca orte_ess_num_procs 3 --hnp-uri "1444347904.0;tcp://10.1.3.58:39610"
>>>>>>>> 10156 R 552384 840200 2.2 00:00:07 99.3 ./pp143
>>>>>>>> 10157 R 551868 839692 2.2 00:00:06 99.3 ./pp143
>>>>>>>> 10158 R 551400 839184 2.2 00:00:07 100 ./pp143
>>>>>>>> 10159 R 551436 839184 2.2 00:00:06 98.3 ./pp143
>>>>>>>> 10160 R 551760 839692 2.2 00:00:07 100 ./pp143
>>>>>>>> 10161 R 551788 839824 2.2 00:00:07 97.3 ./pp143
>>>>>>>> 10162 R 552256 840332 2.2 00:00:07 100 ./pp143
>>>>>>>> 10163 R 552216 840340 2.2 00:00:07 99.3 ./pp143
>>>>>>>>
>>>>>>>>
>>>>>>>> You would have to do something smarter to get correct process binding etc.
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> users mailing list
>>>>>>>> users_at_[hidden]
>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Laurence Marks
>>>>>> Department of Materials Science and Engineering
>>>>>> MSE Rm 2036 Cook Hall
>>>>>> 2220 N Campus Drive
>>>>>> Northwestern University
>>>>>> Evanston, IL 60208, USA
>>>>>> Tel: (847) 491-3996 Fax: (847) 491-7820
>>>>>> email: L-marks at northwestern dot edu
>>>>>> Web: www.numis.northwestern.edu
>>>>>> Chair, Commission on Electron Crystallography of IUCR
>>>>>> www.numis.northwestern.edu/
>>>>>> Research is to see what everybody else has seen, and to think what
>>>>>> nobody else has thought
>>>>>> Albert Szent-Gyorgi
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>