I'm trying to create a tight integration between torque and openmpi for cases
where the tm ras and plm components aren't compiled into openmpi. This scenario is
common for linux distros that ship openmpi. Of course the ideal solution is
to recompile openmpi with torque support, but this isn't always feasible since
I do not want to support my own version of openmpi on the stuff I'm
distributing to others.
We also see some proprietary applications shipping their own embedded openmpi
libraries where the tm plm/ras is either missing or does not work with the
torque installation on our system.
So far I've created a pbsdshwrapper.py that mimics ssh behaviour very closely,
so that starting the orteds on all the hosts works as expected and the
application starts correctly when I use:

    setenv OMPI_MCA_plm_rsh_agent "pbsdshwrapper.py"
    mpirun --hostfile $PBS_NODEFILE ........

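A minimal sketch of what the core of such an ssh-like wrapper could look like. This is not the actual pbsdshwrapper.py. It assumes that the rsh plm calls the agent as "[ssh options] host command ...", that pbsdsh accepts -h <hostname>, and that the listed ssh options are the only ones that take an argument. The names build_pbsdsh_command and main are hypothetical.

```python
import os
import sys

# ssh options that consume a following argument (a subset; an assumption).
SSH_OPTS_WITH_ARG = {"-l", "-p", "-i", "-o", "-F"}


def build_pbsdsh_command(argv):
    """Translate ssh-style arguments (host, then command) into a pbsdsh call."""
    args = list(argv)
    # Drop leading ssh options, which pbsdsh has no use for.
    while args and args[0].startswith("-"):
        opt = args.pop(0)
        if opt in SSH_OPTS_WITH_ARG and args:
            args.pop(0)
    if len(args) < 2:
        raise ValueError("expected: [ssh options] host command [args...]")
    host, command = args[0], args[1:]
    # ssh joins the remaining words and passes them to a shell on the remote
    # side; pbsdsh runs a program directly, so start the shell explicitly.
    return ["pbsdsh", "-h", host, "/bin/sh", "-c", " ".join(command)]


def main(argv):
    """Replace the current process with the pbsdsh invocation."""
    cmd = build_pbsdsh_command(argv)
    os.execvp(cmd[0], cmd)
```

A real wrapper script would end by calling main(sys.argv[1:]).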
What I want now is a way to get rid of the --hostfile $PBS_NODEFILE in the
mpirun command. Is there an environment variable that I can set so that
mpirun grabs the right nodelist?
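To make the goal concrete, here is a rough sketch of the behaviour I'd like, written as a launcher-side workaround rather than as anything openmpi provides. The helper name mpirun_command is hypothetical. It only assumes that $PBS_NODEFILE is set inside a torque job and that mpirun accepts --hostfile (or its -hostfile / machinefile spellings).

```python
import os


def mpirun_command(user_args, environ=os.environ):
    """Return an mpirun argument list, adding --hostfile inside a torque job."""
    hostfile_flags = {"--hostfile", "-hostfile", "--machinefile", "-machinefile"}
    cmd = ["mpirun"]
    nodefile = environ.get("PBS_NODEFILE")
    # Add the node file only inside a job and only if the user gave no hostfile.
    if nodefile and not hostfile_flags.intersection(user_args):
        cmd += ["--hostfile", nodefile]
    return cmd + list(user_args)
```

Outside a torque job, or when the user already passes a hostfile, the arguments go through unchanged.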
By spelunking the code I find that the rsh plm has support for SGE, where it
automatically picks up the PE_HOSTFILE if it detects that it is launched
within an SGE job. Would it be possible to have the same functionality for
torque? The code looks a bit too complex at first sight for me to fix this.

The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, Team Leader, High Performance Computing
Direct call: +47 77 64 62 56. email: roy.dragseth_at_[hidden]