Open MPI User's Mailing List Archives

From: Katherine Holcomb (kholcomb_at_[hidden])
Date: 2006-10-25 13:39:46

> I presume you left some critical piece of information out of your
> message, like the name and configuration of the batch queueing system
> you are using.

We're using PBS Pro, although I don't think it's a factor in this
particular situation. (I did find some behavior with PBS Pro that
seemed not to be as advertised: all the processes were placed on one
node when two were requested, unless mpirun's -machinefile flag was
explicitly set to $PBS_NODEFILE. But that was a different problem.)
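
For reference, a minimal sketch of that workaround as it might appear in
a job script, assuming a Bourne-style shell and that $PBS_NODEFILE lists
one line per process slot. The function name build_mpirun_cmd and the
binary ./my_mpi_app are made up for illustration; only the -np and
-machinefile flags and the PBS_NODEFILE variable come from the real tools.

```shell
# Sketch only: print an mpirun command line that passes the PBS host
# list explicitly instead of relying on automatic detection.
build_mpirun_cmd() {
    # PBS sets PBS_NODEFILE inside a running job; refuse to guess otherwise.
    if [ -z "${PBS_NODEFILE:-}" ] || [ ! -r "$PBS_NODEFILE" ]; then
        echo "PBS_NODEFILE is not set or not readable" >&2
        return 1
    fi
    # Assumption: one line in the node file per process to launch.
    nprocs=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
    echo "mpirun -np $nprocs -machinefile $PBS_NODEFILE $*"
}
```

Inside a job, calling build_mpirun_cmd ./my_mpi_app prints the command so
it can be checked before being run.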

> The answer to your question as worded may not be the best answer for
> your problem.
> I have dealt with two cases similar to yours:
> 1) Large system using Modules and LSF batch queueing system -- this
> type of system requires the people configuring LSF to set up some
> stuff or the end users have to use --prefix flag to get the OpenMPI
> path, plus more to get the correct compiler (something I never
> figured out how to do before the LSF admins extended their LSF
> installation to cover OpenMPI). [what stuff I don't know, I'm not an
> LSF admin]

It does look like we'll have to use the --prefix flag, at least to
start. Rainer Keller pointed out that I can set an environment variable
in the module script and that does seem to be the best option for now.
We'd rather not get into wrapping the binaries.

> 2) Local system I sysadm, learning Modules setup was going to take
> more time than I had available so I wrote a script that sets PATH,
> MANPATH, and LD_LIBRARY_PATH based on similar arguments as the real
> Module software (also G95_INCLUDE_PATH for g95). When the user sets
> the environmental variables via my script and then runs OpenMPI I see
> no problems with OpenMPI on the other nodes; however, we don't have a
> batch queuing system. I don't see why using the Modules software
> would be any different. One critical piece is that my script also
> aliases mpirun, for example
>   alias mpirun "mpirun --prefix /opt/g95/openmpi/1.1.1"
> (which the real modules software should also be able
> to do if needed) and I have only one installation of each type of
> compiler (g95, Intel, PGI, Absoft).

Long term we are probably going to do something similar (write our own
Modules replacement). For one thing, the Modules software doesn't seem
to have been maintained for a while, and for another, it uses Tcl, which
is not much of a mainstream language anymore.
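
A rough sketch of what such a replacement could look like, assuming a
Bourne-compatible shell (sh, bash) and a file that users source rather
than execute. The names prepend_path, use_openmpi, and OPENMPI_PREFIX are
hypothetical, the man page directory is a guess at the install layout,
and a shell function stands in for the csh alias quoted above. This is
not the script described in the quoted message, only an illustration of
the same idea.

```shell
# Sketch of a Modules-style setup helper; source it, do not execute it.

# Prepend a directory to a colon-separated path variable and export it,
# e.g.  prepend_path PATH /opt/g95/openmpi/1.1.1/bin
prepend_path() {
    var="$1"
    dir="$2"
    eval "cur=\${$var:-}"            # read the current value, empty if unset
    if [ -n "$cur" ]; then
        eval "$var=\"\$dir:\$cur\""
    else
        eval "$var=\"\$dir\""
    fi
    export "$var"
}

# Point the environment at one Open MPI installation prefix.
use_openmpi() {
    OPENMPI_PREFIX="$1"              # hypothetical name, used only here
    export OPENMPI_PREFIX
    prepend_path PATH "$OPENMPI_PREFIX/bin"
    prepend_path MANPATH "$OPENMPI_PREFIX/man"   # assumed layout
    prepend_path LD_LIBRARY_PATH "$OPENMPI_PREFIX/lib"
    # Same effect as the alias: always pass --prefix so the remote
    # nodes find this installation too.
    mpirun() { command mpirun --prefix "$OPENMPI_PREFIX" "$@"; }
}
```

A user would then run something like use_openmpi /opt/g95/openmpi/1.1.1
before compiling or launching a job.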

Katherine Holcomb, Ph.D.                kholcomb_at_[hidden]
Research Computing Support Group - ITC  Office Phone: (434) 982-5948
148 BSEL, Clark Hall                    Center Phone: (434) 243-8799
University of Virginia 22904