On Jan 29, 2006, at 6:09 PM, Brian Granger wrote:
> I have compiled and installed OpenMPI on Mac OS X. As I
> understand it, I can have mpirun start jobs using either ssh/xgrid
> or any other system (PBS, etc.) that I have installed. How can I
> configure which method is used? What process does ompi/orte go
> through to select which method to use?
> Currently I am mainly interested in ssh/xgrid at this point, but
> PBS soon. How do these work? From poking around it looks like
> there are lots of MCA parameters for the ras/pls modules that are
> relevant. But there is very little documentation about what they
> all do.
> Can anyone give me pointers about where to look for more
> information?
Unfortunately (shame on me) there isn't any documentation on the
XGrid support at this time. It's on my to-do list, but so are a lot
of other things. I've included some notes below that should help --
if not, feel free to ask all the questions you want. It will help to
know what information people expect.
Open MPI does a run-time priority ranking to determine which process
starter is used. ssh/rsh has the lowest ranking, and XGrid, PBS, and
SLURM all have a rating that is higher than ssh/rsh. However, the
XGrid, PBS, and SLURM components all only allow themselves to be
selected if some other condition is met that indicates that they
should be used. For PBS and SLURM, that condition is the presence of
the environment variables that the batch scheduler sets to indicate
that a PBS (or SLURM) job is running.
The XGrid starter currently looks for a couple of environment
variables to decide if it can be used. Currently, the XGrid process
starter only supports the basic password authentication to the
controller. As such, the two environment variables the XGrid starter
looks for are XGRID_CONTROLLER_HOSTNAME and
XGRID_CONTROLLER_PASSWORD. These are the same environment variables
that the 'xgrid' command-line submission process uses.
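As a rough illustration (this is not code from Open MPI itself), the check described above could be mimicked in a Bourne-style shell as follows. The function name xgrid_env_ready is made up for this example; only the two variable names come from the description above.

```shell
# Hypothetical helper, not part of Open MPI: succeeds only if both
# variables that the XGrid starter looks for are set and non-empty.
xgrid_env_ready() {
    [ -n "${XGRID_CONTROLLER_HOSTNAME:-}" ] &&
    [ -n "${XGRID_CONTROLLER_PASSWORD:-}" ]
}
```

Running "xgrid_env_ready && echo ok" before mpirun is one quick way to see whether the XGrid starter even has a chance of being selected.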
The XGrid support in Open MPI is currently in a beta stage, and has a
couple of limitations that might make it unappealing to you. It
requires that Open MPI be installed on all the nodes, and be in the
default path for user 'nobody', which pretty much means installing it
in /usr. This is because it only supports password authentication
(and not Kerberos authentication), so all jobs will run as nobody.
If there is interest, it would not be hard to add Kerberos
authentication support. The XGridFoundation framework is only
available for 32 bit PPC / x86, so the starter will only build if
Open MPI is building in 32 bit mode. We currently require all Open
MPI processes (run-time and application) be the same endianness and
pointer size, so all user processes must be 32 bit applications. We
intend to remove this restriction some time in the future, allowing
a 32 bit runtime and 64 bit user application.
The restriction that Open MPI be installed on all nodes is a slightly
more difficult problem. Open MPI usually builds as a shared library
with a bunch of dynamically loaded shared objects, complicating the
list of what must be migrated. Even if statically linked, there is
still a helper process we have to migrate out with your application
(to deal with standard I/O in the expected way, along with some other
features that are much easier to implement with a helper daemon).
To use the XGrid system, make sure that the XGrid controller is
properly configured to use password-based authentication. Then
issue the following commands (assuming tcsh):
% setenv XGRID_CONTROLLER_HOSTNAME mycomputer.apple.com
% setenv XGRID_CONTROLLER_PASSWORD pword
% mpirun -np X ./myapp
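For users of Bourne-style shells (sh, bash, zsh), a rough equivalent of the tcsh commands above might look like the following. The hostname and password are the same placeholders as above, and the launch_on_xgrid wrapper is a made-up convenience function, not something Open MPI provides.

```shell
# Bourne-shell equivalent of the two setenv commands.
# Both values are placeholders; substitute your own controller and password.
XGRID_CONTROLLER_HOSTNAME=mycomputer.apple.com
XGRID_CONTROLLER_PASSWORD=pword
export XGRID_CONTROLLER_HOSTNAME XGRID_CONTROLLER_PASSWORD

# Hypothetical wrapper: $1 = number of processes, $2 = program to run.
launch_on_xgrid() {
    mpirun -np "$1" "$2"
}
```

With that in place, "launch_on_xgrid 4 ./myapp" corresponds to the mpirun line above with X set to 4.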
XGrid does not give users a way to know how many nodes are
available. Open MPI assumes that if a user requested X nodes, there
will eventually be X nodes available to run on. So if X is greater
than the available number of nodes, mpirun will happily submit that
request to XGrid and XGrid will happily queue the job until X number
of nodes are available. I wish there was a better way to handle that
situation, but there doesn't seem to be. I've talked a little bit
with the XGrid developers about improving this. Since XGrid is
intended to be used in environments where machines come and go at
will, it can be difficult to determine how many agents are up and
running -- that isn't a static answer. I think at one point there
was talk of adding a flag to the job submission that would bounce the
job out of the queue if some period of time (possibly including
immediately) passed without the job being queued. I don't know if
anything ever came of that discussion.
There is really only one MCA parameter that users should ever have to
adjust for the XGrid starter. The MCA parameter
"pls_xgrid_job_delete" defaults to 1. If it is non-zero, Open MPI
removes each job from the XGrid controller's list of executed jobs
once the job has completed. If jobs aren't deleted by Open MPI at
completion, their results will remain in the XGrid controller's data
store until the user manually deletes them.
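As a sketch of how this parameter might be changed, assuming the usual Open MPI conventions for setting MCA parameters (an environment variable with the OMPI_MCA_ prefix, or the -mca option to mpirun), the following keeps completed jobs in the controller's list. The process count and program name are placeholders.

```shell
# Disable automatic deletion of completed jobs (the default is 1).
# Environment-variable form: OMPI_MCA_ prefix plus the parameter name.
OMPI_MCA_pls_xgrid_job_delete=0
export OMPI_MCA_pls_xgrid_job_delete

# Command-line form, shown as a comment so nothing is launched here:
#   mpirun -mca pls_xgrid_job_delete 0 -np 4 ./myapp
```

Either form should only affect jobs started afterwards; jobs already left in the controller's list still have to be deleted by hand.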
As for the rsh/ssh component, there are a couple of MCA parameters
that might be of use to most users.
pls_rsh_num_concurrent: Open MPI tries to fork off this number of
instances before waiting for some of them to complete. This
defaults to 128. On platforms with low per-user process or file
descriptor limits, this may have to be lowered. On some
machines, it's possible that start-up performance would improve if
this number were increased.
pls_rsh_assume_same_shell: Open MPI will assume the same shell is
used on the remote nodes as on the current node (i.e., they are all
tcsh, ksh, etc.) if this is non-zero. Otherwise, we must log in to
each node twice, the first time to determine which shell is used on
the remote node.
pls_rsh_agent: a colon (:) separated list of startup agents to
use. Open MPI will use the first one available on the starting
node. If a starter is available but doesn't work, an error will
result. The default value is 'ssh : rsh', meaning that ssh will be
used unless it isn't installed, in which case rsh will be used.
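To make the "first one available" rule for the agent list concrete, here is a small Bourne-shell sketch of that kind of selection. It is only an illustration of the behavior described above, not Open MPI's actual implementation, and the function name first_available_agent is made up.

```shell
# Hypothetical sketch: print the first agent from a colon-separated list
# that is found on PATH; return non-zero if none of them is installed.
# Simplification: spaces are stripped, so agents with arguments won't work.
first_available_agent() {
    list=$1
    while [ -n "$list" ]; do
        case $list in
            *:*) agent=${list%%:*}; list=${list#*:} ;;
            *)   agent=$list; list= ;;
        esac
        # Remove the spaces around each name ('ssh : rsh' -> 'ssh', 'rsh').
        agent=$(printf '%s' "$agent" | tr -d ' ')
        if [ -n "$agent" ] && command -v "$agent" >/dev/null 2>&1; then
            printf '%s\n' "$agent"
            return 0
        fi
    done
    return 1
}
```

Calling first_available_agent 'ssh : rsh' on the starting node shows which of the two would be picked there. Note that, as described above, an agent that is installed but doesn't work is still selected and leads to an error rather than a fallback to the next entry.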
Please let me know if you have more questions.
Open MPI developer