On Jul 16, 2006, at 6:12 AM, Keith Refson wrote:
> The compile of openmpi 1.1 was without problems and
> appears to have correctly built the GM btl.
> $ ompi_info -a | egrep "\bgm\b|_gm_"
> MCA mpool: gm (MCA v1.0, API v1.0, Component v1.1)
> MCA btl: gm (MCA v1.0, API v1.0, Component v1.1)
Ok, so GM support is definitely built into your build of Open MPI,
which is a good start.
> However I have been unable to set up a parallel run which uses gm.
> If I start a run using the openmpi mpirun command, the program runs
> correctly in parallel. However the timings appear to suggest that
> it is using tcp, and the command executed on the node looks like:
> orted --bootproxy 1 --name 0.0.1 --num_procs 3 --vpid_start 0 --
> scarf-cn001.rl.ac.uk --universe
> cse0000_at_[hidden]:default-universe-28588 --nsreplica
> "0.0.0;tcp://192.168.1.1:52491;tcp://126.96.36.199:52491" --gprreplica
Right, orted is just a starter for the MPI processes -- the
information on interconnects to use and that kind of stuff is passed
through the out-of-band communication mechanism. orted doesn't
really care which interconnect the MPI process is going to use, so we
don't pass it on the command line.
> Furthermore if I attempt to start with the mpirun arguments "--mca btl
> gm,self,^tcp" the run aborts at the MPI_INIT call.
> Q1: Is there anything else I have to do to get openmpi to use gm?
The command line you want is:
mpirun -np X -mca btl gm,sm,self <other arguments>
If this causes an error during MPI_INIT or early in your application,
it would be useful to see all the output from the parallel run. Such
an error likely indicates that something is wrong with the
initialization of the interconnect.
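One way to capture everything in a single file is a small wrapper like
the one below. This is only a sketch: run_logged and the log file name
are hypothetical names, not part of Open MPI, and it assumes a
Bourne-style shell.

```shell
# Hypothetical helper (not part of Open MPI): run a command, keep both
# stdout and stderr in one log file, and record the exit status.
run_logged() {
    log="$1"; shift
    "$@" > "$log" 2>&1
    rc=$?
    echo "exit status: $rc" >> "$log"
    return $rc
}

# Example use (X and <other arguments> as in the command above):
#   run_logged gm_run.log mpirun -np X -mca btl gm,sm,self <other arguments>
```

The resulting log file can then be sent to the list as is.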
> Q2: Is there any way of diagnosing which btl is actually being used
> and why? None of the "-v" option to mpirun, "-mca btl
> or "-mca btl btl_gm_debug=1" make any difference or produce any
> more output
The arguments you want would look like:
mpirun -np X -mca btl gm,sm,self -mca btl_base_verbose 1 -mca btl_gm_debug 1 <other arguments>
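If the command line gets unwieldy, the same MCA parameters can also be
set through the environment, since Open MPI reads any variable prefixed
with OMPI_MCA_. A minimal sketch, assuming a Bourne-style shell:

```shell
# Environment equivalents of the -mca options above.
export OMPI_MCA_btl=gm,sm,self
export OMPI_MCA_btl_base_verbose=1
export OMPI_MCA_btl_gm_debug=1

# Then start the job without the -mca flags:
#   mpirun -np X <other arguments>
```

Values given on the mpirun command line take precedence over the
environment, so this is mostly a convenience for repeated test runs.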
> Q3: Is there a way to make openmpi work with the LSF commands? So
> far I have constructed a hostfile from the LSF environment variable
> LSB_HOSTS and used the openmpi mpirun command to start the
> parallel executable.
Currently, we do not have tight LSF integration for Open MPI, like we
do for PBS, SLURM, and BProc. This is mainly because the only LSF
machines the development team regularly uses are BProc machines,
which do not use the traditional startup and allocation mechanisms of
LSF. I believe it is on our feature request list, but I also don't
believe we have a timeline for implementation.
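Until then, building a hostfile from LSB_HOSTS as you describe is a
reasonable workaround. A rough sketch of that step, assuming LSB_HOSTS
holds one space-separated hostname per allocated slot; the function
name lsb_to_hostfile and the hostfile name are made up for
illustration:

```shell
# Hypothetical helper: convert LSB_HOSTS (one hostname per slot, space
# separated) into Open MPI hostfile lines of the form "host slots=N".
lsb_to_hostfile() {
    [ -n "$LSB_HOSTS" ] || return 1
    # Unquoted on purpose so the shell splits the list into one host per line.
    printf '%s\n' $LSB_HOSTS | sort | uniq -c | awk '{ print $2 " slots=" $1 }'
}

# Example use inside an LSF job script:
#   lsb_to_hostfile > my_hostfile
#   mpirun -np X --hostfile my_hostfile -mca btl gm,sm,self <other arguments>
```

Counting repeated hostnames into slots=N keeps the hostfile short and
lets mpirun place several processes on the same node.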
Open MPI developer