Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with openmpi and infiniband
From: Biagio Lucini (B.Lucini_at_[hidden])
Date: 2009-01-02 09:21:29


Pavel Shamis (Pasha) wrote:
>
>>> Another thing to try is a change that we made late in the Open MPI
>>> v1.2 series with regards to IB:
>>>
>>>
>>> http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
>>>
>>>
>> Thanks, this is something worth investigating. What would be the
>> exact syntax to use to turn off pml_ob1_use_early_completion?
> Your problem may well be related to the known issue with early
> completions. The exact syntax is:
> --mca pml_ob1_use_early_completion 0
>
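For reference, Open MPI also reads MCA parameters from environment variables named `OMPI_MCA_<param>`, so the same setting can be made in the job script instead of on the mpirun line. A minimal sketch, assuming `ompi_info` from the same Open MPI install is on PATH for the optional check:

```shell
#!/bin/bash
# Same effect as "--mca pml_ob1_use_early_completion 0" on the mpirun line:
# Open MPI reads MCA parameters from OMPI_MCA_<name> environment variables.
export OMPI_MCA_pml_ob1_use_early_completion=0

# Optional check (assumes ompi_info from the same install is on PATH):
# list the ob1 PML parameters and look for the early-completion switch,
# to confirm this build actually has the parameter.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --param pml ob1 | grep use_early_completion
fi
```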
Unfortunately this did not help: I still have the same problem. Here is
the script I run: the last line is for the TCP test, and the commented-out
mpirun line before it is for the openib test.
------------------------------------------------------------------------------------------------------------------------------
#!/bin/bash
#$ -S /bin/bash

#Set the output file, error file and job name
#$ -o run2.out
#$ -e run2.err
#$ -N su3_01Jan

#Number of slots (MPI processes) requested from the parallel environment
#$ -pe make 38

# The batchsystem should use the current directory as working directory.
#$ -cwd

export LD_LIBRARY_PATH=/opt/numactl-0.6.4/:/opt/sge-6.0u8/lib/lx24-amd64:/opt/ompi128-intel/lib

echo LD_LIBRARY_PATH $LD_LIBRARY_PATH
ldd ./k-string

ulimit -l 8388608
ulimit -a

export PATH=$PATH:/opt/ompi128-intel/bin
which mpirun

#The actual mpirun command
# openib test (uncomment to use):
#mpirun -np $NSLOTS -mca btl openib,sm,self --mca pml_ob1_use_early_completion 0 ./k-string
# tcp test:
mpirun -np $NSLOTS -mca btl tcp,sm,self ./k-string

-------------------------------------------------------------------------------------------------------------------------------------------
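The `ulimit -l` line raises the locked-memory limit only for the shell running the job script; processes that mpirun starts on other nodes may not inherit it. As a hedged sketch, a small hypothetical helper (`memlock_ok`, not part of Open MPI or SGE) can compare a reported limit, in KiB as printed by `ulimit -l`, with the 8388608 KiB requested in the script:

```shell
#!/bin/bash
# Hypothetical helper: succeed if a locked-memory limit (as printed by
# "ulimit -l": a number of KiB, or the word "unlimited") is at least
# the required number of KiB.
memlock_ok() {
    local limit="$1" required="$2"
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$required" ]
}

# Check the current shell against the 8 GiB (8388608 KiB) used above.
if memlock_ok "$(ulimit -l)" 8388608; then
    echo "memlock OK: $(ulimit -l)"
else
    echo "memlock too low: $(ulimit -l)"
fi
```

To see the limit the launched processes actually get on each node, something like `mpirun -np $NSLOTS bash -c 'ulimit -l'` can be run from the same job script.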

The script also prints extra diagnostics: PATH, LD_LIBRARY_PATH, the
libraries the executable links against, and the locked-memory limit. All
seems OK, and, as before, the TCP run goes well while the openib run has
communication problems (it looks as if no communication channel can be
opened or recognised). I will try OMPI 1.3rc2, as suggested; failing
that, I will try to isolate a test case to see whether the problem can be
reproduced on other systems. Meanwhile, I am happy to hear any
suggestions you might have.
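Since the openib run fails to open any channel, one cheap per-node check is whether the InfiniBand port is up at all. A minimal sketch, assuming the `ibv_devinfo` utility from libibverbs is installed on the compute nodes; `ports_active` is a hypothetical helper that scans that output for a port in the `PORT_ACTIVE` state:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if ibv_devinfo-style text on stdin
# reports at least one port in the PORT_ACTIVE state.
ports_active() {
    grep -q 'PORT_ACTIVE'
}

# Run the check only where ibv_devinfo is available.
if command -v ibv_devinfo >/dev/null 2>&1; then
    if ibv_devinfo | ports_active; then
        echo "$(hostname): InfiniBand port active"
    else
        echo "$(hostname): no active InfiniBand port"
    fi
fi
```

Saved as an executable script (the name is up to you) and launched with `mpirun -np $NSLOTS`, this would report on every node allocated to the job.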

Thanks,
Biagio