
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [openib] segfault when using openib btl
From: Eloi Gaudry (eg_at_[hidden])
Date: 2010-07-12 04:53:58


I'm focusing on the MPI_Bcast routine, which seems to segfault at random when using the openib btl.
I'd like to know whether there is any way to make Open MPI select a different algorithm for MPI_Bcast than the one it picks by default.
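
In case it clarifies what I'm after, I was imagining something along
these lines (the parameter names are guessed from the tuned collective
component, so they may well be wrong):

  path_to_openmpi/bin/mpirun --mca coll_tuned_use_dynamic_rules 1 \
      --mca coll_tuned_bcast_algorithm 1 [...]

with the list of valid algorithm indices presumably available via:

  ompi_info --mca coll_tuned_use_dynamic_rules 1 --param coll tuned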

Thanks for your help,

On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
> Hi,
> I'm observing a random segmentation fault during an internode parallel
> computation involving the openib btl and OpenMPI-1.4.2 (the same issue
> can be observed with OpenMPI-1.3.3).
> mpirun (Open MPI) 1.4.2
> Report bugs to
> [pbn08:02624] *** Process received signal ***
> [pbn08:02624] Signal: Segmentation fault (11)
> [pbn08:02624] Signal code: Address not mapped (1)
> [pbn08:02624] Failing at address: (nil)
> [pbn08:02624] [ 0] /lib64/ [0x349540e4c0]
> [pbn08:02624] *** End of error message ***
> sh: line 1: 2624 Segmentation fault
> /share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/bin/actranpy_mp
> '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Ac
> tran_11.0.rc2.41872'
> '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
> '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch' '--mem=3200'
> '--threads=1' '--errorlevel=FATAL' '--t_max=0.1' '--parallel=domain'
> If I choose not to use the openib btl (by using --mca btl self,sm,tcp on
> the command line, for instance), I don't encounter any problem and the
> parallel computation runs flawlessly.
> I would like to get some help to be able:
> - to diagnose the issue I'm facing with the openib btl
> - to understand why this issue is observed only when using the openib
> btl and not when using self,sm,tcp
> Any help would be very much appreciated.
> The outputs of ompi_info and of Open MPI's configure script are
> attached to this email, along with some information on the InfiniBand
> drivers.
> Here is the command line used when launching a parallel computation
> using infiniband:
> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
> btl openib,sm,self,tcp --display-map --verbose --version --mca
> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
> and the command line used if not using infiniband:
> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
> btl self,sm,tcp --display-map --verbose --version --mca
> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
> Thanks,
> Eloi
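
In the meantime, here is what I plan to try in order to collect more
information (assuming btl_base_verbose is honored by the openib btl in
this release; the core file name below is just whatever my system
happens to produce):

  ulimit -c unlimited
  path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list \
      --mca btl openib,sm,self --mca btl_base_verbose 100 [...]

and then, if a core file is written:

  gdb path_to_actran/bin/actranpy_mp core.<pid>
  (gdb) bt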