
Subject: Re: [OMPI users] [openib] segfault when using openib btl
From: Eloi Gaudry (eg_at_[hidden])
Date: 2010-07-12 04:53:58


Hi,

I'm focusing on the MPI_Bcast routine, which seems to segfault at random when the openib btl is used.
I'd like to know whether there is any way to make Open MPI switch to a different algorithm from the one selected by default for MPI_Bcast.
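
From what I can tell from ompi_info, the tuned coll component exposes MCA parameters that might allow this; I was thinking of something along the lines of the command below (parameter names taken from "ompi_info --param coll tuned"; the algorithm index would be one of the values it reports):

  path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list \
      --mca coll_tuned_use_dynamic_rules 1 \
      --mca coll_tuned_bcast_algorithm 1 [...]

but I'm not sure this is the supported way to override the default decision logic. Could someone confirm?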

Thanks for your help,
Eloi

On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
> Hi,
>
> I'm observing a random segmentation fault during an internode parallel
> computation involving the openib btl and OpenMPI-1.4.2 (the same issue
> can be observed with OpenMPI-1.3.3).
> mpirun (Open MPI) 1.4.2
> Report bugs to http://www.open-mpi.org/community/help/
> [pbn08:02624] *** Process received signal ***
> [pbn08:02624] Signal: Segmentation fault (11)
> [pbn08:02624] Signal code: Address not mapped (1)
> [pbn08:02624] Failing at address: (nil)
> [pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
> [pbn08:02624] *** End of error message ***
> sh: line 1: 2624 Segmentation fault
> /share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/bin/actranpy_mp
> '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872'
> '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
> '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch' '--mem=3200'
> '--threads=1' '--errorlevel=FATAL' '--t_max=0.1' '--parallel=domain'
>
> If I choose not to use the openib btl (by using --mca btl self,sm,tcp on
> the command line, for instance), I don't encounter any problem and the
> parallel computation runs flawlessly.
>
> I would like to get some help:
> - to diagnose the issue I'm facing with the openib btl
> - to understand why this issue is observed only when using the openib
> btl and not when using self,sm,tcp
>
> Any help would be very much appreciated.
>
> The output of ompi_info and the configure options of Open MPI are
> attached to this email, along with some information on the InfiniBand
> drivers.
>
> Here is the command line used when launching a parallel computation
> using infiniband:
> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
> btl openib,sm,self,tcp --display-map --verbose --version --mca
> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
> and the command line used if not using infiniband:
> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
> btl self,sm,tcp --display-map --verbose --version --mca
> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
>
> Thanks,
> Eloi
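
P.S.: in case it helps, here is how I'm currently trying to capture a usable backtrace when the segfault occurs (assuming core dumps are enabled on the compute nodes):

  ulimit -c unlimited
  path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list \
      --mca btl openib,sm,self,tcp [...]
  gdb path_to_actran/bin/actranpy_mp core.<pid>
  (gdb) bt full

I'm also planning to rerun with --mca btl_base_verbose 100 to get more output from the btl layer, if that's the right knob for the openib btl.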