
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [openib] segfault when using openib btl
From: Rolf vandeVaart (rolf.vandevaart_at_[hidden])
Date: 2010-07-13 14:39:59


Hi Eloi:
To select the different bcast algorithms, you need to add an extra MCA
parameter that tells the library to use dynamic selection.
--mca coll_tuned_use_dynamic_rules 1

One way to make sure you are typing this in correctly is to use it with
ompi_info. Do the following:
ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll

You should see lots of output with all the different algorithms that can
be selected for the various collectives.
To select a specific bcast algorithm, you therefore need both parameters:

--mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1
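
For example, combined with the mpirun command line you quote below (the
process count, host file, and remaining arguments are taken from your own
runs), that would give something like:

path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list \
    --mca btl openib,sm,self,tcp \
    --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1 [...]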

Rolf
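
As a side note, if you want to check whether the crash can be reproduced
outside of your application, a small test program that only loops over
MPI_Bcast may help narrow things down. This is just a sketch (buffer size,
iteration count, and the compile/run lines are arbitrary, not taken from
your setup):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* rank 0 repeatedly broadcasts a buffer so the collective is
       exercised many times; sizes and counts are arbitrary */
    const int count = 1 << 20;        /* doubles per broadcast */
    const int iterations = 1000;
    int rank, size, i;
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = (double *) calloc(count, sizeof(double));
    if (buf == NULL) {
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    for (i = 0; i < iterations; i++) {
        if (rank == 0) {
            buf[0] = (double) i;      /* something to send */
        }
        MPI_Bcast(buf, count, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    if (rank == 0) {
        printf("completed %d broadcasts of %d doubles on %d ranks\n",
               iterations, count, size);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc (e.g. "mpicc bcast_test.c -o bcast_test") and launched
with the same mpirun options as your application, it should either show the
same failure or suggest that the problem is specific to the application's
communication pattern.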

On 07/13/10 11:28, Eloi Gaudry wrote:
> Hi,
>
> I've found that "--mca coll_tuned_bcast_algorithm 1" allowed me to switch to the basic linear algorithm.
> Anyway, whatever algorithm is used, the segmentation fault remains.
>
> Could anyone give some advice on how to diagnose the issue I'm facing?
>
> Regards,
> Eloi
>
>
> On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:
>
>> Hi,
>>
>> I'm focusing on the MPI_Bcast routine, which seems to randomly segfault when
>> using the openib btl. I'd like to know if there is any way to make OpenMPI
>> switch to a different algorithm than the one selected by default for
>> MPI_Bcast.
>>
>> Thanks for your help,
>> Eloi
>>
>> On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
>>
>>> Hi,
>>>
>>> I'm observing a random segmentation fault during an internode parallel
>>> computation involving the openib btl and OpenMPI-1.4.2 (the same issue
>>> can be observed with OpenMPI-1.3.3).
>>>
>>> mpirun (Open MPI) 1.4.2
>>> Report bugs to http://www.open-mpi.org/community/help/
>>> [pbn08:02624] *** Process received signal ***
>>> [pbn08:02624] Signal: Segmentation fault (11)
>>> [pbn08:02624] Signal code: Address not mapped (1)
>>> [pbn08:02624] Failing at address: (nil)
>>> [pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
>>> [pbn08:02624] *** End of error message ***
>>> sh: line 1: 2624 Segmentation fault
>>>
>>> /share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/bin/actranpy_mp
>>> '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_64/Actran_11.0.rc2.41872'
>>> '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.dat'
>>> '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch' '--mem=3200'
>>> '--threads=1' '--errorlevel=FATAL' '--t_max=0.1' '--parallel=domain'
>>>
>>> If I choose not to use the openib btl (by using --mca btl self,sm,tcp on
>>> the command line, for instance), I don't encounter any problem and the
>>> parallel computation runs flawlessly.
>>>
>>> I would like to get some help to:
>>> - diagnose the issue I'm facing with the openib btl
>>> - understand why this issue is observed only when using the openib btl
>>> and not when using self,sm,tcp
>>>
>>> Any help would be very much appreciated.
>>>
>>> The outputs of ompi_info and of the OpenMPI configure script are attached
>>> to this email, together with some information on the InfiniBand drivers.
>>>
>>> Here is the command line used when launching a parallel computation
>>> using infiniband:
>>>
>>> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
>>> btl openib,sm,self,tcp --display-map --verbose --version --mca
>>> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
>>>
>>> and the command line used if not using infiniband:
>>>
>>> path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list --mca
>>> btl self,sm,tcp --display-map --verbose --version --mca
>>> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
>>>
>>> Thanks,
>>> Eloi
>>>
>