Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Dynamic algorithms problem
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-07-07 10:03:22


I do believe that this is a bug. I *think* that the included patch will fix it for you, but George is on vacation until tomorrow (and I don't know how long it'll take him to slog through his backlog :-( ).

Can you try the following patch and see if it fixes it for you?

Index: ompi/mca/coll/tuned/coll_tuned_module.c
===================================================================
--- ompi/mca/coll/tuned/coll_tuned_module.c (revision 23360)
+++ ompi/mca/coll/tuned/coll_tuned_module.c (working copy)
@@ -165,6 +165,7 @@
     { \
         int need_dynamic_decision = 0; \
         ompi_coll_tuned_forced_getvalues( (TYPE), &((DATA)->user_forced[(TYPE)]) ); \
+        (DATA)->com_rules[(TYPE)] = NULL; \
         if( 0 != (DATA)->user_forced[(TYPE)].algorithm ) { \
             need_dynamic_decision = 1; \
             EXECUTE; \
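
For anyone reading along, here is a minimal standalone sketch of the pattern the patch applies: clear a per-collective pointer slot before deciding whether to fill it, so that later code can safely test it against NULL. The types and functions below (coll_data_t, rules_t, poison, init_slot, pick_algorithm) are hypothetical stand-ins, not the real coll/tuned code, and the sketch assumes the crash comes from a stale com_rules pointer, which has not been confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real coll/tuned structures. */
enum { COLL_BCAST = 0, COLL_REDUCE = 1, COLL_COUNT = 2 };

typedef struct { int algorithm; } forced_t;  /* 0 = nothing forced by the user */
typedef struct { int algorithm; } rules_t;   /* rules loaded from a file, if any */

typedef struct {
    forced_t user_forced[COLL_COUNT];
    rules_t *com_rules[COLL_COUNT];
} coll_data_t;

/* Simulate memory that was allocated but never cleared. */
void poison(coll_data_t *data)
{
    memset(data, 0xAB, sizeof *data);
}

/* Same shape as the patched macro: record the forced value, clear the
 * rules slot, then report whether a dynamic decision is needed. */
int init_slot(coll_data_t *data, int type, int forced_algorithm)
{
    assert(type >= 0 && type < COLL_COUNT);
    data->user_forced[type].algorithm = forced_algorithm;
    data->com_rules[type] = NULL;  /* the equivalent of the added line */
    return 0 != forced_algorithm;
}

/* Later lookup: only dereference the rules pointer if it is non-NULL. */
int pick_algorithm(const coll_data_t *data, int type, int fallback)
{
    assert(type >= 0 && type < COLL_COUNT);
    if (NULL != data->com_rules[type]) {
        return data->com_rules[type]->algorithm;
    }
    if (0 != data->user_forced[type].algorithm) {
        return data->user_forced[type].algorithm;
    }
    return fallback;
}
```

Without the NULL assignment in init_slot, pick_algorithm would see whatever bytes poison left in the slot and dereference them, which is the kind of failure a segfault at a small or unmapped address can indicate.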

On Jul 4, 2010, at 8:12 AM, Gabriele Fatigati wrote:

> Dear Open MPI users,
>
> I'm trying to use the collective dynamic rules with Open MPI 1.4.2:
>
> export OMPI_MCA_coll_tuned_use_dynamic_rules=1
> export OMPI_MCA_coll_tuned_bcast_algorithm=1
>
> My goal is to test Bcast performance with the SKaMPI benchmark while changing the dynamic rules, but at runtime I get the following error:
>
>
> [node003:05871] *** Process received signal ***
> [node003:05871] Signal: Segmentation fault (11)
> [node003:05871] Signal code: Address not mapped (1)
> [node003:05871] Failing at address: 0xcc
> [node003:05872] *** Process received signal ***
> [node003:05872] Signal: Segmentation fault (11)
> [node003:05872] Signal code: Address not mapped (1)
> [node003:05872] Failing at address: 0xcc
> [node003:05871] [ 0] /lib64/libpthread.so.0 [0x3be160e4c0]
> [node003:05871] [ 1] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2accf7210145]
> [node003:05871] [ 2] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2accf720ef16]
> [node003:05871] [ 3] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2accf721fec9]
> [node003:05871] [ 4] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0(MPI_Bcast+0x171) [0x2accf71b81e1]
> [node003:05871] [ 5] ./skampi [0x409566]
> [node003:05871] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3be0e1d974]
> [node003:05871] [ 7] ./skampi [0x404e19]
> [node003:05871] *** End of error message ***
> [node003:05872] [ 0] /lib64/libpthread.so.0 [0x3be160e4c0]
> [node003:05872] [ 1] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2b1959eb3145]
> [node003:05872] [ 2] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2b1959eb1f16]
> [node003:05872] [ 3] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0 [0x2b1959ec2ec9]
> [node003:05872] [ 4] /gpfs/scratch/userinternal/cin0243a/openmpi-1.4.2/openmpi-1.4.2-install/lib/libmpi.so.0(MPI_Bcast+0x171) [0x2b1959e5b1e1]
> [node003:05872] [ 5] ./skampi [0x409566]
> [node003:05872] [ 6] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3be0e1d974]
> [node003:05872] [ 7] ./skampi [0x404e19]
> [node003:05872] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 9 with PID 5872 on node node003ib0 exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
>
> The same happens with the other Bcast algorithms. With dynamic rules disabled, it works well. Am I using a wrong parameter setup?
>
> Thanks in advance.
>
> --
> Ing. Gabriele Fatigati
>
> Parallel programmer
>
> CINECA Systems & Tecnologies Department
>
> Supercomputing Group
>
> Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
>
> www.cineca.it Tel: +39 051 6171722
>
> g.fatigati [AT] cineca.it

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/