
Open MPI User's Mailing List Archives


From: Jelena Pjesivac-Grbovic (pjesa_at_[hidden])
Date: 2007-02-14 11:57:05


Hello Lydia,

What does the call to MPI_Reduce look like in your application? Is the
code available?
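
For reference, a minimal sketch of the kind of call I mean is below (the
buffer names, count, and root are hypothetical, not taken from Gadget2);
what matters is whether the send/receive buffers, count, datatype,
operation, and root you pass are consistent across all ranks, and whether
the buffers really hold 'count' elements:

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, count = 4;                    /* hypothetical element count */
      double local[4] = {0}, global[4] = {0}; /* send and receive buffers   */

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* every rank contributes 'local'; rank 0 receives the elementwise sum */
      MPI_Reduce(local, global, count, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

      if (rank == 0)
          printf("global[0] = %g\n", global[0]);

      MPI_Finalize();
      return 0;
  }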

Thank you,
Jelena

On Wed, 14 Feb 2007, Lydia Heck wrote:

>
> One of our codes (Gadget2) fails predictably with the following error
> message, whether running over Myrinet or over gigabit.
> From the backtrace it looks as if the SEGV is in
> ompi_coll_tuned_reduce_generic.
>
> Have there been similar reports, and/or is there a fix for this?
>
> Lydia Heck
>
>
> [m2042:08002] *** Process received signal ***
> [m2042:08002] Signal: Segmentation Fault (11)
> [m2042:08002] Signal code: Address not mapped (1)
> [m2042:08002] Failing at address: 92
> /opt/OMPI/ompi-1.2b4r13488/lib/libopen-pal.so.0.0.0:opal_backtrace_print+0x26
> /opt/OMPI/ompi-1.2b4r13488/lib/libopen-pal.so.0.0.0:0xc3874
> /lib/amd64/libc.so.1:0xcb686
> /lib/amd64/libc.so.1:0xc0a52
> /opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_generic+0x11b
> [ Signal 11 (SEGV)]
> /opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_intra_binary+0x162
> /opt/OMPI/ompi-1.2b4r13488/lib/openmpi/mca_coll_tuned.so:ompi_coll_tuned_reduce_intra_dec_fixed+0x28d
> /opt/OMPI/ompi-1.2b4r13488/lib/libmpi.so.0.0.0:PMPI_Reduce+0x3f6
> /data/4/nil/tak_gadget/gadget2/P-Gadget2:gravity_tree+0x146c
> /data/4/nil/tak_gadget/gadget2/P-Gadget2:compute_accelerations+0x7e
> /data/4/nil/tak_gadget/gadget2/P-Gadget2:run+0xa5
> /data/4/nil/tak_gadget/gadget2/P-Gadget2:main+0x22f
> /data/4/nil/tak_gadget/gadget2/P-Gadget2:0x7c3c
> [m2042:08002] *** End of error message ***
> [m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c
> at line 275
> [m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_gridengine_module.c at
> line 793
> [m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> mpirun noticed that job rank 2 with PID 0 on node m2043 exited on signal 11
> (Segmentation Fault).
> [m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c
> at line 188
> [m2043:07816] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_gridengine_module.c at
> line 828
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons for this job. Returned value
> Timeout instead of ORTE_SUCCESS.
>
>
>
>
> ------------------------------------------
> Dr E L Heck
>
> University of Durham
> Institute for Computational Cosmology
> Ogden Centre
> Department of Physics
> South Road
>
> DURHAM, DH1 3LE
> United Kingdom
>
> e-mail: lydia.heck_at_[hidden]
>
> Tel.: + 44 191 - 334 3628
> Fax.: + 44 191 - 334 3645
> ___________________________________________
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

--
Jelena Pjesivac-Grbovic, Pjesa
Graduate Research Assistant
Innovative Computing Laboratory
Computer Science Department, UTK
Claxton Complex 350
(865) 974 - 6722 
(865) 974 - 6321
jpjesiva_at_[hidden]
Murphy's Law of Research:
         Enough research will tend to support your theory.