Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Windows: MPI_Allreduce() crashes when using MPI_DOUBLE_PRECISION
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-05-09 09:39:15


Please send all the information listed here:

    http://www.open-mpi.org/community/help/

I am able to run your test program with no problem, so I'm not quite sure what the issue is...?

If op->o_func.intrinsic.fns[27] initially points to a valid value and then later it points to 0, that could imply that there is memory corruption occurring in your application somewhere. Have you tried running through a memory-checking debugger?
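
As a rough illustration of that check, here is a minimal sketch in C. It is not Open MPI code: demo_op_t, demo_snapshot(), demo_first_changed_slot() and demo_reduce() are hypothetical stand-ins for a per-datatype function table such as op->o_func.intrinsic.fns. The idea is to copy the table once it is known to be good, then compare against the copy at later points to narrow down where a slot gets overwritten, and to guard the dispatch so an empty slot returns an error instead of crashing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; not Open MPI's real types. */
#define DEMO_TYPE_MAX 32

typedef void (*demo_reduce_fn)(const void *in, void *inout, int count);

typedef struct {
    demo_reduce_fn fns[DEMO_TYPE_MAX];   /* one handler per datatype slot */
} demo_op_t;

/* Example handler: element-wise sum of doubles. */
static void demo_sum_double(const void *in, void *inout, int count)
{
    const double *a = (const double *)in;
    double *b = (double *)inout;
    for (int i = 0; i < count; ++i) {
        b[i] += a[i];
    }
}

/* Copy the table while it is known to be good. */
static void demo_snapshot(const demo_op_t *op, demo_op_t *saved)
{
    memcpy(saved, op, sizeof(*saved));
}

/* Return the index of the first slot that differs from the snapshot,
 * or -1 if the table is unchanged. */
static int demo_first_changed_slot(const demo_op_t *op, const demo_op_t *saved)
{
    for (int i = 0; i < DEMO_TYPE_MAX; ++i) {
        if (op->fns[i] != saved->fns[i]) {
            return i;
        }
    }
    return -1;
}

/* Guarded dispatch: returns 0 on success, -1 if the slot is out of range
 * or holds no handler, instead of calling through a NULL pointer. */
static int demo_reduce(const demo_op_t *op, int slot,
                       const void *in, void *inout, int count)
{
    if (slot < 0 || slot >= DEMO_TYPE_MAX || NULL == op->fns[slot]) {
        return -1;
    }
    op->fns[slot](in, inout, count);
    return 0;
}
```

Calling demo_snapshot() right after initialization and demo_first_changed_slot() before each reduction would report slot 27 as soon as it changes. This only tells you between which two points the table was modified; a memory-checking debugger is still the better tool for finding the write itself.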

On May 6, 2011, at 9:56 AM, hi wrote:

> I am observing a crash in the MPI_Allreduce() call from my actual application.
> After debugging, I found that MPI_Allreduce() with MPI_DOUBLE_PRECISION
> hits a NULL function pointer in the following code in op.h:
>
> if (0 != (op->o_flags & OMPI_OP_FLAGS_INTRINSIC)) {
>     op->o_func.intrinsic.fns[ompi_op_ddt_map[dtype->id]](source, target,
>         &count, &dtype,
>         op->o_func.intrinsic.modules[ompi_op_ddt_map[dtype->id]]);
>
> where o_func.intrinsic.fns[27] is 0.
>
> On further debugging, I found that it makes a call to
> mca_coll_basic_reduce_lin_intra(); see the trace below:
>
>> libmpid.dll!ompi_op_reduce(ompi_op_t * op, void * source, void * target, int count, ompi_datatype_t * dtype) Line 500 C++
> libmpid.dll!mca_coll_basic_reduce_lin_intra(void * sbuf, void *
> rbuf, int count, ompi_datatype_t * dtype, ompi_op_t * op, int root,
> ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)
> Line 249 C++
> libmpid.dll!mca_coll_sync_reduce(void * sbuf, void * rbuf, int
> count, ompi_datatype_t * dtype, ompi_op_t * op, int root,
> ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)
> Line 45 + 0xd4 bytes C++
> libmpid.dll!mca_coll_basic_allreduce_intra(void * sbuf, void * rbuf,
> int count, ompi_datatype_t * dtype, ompi_op_t * op,
> ompi_communicator_t * comm, mca_coll_base_module_2_0_0_t * module)
> Line 57 + 0x58 bytes C++
> libmpid.dll!MPI_Allreduce(void * sendbuf, void * recvbuf, int count,
> ompi_datatype_t * datatype, ompi_op_t * op, ompi_communicator_t *
> comm) Line 107 + 0x5c bytes C++
> libmpi_f77d.dll!mpi_allreduce_f(char * sendbuf, char * recvbuf, int
> * count, int * datatype, int * op, int * comm, int * ierr) Line 79 +
> 0x34 bytes C++
> libmpi_f77d.dll!MPI_ALLREDUCE(char * sendbuf, char * recvbuf, int *
> count, int * datatype, int * op, int * comm, int * ierr) Line 53 +
> 0x67 bytes C++
>
>
> I wrote the attached test program to reproduce this problem, but it works
> fine, and I observe a completely different call stack (see attached images).
>
> Just for information: I am executing my application using the following command:
> c:/openmpi/bin/orterun -mca mca_component_show_load_errors 0 --prefix
> ... -x ... -x ... --machinefile ... -np 2 myApplication
>
> And the test program using the following command:
> c:/openmpi/bin/mpirun mar_f_dp.exe
>
>
> Please let me know what criteria determine whether "coll_reduce" points to
> mca_coll_basic_allreduce_intra() or mca_coll_self_allreduce_intra();
> this would help me debug my application further.
>
> Thank you in advance.
> -Hiral

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/