Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2007-04-11 12:33:25


Michael,

The MPI standard is quite clear: in order to have correct and
portable MPI code, you are not allowed to use (void*)0 as the buffer
argument. Use MPI_BOTTOM instead.
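
For illustration, a minimal sketch of the difference; the function
and parameter names below are only placeholders:

    #include <mpi.h>

    /* With a datatype that encodes absolute addresses, the standard
     * requires MPI_BOTTOM as the buffer argument, not a null pointer. */
    void send_with_absolute_addresses(MPI_Datatype dtype, int dest,
                                      int tag, MPI_Comm comm)
    {
        /* Non-portable form from the original report:    */
        /* MPI_Send((void *)0, 1, dtype, dest, tag, comm); */

        /* Portable form required by the MPI standard:     */
        MPI_Send(MPI_BOTTOM, 1, dtype, dest, tag, comm);
    }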

We have plenty of tests which exercise the exact behavior you
describe in your email, and they all pass. I will take a look at
what happens, but I need either the code or at least the part which
creates the datatype.

   Thanks,
     george.

On Apr 11, 2007, at 3:54 AM, Michael Gauckler wrote:

> Dear Open MPI Users and Developers,
>
> I encountered a problem with Open MPI while porting an application
> that previously ran successfully with LAM/MPI and MPICH.
>
> The program produces a segmentation fault (see [1] for the stack
> trace) when calling MPI_Send with the following arguments:
>
> MPI_Send((void *)0, 1, datatype, rank, tag, comm_);
>
> The first argument seems wrong at first sight, but is correct: the
> argument "datatype" is an MPI_Datatype which describes the memory
> layout of the object to be sent and is zero-based (i.e., its
> displacements are absolute addresses). The other arguments are as
> expected: one such object is sent to rank "rank" with tag "tag" via
> the communicator "comm_". The MPI_Datatype is constructed
> programmatically from the object's member definitions using
> MPI_Type_struct. The MPI types involved are solely MPI_DOUBLE and
> MPI_UNSIGNED_INT.
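>
> For illustration, the construction looks roughly like this (the
> struct and its members are simplified placeholders, not the real
> class; MPI_UNSIGNED is the predefined handle for unsigned int):
>
>     #include <mpi.h>
>
>     /* Placeholder for the real object's layout. */
>     struct Sample {
>         double       values[3];
>         unsigned int count;
>     };
>
>     /* Build a datatype whose displacements are the members'
>      * absolute addresses, so the matching MPI_Send uses a
>      * zero-based buffer argument with count 1. */
>     MPI_Datatype build_absolute_type(struct Sample *obj)
>     {
>         int          blocklens[2] = { 3, 1 };
>         MPI_Aint     displs[2];
>         MPI_Datatype types[2]     = { MPI_DOUBLE, MPI_UNSIGNED };
>         MPI_Datatype newtype;
>
>         MPI_Get_address(obj->values, &displs[0]);
>         MPI_Get_address(&obj->count, &displs[1]);
>
>         MPI_Type_struct(2, blocklens, displs, types, &newtype);
>         MPI_Type_commit(&newtype);
>         return newtype;
>     }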
>
> I can reproduce the problem with the stable 1.2 release as well as
> the 1.2.1a snapshot of Open MPI. My OS is Linux with kernel 2.6.18
> (Debian Etch), running on standard dual Xeon hardware with GigE.
>
> I tried to reduce the amount of data sent by excluding some of the
> object's members from the transmission. No particular member or type
> seems to cause the problem; rather, there appears to be a threshold
> in the number of members or amount of data that determines whether
> the call succeeds. The "datatype" structure describes the type and
> location of approx. 2'000'000 numbers. The data itself is approx.
> 16MB (2M * 8 bytes/number, assuming doubles), which I would not
> expect to cause any problem for an MPI implementation.
>
> Thank you for any hints, ideas or suggestions as to where the
> problem might lie.
>
> Regards,
> Michael
>
> [1]
>
> [head:09133] *** Process received signal ***
> [head:09133] Signal: Segmentation fault (11)
> [head:09133] Signal code: Address not mapped (1)
> [head:09133] Failing at address: 0xb0127475
> [head:09133] [ 0] [0xb7f0f440]
> [head:09133] [ 1] /usr/lib/libmpi.so.0(ompi_convertor_pack+0x90)
> [0xb668f9a0]
> [head:09133] [ 2] /usr/lib/openmpi/mca_btl_tcp.so
> (mca_btl_tcp_prepare_src+0x210) [0xb56daef0]
> [head:09133] [ 3] /usr/lib/openmpi/mca_pml_ob1.so
> (mca_pml_ob1_send_request_schedule_exclusive+0x1de) [0xb5726ede]
> [head:09133] [ 4] /usr/lib/openmpi/mca_pml_ob1.so [0xb5728238]
> [head:09133] [ 5] /usr/lib/openmpi/mca_btl_tcp.so [0xb56ddc65]
> [head:09133] [ 6] /usr/lib/libopen-pal.so.0(opal_event_base_loop
> +0x462) [0xb65bcf12]
> [head:09133] [ 7] /usr/lib/libopen-pal.so.0(opal_event_loop+0x29)
> [0xb65bcfd9]
> [head:09133] [ 8] /usr/lib/libopen-pal.so.0(opal_progress+0xc0)
> [0xb65b7260]
> [head:09133] [ 9] /usr/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send
> +0x3e5) [0xb571f965]
> [head:09133] [10] /usr/lib/libmpi.so.0(MPI_Send+0x12f) [0xb66abf0f]
> [head:09133] [11] /opt/plato/release_1.0/bin/engine
> (_ZN2GP15MPIProcessGroup4sendERKNS_9MemoryMapEii+0xd9) [0x81cec03]
> [head:09133] [12] /opt/plato/release_1.0/bin/engine
> (_ZN2GP15MPIProcessGroup4sendEN5boost10shared_ptrINS_6EntityEEEii
> +0x2d0) [0x81d0358]
> [head:09133] [13] /opt/plato/release_1.0/bin/engine
> (_ZN2GP20ParallelDataAccessor4loadEN5boost10shared_ptrINS_6EntityEEE
> +0x23b) [0x853c939]
> [head:09133] [14] /opt/plato/release_1.0/bin/engine
> (_ZN2GP12Transactions6createEPKN11xercesc_2_77DOMNodeE+0x57f)
> [0x8426553]
> [head:09133] [15] /opt/plato/release_1.0/bin/engine
> (_ZN2GP7FactoryIN5boost10shared_ptrINS_7XmlBaseEEESsPFS4_PKN11xercesc_
> 2_77DOMNodeEENS_19DefaultFactoryErrorEE12createObjectES8_+0x76)
> [0x81ca06a]
> [head:09133] [16] /opt/plato/release_1.0/bin/engine
> (_ZN2GP16XmlFactoryParser7descentEPN11xercesc_2_77DOMNodeEb+0x5b2)
> [0x81cd700]
> [head:09133] [17] /opt/plato/release_1.0/bin/engine
> (_ZN2GP9XmlParser8traverseEb+0x278) [0x81c1eca]
> [head:09133] [18] /opt/plato/release_1.0/bin/engine
> (_ZN2GP16XmlFactoryParser8traverseEb+0x19) [0x81c9eeb]
> [head:09133] [19] /opt/plato/release_1.0/bin/engine(main+0x1d23)
> [0x81617f7]
> [head:09133] [20] /lib/tls/i686/cmov/libc.so.6(__libc_start_main
> +0xc8) [0xb6348ea8]
> [head:09133] [21] /opt/plato/release_1.0/bin/engine
> (__gxx_personality_v0+0x15d) [0x815a731]
> [head:09133] *** End of error message ***
> mpirun noticed that job rank 0 with PID 9133 on node head exited on
> signal 11 (Segmentation fault).
> 2 additional processes aborted (not shown)

"Half of what I say is meaningless; but I say it so that the other
half may reach you"
                                   Kahlil Gibran