Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] assert in opal_datatype_is_contiguous_memory_layout
From: George Bosilca (bosilca_at_[hidden])
Date: 2013-04-08 09:50:05


Eric,

Thanks for the report. I used your example to replicate the issue, and I confirm that it appears in all versions when built in debug mode. However, the assert in the convertor code is correct, and so is your code. The issue is more complex: it is triggered by a use of the convertor that should have been prevented.

If I'm not mistaken, Edgar (CC'ed on this email) is the maintainer of that particular code path. Hopefully, he will be able to fix the code based on the following analysis.

The underlying issue is that when a convertor is created with no data to convert, it is automatically marked as COMPLETED. Once in this state, no further conversion calls should be made, or they will trigger the assert you encountered. Unfortunately, the code in OMPIO doesn't check whether there is any data left to handle before calling opal_convertor_raw (a function which, as I said above, is not supposed to be called on a completed convertor). The function ompi_io_ompio_decode_datatype assumes that there is at least one segment in the file, which explains the call to opal_convertor_raw.

I modified opal_convertor_raw to accept the case where the convertor is already completed and to return the same value as opal_convertor_pack/unpack (r28305), so we now have a consistent interface for the convertor. However, this leads to a division by zero in the OMPIO layer, as the number of iovecs returned by opal_convertor_raw is now zero and this case is not handled. I hope Edgar will be able to fix that part.
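
To make the failure mode concrete, here is a minimal, self-contained sketch of the pattern described above. It is only a toy model: the toy_* names, the flag value and the return code of 1 are assumptions made for illustration, not the real OPAL or OMPIO definitions. It shows a convertor that is completed as soon as it is prepared with no data, a raw call that tolerates that state by reporting zero iovecs, and a caller that checks the iovec count before dividing by it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the real OPAL types or flags. */
#define TOY_CONVERTOR_COMPLETED 0x1u

typedef struct {
    uint32_t flags;      /* TOY_CONVERTOR_COMPLETED once nothing is left */
    size_t   total_size; /* number of bytes the convertor describes */
    size_t   position;   /* number of bytes already handed out */
} toy_convertor_t;

/* Prepare a convertor; with no data it is marked completed right away. */
void toy_convertor_prepare(toy_convertor_t *c, size_t total_size)
{
    assert(c != NULL);
    c->flags = (0 == total_size) ? TOY_CONVERTOR_COMPLETED : 0u;
    c->total_size = total_size;
    c->position = 0;
}

/* Toy "raw" call: on a completed convertor it reports zero iovecs and
 * returns 1 (done) instead of asserting. Otherwise it hands out the
 * remaining bytes as a single contiguous segment. */
int toy_convertor_raw(toy_convertor_t *c, uint32_t *iov_count, size_t *length)
{
    assert(c != NULL && iov_count != NULL && length != NULL);
    if (c->flags & TOY_CONVERTOR_COMPLETED) {
        *iov_count = 0;
        *length = 0;
        return 1;
    }
    *iov_count = 1;
    *length = c->total_size - c->position;
    c->position = c->total_size;
    c->flags |= TOY_CONVERTOR_COMPLETED;
    return 1;
}

/* Caller side: average segment size. The zero check is the part that
 * the caller must add once the raw call can return no iovecs. */
size_t toy_avg_segment_size(toy_convertor_t *c)
{
    uint32_t iov_count = 0;
    size_t length = 0;
    (void)toy_convertor_raw(c, &iov_count, &length);
    if (0 == iov_count) {
        return 0; /* nothing to decode; avoid dividing by zero */
    }
    return length / iov_count;
}
```

In this model, a process with nothing to write prepares its convertor with a size of 0, and toy_avg_segment_size simply returns 0 for it; without the check on iov_count the same call would divide by zero.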

  George.

On Apr 5, 2013, at 23:10 , Eric Chamberland <Eric.Chamberland_at_[hidden]> wrote:

> Hi all,
>
> (Sorry, I have sent this to "users" but I should have sent it to "devel" list instead. Sorry for the mess...)
>
> I have attached a very small example which raises an assertion.
>
> The problem arises from a process which does not have any elements to write to a file (and therefore none in the MPI_File_set_view call)...
>
> You can see this "bug" with openmpi 1.6.3, 1.6.4 and 1.7.0 configured with:
>
> ./configure --enable-mem-debug --enable-mem-profile --enable-memchecker
> --with-mpi-param-check --enable-debug
>
> Just compile the given example (idx_null.cc) as-is with
>
> mpicxx -o idx_null idx_null.cc
>
> and run with 3 processes:
>
> mpirun -n 3 idx_null
>
> You can modify the example by commenting out "#define WITH_ZERO_ELEMNT_BUG" to see that everything goes well when all processes have something to write.
>
> There is no "bug" if you use openmpi 1.6.3 (and higher) without the debugging options.
>
> Also, all is working well with mpich-3.0.3 configured with:
>
> ./configure --enable-g=yes
>
>
> So, is this a wrong "assert" in openmpi?
>
> Is there a real problem with running this example in "release" mode?
>
> Thanks,
>
> Eric
> <idx_null.cc>