Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] assert in opal_datatype_is_contiguous_memory_layout
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-04-25 17:20:34


To follow up for the web archives...

We fixed this bug off-list. It will be included in 1.6.5 and (likely) 1.7.2.
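
A note for archive readers: the check quoted further down in this thread can be modeled without MPI. The sketch below is a simplified stand-in, not the Open MPI source. The struct, the flag values, and the helper names (model_datatype_t, MODEL_FLAG_*, model_make, model_is_contiguous) are all hypothetical. It also assumes that an empty indexed type ends up with size 0, extent 0, and the "contiguous" flag set without the "no gaps" flag; the thread does not confirm that detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the OPAL flags; the values are illustrative only. */
#define MODEL_FLAG_CONTIGUOUS 0x1u
#define MODEL_FLAG_NO_GAPS    0x2u

/* Simplified model of a datatype descriptor (not the real opal_datatype_t). */
typedef struct {
    uint32_t  flags;
    size_t    size;  /* number of bytes of actual data */
    ptrdiff_t lb;    /* lower bound */
    ptrdiff_t ub;    /* upper bound; extent is ub - lb */
} model_datatype_t;

/* Build a descriptor; the extent must not be negative. */
static model_datatype_t model_make(uint32_t flags, size_t size,
                                   ptrdiff_t lb, ptrdiff_t ub)
{
    assert(ub >= lb);
    model_datatype_t dt = { flags, size, lb, ub };
    return dt;
}

/*
 * Mirrors the control flow of the quoted function, but reports the
 * assertion instead of aborting:
 *    1 -> contiguous
 *    0 -> not contiguous
 *   -1 -> the quoted assert (size != ub - lb) would fire
 */
static int model_is_contiguous(const model_datatype_t *dt, int32_t count)
{
    if (!(dt->flags & MODEL_FLAG_CONTIGUOUS)) return 0;
    if (count == 1 || (dt->flags & MODEL_FLAG_NO_GAPS)) return 1;
    if ((ptrdiff_t)dt->size == dt->ub - dt->lb) return -1;
    return 0;
}
```

Under these assumptions, an empty type (size 0, lb 0, ub 0, contiguous flag only) with a count of 0 reaches the third check with size equal to extent, which is the case the assert rejects. A type with 4 bytes of data spread over an 8-byte extent passes the check and is reported as not contiguous.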

On Apr 5, 2013, at 3:18 PM, Eric Chamberland <Eric.Chamberland_at_[hidden]> wrote:

> Hi again,
>
> I have attached a very small example which raises the assertion.
>
> The problem arises from a process that has no elements to write to the file (and therefore none in its MPI_File_set_view)...
>
> You can see this "bug" with openmpi 1.6.3, 1.6.4 and 1.7.0 configured with:
>
> ./configure --enable-mem-debug --enable-mem-profile --enable-memchecker
> --with-mpi-param-check --enable-debug
>
> Just compile the given example (idx_null.cc) as-is with
>
> mpicxx -o idx_null idx_null.cc
>
> and run with 3 processes:
>
> mpirun -n 3 idx_null
>
> You can modify the example by commenting out "#define WITH_ZERO_ELEMNT_BUG" to see that everything works when all processes have something to write.
>
> There is no "bug" if you use openmpi 1.6.3 (and higher) without the debugging options.
>
> Also, all is working well with mpich-3.0.3 configured with:
>
> ./configure --enable-g=yes
>
>
> So, is this a wrong "assert" in openmpi?
>
> Is there a real problem with using this code in "release" mode?
>
> Thanks,
>
> Eric
>
> On 04/05/2013 12:57 PM, Eric Chamberland wrote:
>> Hi all,
>>
>> I have a well working (large) code which is using openmpi 1.6.3 (see
>> config.log here:
>> http://www.giref.ulaval.ca/~ericc/bug_openmpi/config.log_nodebug)
>>
>> (I have used it for reading with MPI I/O with success over 1500 procs
>> with very large files)
>>
>> However, when I use openmpi compiled with "debug" options:
>>
>> ./configure --enable-mem-debug --enable-mem-profile --enable-memchecker
>> --with-mpi-param-check --enable-debug --prefix=/opt/openmpi-1.6.3_debug
>> (see the other config.log here:
>> http://www.giref.ulaval.ca/~ericc/bug_openmpi/config.log_debug) the code
>> is aborting with an assertion on a very small example on 2 processors.
>> (the same very small example is working well without the debug mode)
>>
>> Here is the assertion causing an abort:
>>
>> ===================================
>>
>> openmpi-1.6.3/opal/datatype/opal_datatype.h:
>>
>> static inline int32_t
>> opal_datatype_is_contiguous_memory_layout( const opal_datatype_t* datatype, int32_t count )
>> {
>>     if( !(datatype->flags & OPAL_DATATYPE_FLAG_CONTIGUOUS) ) return 0;
>>     if( (count == 1) || (datatype->flags & OPAL_DATATYPE_FLAG_NO_GAPS) ) return 1;
>>
>>     /* This is the assertion: */
>>     assert( (OPAL_PTRDIFF_TYPE)datatype->size != (datatype->ub - datatype->lb) );
>>
>>     return 0;
>> }
>>
>> ===================================
>>
>> Can anyone tell me what this means?
>>
>> It happens while writing a file with MPI I/O, on the fourth call to
>> "MPI_File_set_view"... with different MPI_Datatype types created
>> with "MPI_Type_indexed".
>>
>> I am trying to reproduce the bug with a very small example to send
>> here, but if anyone has a hint to give me...
>> (I would like to hear: this assert is not good! just ignore it ;-) )
>>
>> Thanks,
>>
>> Eric
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> <idx_null.cc>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/