
Open MPI Development Mailing List Archives


From: Brian Barrett (brbarret_at_[hidden])
Date: 2005-10-24 10:36:29


That's a really weird backtrace - it seems to indicate that the
datatype engine is improperly calling free(). Can you try running
without openib (add "-mca btl_base_exclude openib" to the mpirun
arguments) and see if the problem goes away? Also, what platform was
this on?

Thanks,

Brian

On Oct 21, 2005, at 6:37 PM, Troy Benjegerdes wrote:

> On Fri, Oct 21, 2005 at 06:26:32PM -0500, Troy Benjegerdes wrote:
>
>> On Fri, Oct 21, 2005 at 04:12:05PM -0500, Andrew Friedley wrote:
>>
>>> I just committed a fix to the trunk for your original segfault down
>>> in opal_show_help() - this is the same problem Ken posted. This fix
>>> should make it into the v1.0 branch eventually. Even so, you are going
>>> to run into the real problem you were handling - this fix is just for
>>> proper error handling/output.
>>>
>>> The error below looks like a word size mismatch - one thing is
>>> compiled 64-bit, the other is compiled 32-bit. Make sure everything
>>> is compiled either 32-bit or 64-bit.
>>>
>>
>> Another note: I think I may have had some problems because I built
>> with 'make -j16'. Has anyone else tried parallel make builds?
>>
>> I have a working mpirun now.
>>
>> Now I'm back to having NetPIPE segfault when I run it.
>>
>
> And here's a backtrace:
>
> 0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
> #0  0x00002aaaab6fb365 in malloc_usable_size () from /lib/libc.so.6
> #1  0x00002aaaaaecb016 in opal_mem_free_free_hook ()
>     from /usr/local/lib/libopal.so.0
> #2  0x00002aaaaac0c663 in ompi_convertor_cleanup ()
>     from /usr/local/lib/libmpi.so.0
> #3  0x00002aaaaeb41dbe in mca_pml_ob1_match_completion_cache ()
>     from /usr/local/lib/openmpi/mca_pml_ob1.so
> #4  0x00002aaaaf179c7b in mca_btl_sm_component_progress ()
>     from /usr/local/lib/openmpi/mca_btl_sm.so
> #5  0x00002aaaaee5eefe in mca_bml_r2_progress ()
>     from /usr/local/lib/openmpi/mca_bml_r2.so
> #6  0x00002aaaaeb3dd4e in mca_pml_ob1_progress ()
>     from /usr/local/lib/openmpi/mca_pml_ob1.so
> #7  0x00002aaaaaeb5c4a in opal_progress ()
>     from /usr/local/lib/libopal.so.0
> #8  0x00002aaaaeb3c265 in mca_pml_ob1_recv ()
>     from /usr/local/lib/openmpi/mca_pml_ob1.so
> #9  0x00002aaaaf6a0936 in mca_coll_basic_barrier_intra_lin ()
>     from /usr/local/lib/openmpi/mca_coll_basic.so
> #10 0x00002aaaaac1f3b8 in PMPI_Barrier ()
>     from /usr/local/lib/libmpi.so.0
> #11 0x0000000000403016 in Sync ()
> #12 0x0000000000401ef8 in main ()
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>