Open MPI User's Mailing List Archives

From: Brian Barrett (bbarrett_at_[hidden])
Date: 2007-07-19 10:34:57


Ah, yeah, that's bad on OS X. Because of OS X's two-level namespace,
the provide-your-own malloc and free tricks that sometimes work on
Linux and Solaris don't work so well on OS X. The malloc() that
strdup() finds will be the one in libSystem, but the free() that
Open MPI finds to free the buffer from strdup() will be the one in
libcasa_casa. At that point, it all goes downhill.

Brian
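
To make the failure mode concrete, here is a toy Python model (not real
allocator code) of two independent malloc/free implementations, each
with its own private bookkeeping. The names ToyAllocator, strdup,
libsystem and libcasa are invented for illustration. A real free()
handed a foreign pointer cannot raise a tidy error like this; it
typically misreads its own metadata and crashes, which is consistent
with the EXC_BAD_ACCESS reported in this thread.

```python
class ToyAllocator:
    """Stand-in for one malloc/free pair with its own private bookkeeping."""

    def __init__(self, name):
        self.name = name
        self._live = set()   # handles this allocator has handed out
        self._next_id = 0

    def malloc(self, size):
        handle = (self.name, self._next_id)
        self._next_id += 1
        self._live.add(handle)
        return handle

    def free(self, handle):
        if handle not in self._live:
            # A real free() has no such check; it would corrupt memory or crash.
            raise ValueError(f"{self.name}: free() of a foreign block {handle}")
        self._live.remove(handle)


def strdup(text, allocator):
    """Model of strdup(): the copy belongs to the allocator it was bound to."""
    return allocator.malloc(len(text) + 1)


libsystem = ToyAllocator("libSystem")
libcasa = ToyAllocator("libcasa_casa")

# Matching pair: allocate and release through the same allocator.
ok = strdup("${prefix}", libsystem)
libsystem.free(ok)

# Mismatched pair: strdup() used libSystem's malloc, the caller got libcasa's free.
block = strdup("${prefix}", libsystem)
try:
    libcasa.free(block)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

In this model the mismatched free is rejected and the block stays live
in the allocator that created it.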

On Jul 18, 2007, at 5:14 PM, Tim Cornwell wrote:

>
> Brian,
>
> To close this one off, we found that one of our libraries has a
> malloc/free that was being called from ompi. I should have looked at
> the crash reporter. It reported
>
> Exception: EXC_BAD_ACCESS (0x0001)
> Codes: KERN_INVALID_ADDRESS (0x0001) at 0x05801bfc
>
> Thread 0 Crashed:
> 0 libcasa_casa.dylib 0x0107b319 free + 51
> 1 libopen-pal.0.dylib 0x0289eff9 opal_install_dirs_expand + 467
> (installdirs_base_expand.c:68)
> 2 libopen-pal.0.dylib 0x0289e5a0 opal_installdirs_base_open + 1115
> (installdirs_base_components.c:96)
> 3 libopen-pal.0.dylib 0x0287ba40 opal_init_util + 217
> (opal_init.c:150)
> 4 libopen-pal.0.dylib 0x0287bb24 opal_init + 24 (opal_init.c:200)
> 5 libmpi.0.dylib 0x01d745cd ompi_mpi_init + 33
> (ompi_mpi_init.c:219)
> 6 libmpi.0.dylib 0x01db48db MPI_Init + 293 (init.c:71)
> 7 ctest 0x00002f90 main + 24 (ctest.cc:4)
> 8 ctest 0x00002906 _start + 216
> 9 ctest 0x0000282d start + 41
>
> On looking into this more, we found that the Lea Malloc was used in
> the casa_casa library. Removing it cured the problem.
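
A crash report like the one above can be screened mechanically for this
kind of problem. The following is a minimal sketch, assuming frames are
laid out as "index image address symbol + offset"; the helper names
parse_frames and allocator_frames are hypothetical, and the sample
lines are copied from the report in this thread.

```python
import re

# index, image name, address, symbol, "+", offset
FRAME = re.compile(r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\S+)\s+\+\s+(\d+)")


def parse_frames(report):
    """Return (index, image, symbol) for every line that looks like a frame."""
    frames = []
    for line in report.splitlines():
        m = FRAME.match(line)
        if m:
            index, image, _addr, symbol, _offset = m.groups()
            frames.append((int(index), image, symbol))
    return frames


def allocator_frames(frames, names=("malloc", "free", "calloc", "realloc")):
    """Keep only frames whose symbol is an allocator entry point."""
    return [(i, image, sym) for i, image, sym in frames if sym in names]


REPORT = """\
0 libcasa_casa.dylib 0x0107b319 free + 51
1 libopen-pal.0.dylib 0x0289eff9 opal_install_dirs_expand + 467
(installdirs_base_expand.c:68)
2 libopen-pal.0.dylib 0x0289e5a0 opal_installdirs_base_open + 1115
"""

suspects = allocator_frames(parse_frames(REPORT))
print(suspects)
```

Any allocator frame whose image is not the system C library (libSystem
on OS X) is worth a closer look.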
>
> Thanks for the help,
>
> Tim
>
> On 12/07/2007, at 2:54 PM, Tim Cornwell wrote:
>
>>
>> Brian,
>>
>> I think it's just a symbol clash. A test program linked with just
>> mpicxx works fine but with our typical link, it fails. I've
>> narrowed the problem down to a single shared library. This is from
>> C++ and the symbols have a namespace casa. Weeding out all the
>> casa stuff and some other cruft, we're left with:
>>
>> 0009df14 T QuantaProxy::fits()
>> 0011277c S int __gnu_cxx::__capture_isnan<double>(double)
>> 0014b4ae S std::invalid_argument::~invalid_argument()
>> 0014b48e S std::invalid_argument::~invalid_argument()
>> 00112790 S int std::isnan<double>(double)
>> 001200e8 S void** std::fill_n<void**, unsigned int, void*>(void**,
>> unsigned int, void* const&)
>> 0012da12 S std::complex<double>* std::fill_n<std::complex<double>*,
>> unsigned int, std::complex<double> >(std::complex<double>*,
>> unsigned int, std::complex<double> const&)
>> 0012d9ae S std::complex<float>* std::fill_n<std::complex<float>*,
>> unsigned int, std::complex<float> >(std::complex<float>*, unsigned
>> int, std::complex<float> const&)
>> 00104a4c S bool* std::fill_n<bool*, unsigned int, bool>(bool*,
>> unsigned int, bool const&)
>> 0010b126 S double* std::fill_n<double*, unsigned int, double>
>> (double*, unsigned int, double const&)
>> 0012043a S float* std::fill_n<float*, unsigned int, float>(float*,
>> unsigned int, float const&)
>> 00120386 S int* std::fill_n<int*, unsigned int, int>(int*, unsigned
>> int, int const&)
>> 001203e0 S unsigned int* std::fill_n<unsigned int*, unsigned int,
>> unsigned int>(unsigned int*, unsigned int, unsigned int const&)
>> 00120322 S short* std::fill_n<short*, unsigned int, short>(short*,
>> unsigned int, short const&)
>> 0012d94a S unsigned short* std::fill_n<unsigned short*, unsigned
>> int, unsigned short>(unsigned short*, unsigned int, unsigned short
>> const&)
>> 00112bf6 S void std::__reverse<__gnu_cxx::__normal_iterator<char*,
>> std::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > > >(__gnu_cxx::__normal_iterator<char*,
>> std::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*,
>> std::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > >, std::random_access_iterator_tag)
>> 00112bbc S __gnu_cxx::__normal_iterator<char*,
>> std::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > >
>> std::transform<__gnu_cxx::__normal_iterator<char*,
>> std::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*,
>> std::basic_string<char, std::char_traits<char>,
>> std::allocator<char> > >, int (*)(int)>
>> (__gnu_cxx::__normal_iterator<char*, std::basic_string<char,
>> std::char_traits<char>, std::allocator<char> > >,
>> __gnu_cxx::__normal_iterator<char*, std::basic_string<char,
>> std::char_traits<char>, std::allocator<char> > >,
>> __gnu_cxx::__normal_iterator<char*, std::basic_string<char,
>> std::char_traits<char>, std::allocator<char> > >, int (*)(int))
>> 00198740 S typeinfo for std::invalid_argument
>> 00192cac S typeinfo name for std::invalid_argument
>> 001993e0 S vtable for std::invalid_argument
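
A symbol listing of this kind can also be checked directly for
allocator entry points, which is where the clash in this thread turned
out to be. The sketch below assumes nm-style lines of the form "address
type name"; the function name defined_allocator_symbols is
hypothetical, and the _free, _malloc and _strlen lines in the sample
are made up for illustration (only the QuantaProxy line comes from the
listing above).

```python
ALLOCATOR_NAMES = {"malloc", "free", "calloc", "realloc"}


def defined_allocator_symbols(nm_output):
    """Return (type, name) for allocator functions defined in the library."""
    hits = []
    for line in nm_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # undefined symbols ("U name") have no address column
        _address, kind, name = parts
        # Darwin prefixes C symbols with an underscore (_malloc, _free).
        bare = name[1:] if name.startswith("_") else name
        if kind in ("T", "t") and bare in ALLOCATOR_NAMES:
            hits.append((kind, name))
    return hits


# Hypothetical sample: addresses and the last three lines are invented.
SAMPLE = """\
0009df14 T QuantaProxy::fits()
0007b2e6 T _free
0007b9c0 T _malloc
         U _strlen
"""

hits = defined_allocator_symbols(SAMPLE)
print(hits)
```

A non-empty result for an application library means it ships its own
allocator.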
>>
>>
>> We're all using the standard compiler on OS X:
>>
>> $ mpicxx -v
>> Using built-in specs.
>> Target: i686-apple-darwin8
>> Configured with: /private/var/tmp/gcc/gcc-5367.obj~1/src/configure
>> --disable-checking -enable-werror --prefix=/usr --mandir=/share/man
>> --enable-languages=c,objc,c++,obj-c++
>> --program-transform-name=/^[cg][^.-]*$/s/$/-4.0/
>> --with-gxx-include-dir=/include/c++/4.0.0 --with-slibdir=/usr/lib
>> --build=powerpc-apple-darwin8 --with-arch=nocona --with-tune=generic
>> --program-prefix= --host=i686-apple-darwin8
>> --target=i686-apple-darwin8
>> Thread model: posix
>> gcc version 4.0.1 (Apple Computer, Inc. build 5367)
>>
>> Tim
>>
>>
>>
>> On 12/07/2007, at 7:57 AM, Brian Barrett wrote:
>>
>>> That's unexpected. If you run the command 'ompi_info --all', it
>>> should list (towards the top) things like the Bindir and Libdir.
>>> Can you see if those have sane values? If they do, can you try
>>> running a simple hello, world type MPI application (there's one in
>>> the OMPI tarball)? It almost looks like memory is getting corrupted,
>>> which would be very unexpected that early in the process. I'm unable
>>> to duplicate the problem with 1.2.3 on my Mac Pro, making it all the
>>> more strange.
>>>
>>> Another random thought -- Which compilers did you use to build
>>> Open MPI?
>>>
>>> Brian
>>>
>>>
>>> On Jul 11, 2007, at 1:27 PM, Tim Cornwell wrote:
>>>
>>>>
>>>> Open MPI: 1.2.3
>>>> Open MPI SVN revision: r15136
>>>> Open RTE: 1.2.3
>>>> Open RTE SVN revision: r15136
>>>> OPAL: 1.2.3
>>>> OPAL SVN revision: r15136
>>>> Prefix: /usr/local
>>>> Configured architecture: i386-apple-darwin8.10.1
>>>>
>>>> Hi Brian,
>>>>
>>>> 1.2.3 downloaded and built from source.
>>>>
>>>> Tim
>>>>
>>>> On 12/07/2007, at 12:50 AM, Brian Barrett wrote:
>>>>
>>>>> Which version of Open MPI are you using?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Brian
>>>>>
>>>>> On Jul 11, 2007, at 3:32 AM, Tim Cornwell wrote:
>>>>>
>>>>>>
>>>>>> I have a problem running Open MPI under OS X 10.4.10. My program
>>>>>> runs fine under Debian x86_64 on an Opteron, but under OS X on a
>>>>>> number of MacBooks and MacBook Pros, I get the following
>>>>>> immediately on startup. This smells like a common problem but I
>>>>>> couldn't find anything relevant anywhere. Can anyone provide a
>>>>>> hint or, better yet, a solution?
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>>
>>>>>> Program received signal EXC_BAD_ACCESS, Could not access memory.
>>>>>> Reason: KERN_PROTECTION_FAILURE at address: 0x0000000c
>>>>>> 0x04510412 in free ()
>>>>>> (gdb) where
>>>>>> #0 0x04510412 in free ()
>>>>>> #1 0x05d24f80 in opal_install_dirs_expand (input=0x5d2a6b0
>>>>>> "${prefix}") at base/installdirs_base_expand.c:67
>>>>>> #2 0x05d24584 in opal_installdirs_base_open () at
>>>>>> base/installdirs_base_components.c:94
>>>>>> #3 0x05d01a40 in opal_init_util () at runtime/opal_init.c:150
>>>>>> #4 0x05d01b24 in opal_init () at runtime/opal_init.c:200
>>>>>> #5 0x051fa5cd in ompi_mpi_init (argc=1, argv=0xbfffde74,
>>>>>> requested=0, provided=0xbfffd930) at runtime/ompi_mpi_init.c:219
>>>>>> #6 0x0523a8db in MPI_Init (argc=0xbfffd980, argv=0xbfffde14) at
>>>>>> init.c:71
>>>>>> #7 0x0005a03d in conrad::cp::MPIConnection::initMPI (argc=1,
>>>>>> argv=@0xbfffde14) at mwcommon/MPIConnection.cc:83
>>>>>> #8 0x00004163 in main (argc=1, argv=0xbfffde74) at
>>>>>> apps/cimager.cc:155
>>>>>>
>>>>>>
>>>>>> --------------------------------------------------------------------
>>>>>> Tim Cornwell, Australia Telescope National Facility, CSIRO
>>>>>> Location: Cnr Pembroke & Vimiera Rds, Marsfield, NSW, 2122,
>>>>>> AUSTRALIA
>>>>>> Post: PO Box 76, Epping, NSW 1710, AUSTRALIA
>>>>>> Phone: +61 2 9372 4261 Fax: +61 2 9372 4450 or 4310
>>>>>> Mobile: +61 4 3366 5399
>>>>>> Email: Tim.Cornwell_at_[hidden]
>>>>>> URL: http://www.atnf.csiro.au/people/tim.cornwell
>>>>>> --------------------------------------------------------------------
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users