Open MPI User's Mailing List Archives

From: Tim Cornwell (tim.cornwell_at_[hidden])
Date: 2007-07-18 19:14:37


Brian,

To close this one off, we found that one of our libraries exports its
own malloc/free, which was being called from ompi. I should have
looked at the crash reporter earlier. It reported:

Exception: EXC_BAD_ACCESS (0x0001)
Codes: KERN_INVALID_ADDRESS (0x0001) at 0x05801bfc

Thread 0 Crashed:
0 libcasa_casa.dylib 0x0107b319 free + 51
1 libopen-pal.0.dylib 0x0289eff9 opal_install_dirs_expand + 467
(installdirs_base_expand.c:68)
2 libopen-pal.0.dylib 0x0289e5a0 opal_installdirs_base_open + 1115
(installdirs_base_components.c:96)
3 libopen-pal.0.dylib 0x0287ba40 opal_init_util + 217 (opal_init.c:
150)
4 libopen-pal.0.dylib 0x0287bb24 opal_init + 24 (opal_init.c:200)
5 libmpi.0.dylib 0x01d745cd ompi_mpi_init + 33
(ompi_mpi_init.c:219)
6 libmpi.0.dylib 0x01db48db MPI_Init + 293 (init.c:71)
7 ctest 0x00002f90 main + 24 (ctest.cc:4)
8 ctest 0x00002906 _start + 216
9 ctest 0x0000282d start + 41

On looking into this more, we found that the Lea malloc was built into
the casa_casa library. Removing it cured the problem.

Thanks for the help,

Tim

On 12/07/2007, at 2:54 PM, Tim Cornwell wrote:

>
> Brian,
>
> I think it's just a symbol clash. A test program linked with just
> mpicxx works fine, but with our typical link it fails. I've
> narrowed the problem down to a single shared library. This is from
> C++, and the symbols have a namespace casa. Weeding out all the
> casa stuff and some other cruft, we're left with:
>
> 0009df14 T QuantaProxy::fits()
> 0011277c S int __gnu_cxx::__capture_isnan<double>(double)
> 0014b4ae S std::invalid_argument::~invalid_argument()
> 0014b48e S std::invalid_argument::~invalid_argument()
> 00112790 S int std::isnan<double>(double)
> 001200e8 S void** std::fill_n<void**, unsigned int, void*>(void**,
> unsigned int, void* const&)
> 0012da12 S std::complex<double>* std::fill_n<std::complex<double>*,
> unsigned int, std::complex<double> >(std::complex<double>*,
> unsigned int, std::complex<double> const&)
> 0012d9ae S std::complex<float>* std::fill_n<std::complex<float>*,
> unsigned int, std::complex<float> >(std::complex<float>*, unsigned
> int, std::complex<float> const&)
> 00104a4c S bool* std::fill_n<bool*, unsigned int, bool>(bool*,
> unsigned int, bool const&)
> 0010b126 S double* std::fill_n<double*, unsigned int, double>
> (double*, unsigned int, double const&)
> 0012043a S float* std::fill_n<float*, unsigned int, float>(float*,
> unsigned int, float const&)
> 00120386 S int* std::fill_n<int*, unsigned int, int>(int*, unsigned
> int, int const&)
> 001203e0 S unsigned int* std::fill_n<unsigned int*, unsigned int,
> unsigned int>(unsigned int*, unsigned int, unsigned int const&)
> 00120322 S short* std::fill_n<short*, unsigned int, short>(short*,
> unsigned int, short const&)
> 0012d94a S unsigned short* std::fill_n<unsigned short*, unsigned
> int, unsigned short>(unsigned short*, unsigned int, unsigned short
> const&)
> 00112bf6 S void std::__reverse<__gnu_cxx::__normal_iterator<char*,
> std::basic_string<char, std::char_traits<char>,
> std::allocator<char> > > >(__gnu_cxx::__normal_iterator<char*,
> std::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*,
> std::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, std::random_access_iterator_tag)
> 00112bbc S __gnu_cxx::__normal_iterator<char*,
> std::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >
> std::transform<__gnu_cxx::__normal_iterator<char*,
> std::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, __gnu_cxx::__normal_iterator<char*,
> std::basic_string<char, std::char_traits<char>,
> std::allocator<char> > >, int (*)(int)>
> (__gnu_cxx::__normal_iterator<char*, std::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> __gnu_cxx::__normal_iterator<char*, std::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >,
> __gnu_cxx::__normal_iterator<char*, std::basic_string<char,
> std::char_traits<char>, std::allocator<char> > >, int (*)(int))
> 00198740 S typeinfo for std::invalid_argument
> 00192cac S typeinfo name for std::invalid_argument
> 001993e0 S vtable for std::invalid_argument
>
>
> We're all using the standard compiler that ships with OS X:
>
> $ mpicxx -v
> Using built-in specs.
> Target: i686-apple-darwin8
> Configured with: /private/var/tmp/gcc/gcc-5367.obj~1/src/configure
> --disable-checking -enable-werror --prefix=/usr --mandir=/share/man
> --enable-languages=c,objc,c++,obj-c++ --program-transform-name=/^
> [cg][^.-]*$/s/$/-4.0/ --with-gxx-include-dir=/include/c++/4.0.0 --
> with-slibdir=/usr/lib --build=powerpc-apple-darwin8 --with-
> arch=nocona --with-tune=generic --program-prefix= --host=i686-apple-
> darwin8 --target=i686-apple-darwin8
> Thread model: posix
> gcc version 4.0.1 (Apple Computer, Inc. build 5367)
>
> Tim
>
>
>
> On 12/07/2007, at 7:57 AM, Brian Barrett wrote:
>
>> That's unexpected. If you run the command 'ompi_info --all', it
>> should list (towards the top) things like the Bindir and Libdir. Can
>> you see if those have sane values? If they do, can you try running a
>> simple hello, world type MPI application (there's one in the OMPI
>> tarball). It almost looks like memory is getting corrupted, which
>> would be very unexpected that early in the process. I'm unable to
>> duplicate the problem with 1.2.3 on my Mac Pro, making it all the
>> more strange.
>>
>> Another random thought -- Which compilers did you use to build
>> Open MPI?
>>
>> Brian
>>
>>
>> On Jul 11, 2007, at 1:27 PM, Tim Cornwell wrote:
>>
>>>
>>> Open MPI: 1.2.3
>>> Open MPI SVN revision: r15136
>>> Open RTE: 1.2.3
>>> Open RTE SVN revision: r15136
>>> OPAL: 1.2.3
>>> OPAL SVN revision: r15136
>>> Prefix: /usr/local
>>> Configured architecture: i386-apple-darwin8.10.1
>>>
>>> Hi Brian,
>>>
>>> 1.2.3 downloaded and built from source.
>>>
>>> Tim
>>>
>>> On 12/07/2007, at 12:50 AM, Brian Barrett wrote:
>>>
>>>> Which version of Open MPI are you using?
>>>>
>>>> Thanks,
>>>>
>>>> Brian
>>>>
>>>> On Jul 11, 2007, at 3:32 AM, Tim Cornwell wrote:
>>>>
>>>>>
>>>>> I have a problem running Open MPI under OS X 10.4.10. My program
>>>>> runs fine under Debian x86_64 on an Opteron, but under OS X on a
>>>>> number of MacBooks and MacBook Pros I get the following
>>>>> immediately on startup. This smells like a common problem, but I
>>>>> couldn't find anything relevant anywhere. Can anyone provide a
>>>>> hint or, better yet, a solution?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Tim
>>>>>
>>>>>
>>>>> Program received signal EXC_BAD_ACCESS, Could not access memory.
>>>>> Reason: KERN_PROTECTION_FAILURE at address: 0x0000000c
>>>>> 0x04510412 in free ()
>>>>> (gdb) where
>>>>> #0 0x04510412 in free ()
>>>>> #1 0x05d24f80 in opal_install_dirs_expand (input=0x5d2a6b0
>>>>> "${prefix}") at base/installdirs_base_expand.c:67
>>>>> #2 0x05d24584 in opal_installdirs_base_open () at base/
>>>>> installdirs_base_components.c:94
>>>>> #3 0x05d01a40 in opal_init_util () at runtime/opal_init.c:150
>>>>> #4 0x05d01b24 in opal_init () at runtime/opal_init.c:200
>>>>> #5 0x051fa5cd in ompi_mpi_init (argc=1, argv=0xbfffde74,
>>>>> requested=0, provided=0xbfffd930) at runtime/ompi_mpi_init.c:219
>>>>> #6 0x0523a8db in MPI_Init (argc=0xbfffd980, argv=0xbfffde14) at
>>>>> init.c:71
>>>>> #7 0x0005a03d in conrad::cp::MPIConnection::initMPI (argc=1,
>>>>> argv=@0xbfffde14) at mwcommon/MPIConnection.cc:83
>>>>> #8 0x00004163 in main (argc=1, argv=0xbfffde74) at apps/
>>>>> cimager.cc:155
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------
>>>>> Tim Cornwell, Australia Telescope National Facility, CSIRO
>>>>> Location: Cnr Pembroke & Vimiera Rds, Marsfield, NSW, 2122,
>>>>> AUSTRALIA
>>>>> Post: PO Box 76, Epping, NSW 1710, AUSTRALIA
>>>>> Phone: +61 2 9372 4261 Fax: +61 2 9372 4450 or 4310
>>>>> Mobile: +61 4 3366 5399
>>>>> Email: Tim.Cornwell_at_[hidden]
>>>>> URL: http://www.atnf.csiro.au/people/tim.cornwell
>>>>> ------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>