Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about compatibility issues
From: Ted Yu (tedhyu_at_[hidden])
Date: 2009-02-01 11:28:25


Thanks for the info.  It turned out to be a problem with the software, not an Open MPI issue.

Ted

--- On Sun, 2/1/09, Jeff Squyres <jsquyres_at_[hidden]> wrote:
From: Jeff Squyres <jsquyres_at_[hidden]>
Subject: Re: [OMPI users] Question about compatibility issues
To: tedhyu_at_[hidden], "Open MPI Users" <users_at_[hidden]>
Date: Sunday, February 1, 2009, 3:28 AM

On Jan 26, 2009, at 4:57 PM, Ted Yu wrote:

> I'm new to this group. I'm trying to implement a parallel quantum
> code called "Seqquest", and I'm trying to figure out why it fails
> with the following error:
>
> This job has allocated 2 cpus
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:(nil)
> [0] func:/usr/lib64/openmpi/libopal.so.0 [0x393af21dc5]
> [1] func:/lib64/tls/libpthread.so.0 [0x393b80c4f0]
> [2] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x [0x4f5cfd]
> [3] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(rhosave_+0x120) [0x4f6a8a]
> [4] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(MAIN__+0xb710) [0x431770]
> [5] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(main+0xe) [0xa717ee]
> [6] func:/lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x393b11c3fb]
> [7] func:/project/source/seqquest/seqquest_source_v261i/hive_CentOS4.5_parallel/build_261i/quest_ompi.x(free+0x3a) [0x425fca]
> *** End of error message ***
> mpiexec: Warning: task 0 died with signal 11 (Segmentation fault).
>
>
> Trying to debug this code, I noticed that the math library is an Intel
> math library, but all of the other code, including ScaLAPACK and BLACS,
> was compiled using the GNU compiler. Will there be compatibility issues?

There *could* be. Have you tried to compile everything with the GNU compiler?

You might also try to examine what exactly in free() is going bad -- are you
passing a bad address to free? Can you run the code through a debugger and/or
examine corefiles?

--Jeff Squyres
Cisco Systems
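
One way to act on the suggestion above before opening a debugger is to turn the backtrace addresses into source locations. The following is only a minimal sketch, not something from the original thread: the helper names `parse_frames` and `addr2line_command` are hypothetical, the sample path `/path/to/quest_ompi.x` is a placeholder, and it assumes GNU binutils' `addr2line` is installed and that the executable was built with debug information (for example with `-g`). It parses frames in the `[N] func:MODULE(SYMBOL+0xOFF) [0xADDR]` layout shown in the error message and builds, but does not run, an `addr2line` command for the frames that belong to the application binary.

```python
import re
from typing import List, NamedTuple, Optional

# Matches one frame of the backtrace layout quoted in this thread, e.g.
#   [3] func:/path/quest_ompi.x(rhosave_+0x120) [0x4f6a8a]
# The "(symbol+0xoffset)" part is optional, and \s+ also matches the
# line breaks that mail clients insert in the middle of a frame.
FRAME_RE = re.compile(
    r"\[(?P<index>\d+)\]\s+func:(?P<module>\S+?)"
    r"(?:\((?P<symbol>\w+)\+0x(?P<offset>[0-9a-fA-F]+)\))?"
    r"\s+\[0x(?P<address>[0-9a-fA-F]+)\]"
)


class Frame(NamedTuple):
    index: int
    module: str
    symbol: Optional[str]
    address: int


def parse_frames(text: str) -> List[Frame]:
    """Extract every stack frame found in the pasted error message."""
    return [
        Frame(int(m["index"]), m["module"], m["symbol"], int(m["address"], 16))
        for m in FRAME_RE.finditer(text)
    ]


def addr2line_command(frames: List[Frame], binary_suffix: str) -> List[str]:
    """Build an addr2line argument list for frames inside the given binary.

    Frames from shared libraries are skipped: their addresses are only
    meaningful relative to the library's load address.
    """
    own = [f for f in frames if f.module.endswith(binary_suffix)]
    if not own:
        return []
    return ["addr2line", "-f", "-e", own[0].module] + [hex(f.address) for f in own]


if __name__ == "__main__":
    # Placeholder path; substitute the real location of the executable.
    sample = """
    [1] func:/lib64/tls/libpthread.so.0 [0x393b80c4f0]
    [2] func:/path/to/quest_ompi.x [0x4f5cfd]
    [3] func:/path/to/quest_ompi.x(rhosave_+0x120) [0x4f6a8a]
    """
    print(" ".join(addr2line_command(parse_frames(sample), "quest_ompi.x")))
```

The printed command can be pasted into a shell on the machine that holds the executable. If the binary has no debug information, `addr2line` can only report unknown locations, in which case rebuilding with debug flags or loading a corefile in a debugger is the next step.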