Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segmentation fault - Address not mapped
From: Dorian Krause (doriankrause_at_[hidden])
Date: 2009-07-07 07:23:49


Catalin David wrote:
> Hello, all!
>
> Just installed Valgrind (since this seems like a memory issue) and got
> this interesting output (when running the test program):
>
> ==4616== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
> ==4616== at 0x43656BD: syscall (in /lib/tls/libc-2.3.2.so)
> ==4616== by 0x4236A75: opal_paffinity_linux_plpa_init (plpa_runtime.c:37)
> ==4616== by 0x423779B: opal_paffinity_linux_plpa_have_topology_information (plpa_map.c:501)
> ==4616== by 0x4235FEE: linux_module_init (paffinity_linux_module.c:119)
> ==4616== by 0x447F114: opal_paffinity_base_select (paffinity_base_select.c:64)
> ==4616== by 0x444CD71: opal_init (opal_init.c:292)
> ==4616== by 0x43CE7E6: orte_init (orte_init.c:76)
> ==4616== by 0x4067A50: ompi_mpi_init (ompi_mpi_init.c:342)
> ==4616== by 0x40A3444: PMPI_Init (pinit.c:80)
> ==4616== by 0x804875C: main (test.cpp:17)
> ==4616== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==4616==
> ==4616== Invalid read of size 4
> ==4616== at 0x4095772: ompi_comm_invalid (communicator.h:261)
> ==4616== by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
> ==4616== by 0x8048770: main (test.cpp:18)
> ==4616== Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd
> [denali:04616] *** Process received signal ***
> [denali:04616] Signal: Segmentation fault (11)
> [denali:04616] Signal code: Address not mapped (1)
> [denali:04616] Failing at address: 0x440000a0
> [denali:04616] [ 0] /lib/tls/libc.so.6 [0x42b4de0]
> [denali:04616] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x6f) [0x409581f]
> [denali:04616] [ 2] ./test(__gxx_personality_v0+0x12d) [0x8048771]
> [denali:04616] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x42a2768]
> [denali:04616] [ 4] ./test(__gxx_personality_v0+0x3d) [0x8048681]
> [denali:04616] *** End of error message ***
> ==4616==
> ==4616== Invalid read of size 4
> ==4616== at 0x4095782: ompi_comm_invalid (communicator.h:261)
> ==4616== by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
> ==4616== by 0x8048770: main (test.cpp:18)
> ==4616== Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd
>
>
> The problem is that, now, I don't know where the issue comes from (is
> it libc that is too old and incompatible with g++ 4.4/OpenMPI? is libc
> broken?).
>
Looking at the code for ompi_comm_invalid:

static inline int ompi_comm_invalid(ompi_communicator_t* comm)
{
    if ((NULL == comm) || (MPI_COMM_NULL == comm) ||
        (OMPI_COMM_IS_FREED(comm)) || (OMPI_COMM_IS_INVALID(comm)) )
        return true;
    else
        return false;
}

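the interesting point is that (MPI_COMM_NULL == comm) must have evaluated
to false: the || chain short-circuits, so otherwise the following macros
(where the invalid read occurs) would never have been evaluated. In other
words, comm is neither NULL nor the MPI_COMM_NULL of the library you are
running against, yet it does not point to a valid communicator either.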

The only explanation that comes to my mind is that you are mixing MPI
versions, e.g. compiling against the mpi.h of one installation and running
against the libmpi of another. But you said your PATH is fine?

Regards,
Dorian

> Any help would be highly appreciated.
>
> Thanks,
> Catalin
>
>
> On Mon, Jul 6, 2009 at 3:36 PM, Catalin David<catalindavid2003_at_[hidden]> wrote:
>
>> On Mon, Jul 6, 2009 at 3:26 PM, jody<jody.xha_at_[hidden]> wrote:
>>
>>> Hi
>>> Are you also sure that you have the same version of Open-MPI
>>> on every machine of your cluster, and that it is the mpicxx of this
>>> version that is called when you run your program?
>>> I ask because you mentioned that there was an old version of Open-MPI
>>> present... did you remove it?
>>>
>>> Jody
>>>
>> Hi
>>
>> I have just logged in to a few other boxes and they all mount my home
>> folder. When running `echo $LD_LIBRARY_PATH` and other commands, I get
>> what I expect to get, but this might be because I have set these
>> variables in the .bashrc file. So, I tried compiling/running like this
>> ~/local/bin/mpicxx [stuff] and ~/local/bin/mpirun -np 4 ray-trace,
>> but I get the same errors.
>>
>> As for the previous version, I don't have root access, so I was
>> not able to remove it. I was just trying to shadow it by setting the
>> $PATH variable to point first at my local installation.
>>
>>
>> Catalin
>>
>>
>> --
>>
>> ******************************
>> Catalin David
>> B.Sc. Computer Science 2010
>> Jacobs University Bremen
>>
>> Phone: +49-(0)1577-49-38-667
>>
>> College Ring 4, #343
>> Bremen, 28759
>> Germany
>> ******************************
>>
>>