Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] Segmentation fault - Address not mapped
From: Dorian Krause (doriankrause_at_[hidden])
Date: 2009-07-07 07:23:49


Catalin David wrote:
> Hello, all!
>
> Just installed Valgrind (since this seems like a memory issue) and got
> this interesting output (when running the test program):
>
> ==4616== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
> ==4616== at 0x43656BD: syscall (in /lib/tls/libc-2.3.2.so)
> ==4616== by 0x4236A75: opal_paffinity_linux_plpa_init (plpa_runtime.c:37)
> ==4616== by 0x423779B: opal_paffinity_linux_plpa_have_topology_information (plpa_map.c:501)
> ==4616== by 0x4235FEE: linux_module_init (paffinity_linux_module.c:119)
> ==4616== by 0x447F114: opal_paffinity_base_select (paffinity_base_select.c:64)
> ==4616== by 0x444CD71: opal_init (opal_init.c:292)
> ==4616== by 0x43CE7E6: orte_init (orte_init.c:76)
> ==4616== by 0x4067A50: ompi_mpi_init (ompi_mpi_init.c:342)
> ==4616== by 0x40A3444: PMPI_Init (pinit.c:80)
> ==4616== by 0x804875C: main (test.cpp:17)
> ==4616== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==4616==
> ==4616== Invalid read of size 4
> ==4616== at 0x4095772: ompi_comm_invalid (communicator.h:261)
> ==4616== by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
> ==4616== by 0x8048770: main (test.cpp:18)
> ==4616== Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd
> [denali:04616] *** Process received signal ***
> [denali:04616] Signal: Segmentation fault (11)
> [denali:04616] Signal code: Address not mapped (1)
> [denali:04616] Failing at address: 0x440000a0
> [denali:04616] [ 0] /lib/tls/libc.so.6 [0x42b4de0]
> [denali:04616] [ 1] /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x6f) [0x409581f]
> [denali:04616] [ 2] ./test(__gxx_personality_v0+0x12d) [0x8048771]
> [denali:04616] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x42a2768]
> [denali:04616] [ 4] ./test(__gxx_personality_v0+0x3d) [0x8048681]
> [denali:04616] *** End of error message ***
> ==4616==
> ==4616== Invalid read of size 4
> ==4616== at 0x4095782: ompi_comm_invalid (communicator.h:261)
> ==4616== by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
> ==4616== by 0x8048770: main (test.cpp:18)
> ==4616== Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd
>
>
> The problem is that, now, I don't know where the issue comes from (is
> it libc that is too old and incompatible with g++ 4.4/OpenMPI? is libc
> broken?).
>
Looking at the code for ompi_comm_invalid:

static inline int ompi_comm_invalid(ompi_communicator_t* comm)
{
    if ((NULL == comm) || (MPI_COMM_NULL == comm) ||
        (OMPI_COMM_IS_FREED(comm)) || (OMPI_COMM_IS_INVALID(comm)) )
        return true;
    else
        return false;
}

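The interesting point is that both (NULL == comm) and (MPI_COMM_NULL == comm)
must have evaluated to false; otherwise || would have short-circuited and the
following macros (where the invalid read occurs) would never have been
evaluated. So comm is neither NULL nor MPI_COMM_NULL, yet it does not point to
readable memory.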
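
To make the short-circuit argument concrete, here is a minimal sketch in C. It is not Open MPI code: mock_comm_t, the MOCK_FLAG_* bits, mock_comm_null and deref_count are hypothetical stand-ins, and the two flag tests merely play the role of OMPI_COMM_IS_FREED and OMPI_COMM_IS_INVALID. The counter records how often the pointer is actually dereferenced.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for ompi_communicator_t: just a flags word. */
typedef struct mock_comm {
    unsigned flags;
} mock_comm_t;

#define MOCK_FLAG_FREED   0x1u
#define MOCK_FLAG_INVALID 0x2u

/* Hypothetical stand-in for MPI_COMM_NULL: a distinguished object
 * whose address serves as the "null communicator" handle. */
static mock_comm_t mock_comm_null = { MOCK_FLAG_INVALID };

/* Counts how many times a flag test dereferences the pointer. */
static int deref_count = 0;

static bool mock_is_freed(const mock_comm_t *comm)
{
    ++deref_count;                       /* the pointer is read here */
    return (comm->flags & MOCK_FLAG_FREED) != 0;
}

static bool mock_is_invalid(const mock_comm_t *comm)
{
    ++deref_count;                       /* and here */
    return (comm->flags & MOCK_FLAG_INVALID) != 0;
}

/* Same shape as the check quoted above: || stops at the first true
 * operand, so the flag tests run only when both pointer comparisons
 * are false. */
static bool mock_comm_invalid(const mock_comm_t *comm)
{
    return (NULL == comm) || (&mock_comm_null == comm) ||
           mock_is_freed(comm) || mock_is_invalid(comm);
}
```

Passing NULL or &mock_comm_null returns true without touching the pointer, so deref_count stays at zero. Any other value reaches the flag tests, which is where a garbage pointer would fault.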

The only explanation that comes to mind is that you are mixing MPI versions,
although you said your PATH is fine.
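
One piece of arithmetic that would support the mixed-installation idea: the failing address 0x440000a0 sits only 0xa0 bytes above 0x44000000. As far as I know, MPICH-style mpi.h headers define MPI_COMM_WORLD as the integer constant 0x44000000, whereas Open MPI handles are pointers to structs. That constant is an assumption on my part, not something shown in this thread, and the helper names below are made up for illustration. A minimal C sketch of the check:

```c
#include <stdint.h>

/* ASSUMPTION: the integer value an MPICH-style mpi.h uses for
 * MPI_COMM_WORLD. Not taken from this thread. */
#define ASSUMED_INT_COMM_WORLD ((uintptr_t)0x44000000u)

/* Taken from the "Failing at address" line in the report above. */
#define REPORTED_FAULT_ADDR ((uintptr_t)0x440000a0u)

/* Byte offset of the faulting read if `handle` had been treated as the
 * base address of a struct. */
static uintptr_t field_offset(uintptr_t fault_addr, uintptr_t handle)
{
    return fault_addr - handle;
}

/* Nonzero when the fault lies at or above `handle` and less than
 * `max_struct_size` bytes past it, i.e. when it looks like a field
 * access through `handle` used as a pointer. */
static int fault_matches_handle(uintptr_t fault_addr, uintptr_t handle,
                                uintptr_t max_struct_size)
{
    return fault_addr >= handle &&
           field_offset(fault_addr, handle) < max_struct_size;
}
```

With the values above, field_offset gives 0xa0, a plausible offset for a field inside a communicator struct. If the assumption holds, test.cpp was compiled against another implementation's mpi.h and then run against the Open MPI library, so it would be worth checking which mpi.h your compiler actually picks up.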

Regards,
Dorian

> Any help would be highly appreciated.
>
> Thanks,
> Catalin
>
>
> On Mon, Jul 6, 2009 at 3:36 PM, Catalin David<catalindavid2003_at_[hidden]> wrote:
>
>> On Mon, Jul 6, 2009 at 3:26 PM, jody<jody.xha_at_[hidden]> wrote:
>>
>>> Hi
>>> Are you also sure that you have the same version of Open-MPI
>>> on every machine of your cluster, and that it is the mpicxx of this
>>> version that is called when you run your program?
>>> I ask because you mentioned that there was an old version of Open-MPI
>>> present... did you remove this?
>>>
>>> Jody
>>>
>> Hi
>>
>> I have just logged in to a few other boxes and they all mount my home
>> folder. When running `echo $LD_LIBRARY_PATH` and other commands, I get
>> what I expect to get, but this might be because I have set these
>> variables in the .bashrc file. So, I tried compiling/running like this
>> ~/local/bin/mpicxx [stuff] and ~/local/bin/mpirun -np 4 ray-trace,
>> but I get the same errors.
>>
>> As for the previous version, I don't have root access, therefore I was
>> not able to remove it. I was just trying to outrun it by setting the
>> $PATH variable to point first at my local installation.
>>
>>
>> Catalin
>>
>>