
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Segmentation fault - Address not mapped
From: Ashley Pittman (ashley_at_[hidden])
Date: 2009-07-07 07:14:11


This is the error you get when an invalid communicator handle is passed
to an MPI function. The handle is dereferenced, so you may or may not get a
SEGV from it depending on the value you pass.

The 0x440000a0 address is an offset from 0x44000000, the value of
MPI_COMM_WORLD in MPICH2; my guess would be that you are picking up
either an MPICH2 mpi.h or the MPICH2 mpicc.

Ashley,

On Tue, 2009-07-07 at 11:05 +0100, Catalin David wrote:
> Hello, all!
>
> Just installed Valgrind (since this seems like a memory issue) and got
> this interesting output (when running the test program):
>
> ==4616== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
> ==4616== at 0x43656BD: syscall (in /lib/tls/libc-2.3.2.so)
> ==4616== by 0x4236A75: opal_paffinity_linux_plpa_init (plpa_runtime.c:37)
> ==4616== by 0x423779B:
> opal_paffinity_linux_plpa_have_topology_information (plpa_map.c:501)
> ==4616== by 0x4235FEE: linux_module_init (paffinity_linux_module.c:119)
> ==4616== by 0x447F114: opal_paffinity_base_select
> (paffinity_base_select.c:64)
> ==4616== by 0x444CD71: opal_init (opal_init.c:292)
> ==4616== by 0x43CE7E6: orte_init (orte_init.c:76)
> ==4616== by 0x4067A50: ompi_mpi_init (ompi_mpi_init.c:342)
> ==4616== by 0x40A3444: PMPI_Init (pinit.c:80)
> ==4616== by 0x804875C: main (test.cpp:17)
> ==4616== Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==4616==
> ==4616== Invalid read of size 4
> ==4616== at 0x4095772: ompi_comm_invalid (communicator.h:261)
> ==4616== by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
> ==4616== by 0x8048770: main (test.cpp:18)
> ==4616== Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd
> [denali:04616] *** Process received signal ***
> [denali:04616] Signal: Segmentation fault (11)
> [denali:04616] Signal code: Address not mapped (1)
> [denali:04616] Failing at address: 0x440000a0
> [denali:04616] [ 0] /lib/tls/libc.so.6 [0x42b4de0]
> [denali:04616] [ 1]
> /users/cluster/cdavid/local/lib/libmpi.so.0(MPI_Comm_size+0x6f)
> [0x409581f]
> [denali:04616] [ 2] ./test(__gxx_personality_v0+0x12d) [0x8048771]
> [denali:04616] [ 3] /lib/tls/libc.so.6(__libc_start_main+0xf8) [0x42a2768]
> [denali:04616] [ 4] ./test(__gxx_personality_v0+0x3d) [0x8048681]
> [denali:04616] *** End of error message ***
> ==4616==
> ==4616== Invalid read of size 4
> ==4616== at 0x4095782: ompi_comm_invalid (communicator.h:261)
> ==4616== by 0x409581E: PMPI_Comm_size (pcomm_size.c:46)
> ==4616== by 0x8048770: main (test.cpp:18)
> ==4616== Address 0x440000a0 is not stack'd, malloc'd or (recently) free'd
>
>
> The problem is that now I don't know where the issue comes from (is
> it libc that is too old and incompatible with g++ 4.4/Open MPI? Is
> libc broken?).
>
> Any help would be highly appreciated.
>
> Thanks,
> Catalin
>
>
> On Mon, Jul 6, 2009 at 3:36 PM, Catalin David<catalindavid2003_at_[hidden]> wrote:
> > On Mon, Jul 6, 2009 at 3:26 PM, jody<jody.xha_at_[hidden]> wrote:
> >> Hi
> >> Are you also sure that you have the same version of Open-MPI
> >> on every machine of your cluster, and that it is the mpicxx of this
> >> version that is called when you run your program?
> >> I ask because you mentioned that there was an old version of Open-MPI
> >> present... did you remove this?
> >>
> >> Jody
> >
> > Hi
> >
> > I have just logged in to a few other boxes and they all mount my home
> > folder. When running `echo $LD_LIBRARY_PATH` and other commands, I get
> > what I expect to get, but this might be because I have set these
> > variables in the .bashrc file. So, I tried compiling/running with
> > ~/local/bin/mpicxx [stuff] and ~/local/bin/mpirun -np 4 ray-trace,
> > but I get the same errors.
> >
> > As for the previous version, I don't have root access, so I was
> > not able to remove it. I was just trying to override it by setting the
> > $PATH variable to point at my local installation first.
> >
> >
> > Catalin
> >
> >
> > --
> >
> > ******************************
> > Catalin David
> > B.Sc. Computer Science 2010
> > Jacobs University Bremen
> >
> > Phone: +49-(0)1577-49-38-667
> >
> > College Ring 4, #343
> > Bremen, 28759
> > Germany
> > ******************************
> >
>
>
>

-- 
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing
http://padb.pittman.org.uk