
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] processes aborting on MPI_Finalize
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-02-29 15:28:51


Sorry for the delay in replying.

It's hard to say where the error is. I *hope* it's not in the release
version of OMPI. :)

Have you run your code through a memory-checking debugger such as
valgrind? Errors in free() like this are *usually* the result of some
kind of heap violation (e.g., a bad or double free, a buffer overflow,
or some such).
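
For what it's worth, here is a minimal sketch of the kind of heap
corruption that typically goes unnoticed until the allocator trips over
it later, e.g. in the free() calls inside MPI_Finalize. It is not taken
from your code; the file name, buffer size, and valgrind options are
only illustrative:

/* heap_overflow.c: writes one element past the end of a malloc'd buffer.
 * The damage is usually not detected where it happens; the allocator
 * often only aborts much later, e.g. in free() during MPI_Finalize().
 *
 * Build and run under valgrind, for example:
 *   mpicc -g heap_overflow.c -o heap_overflow
 *   mpirun -np 2 valgrind --leak-check=full ./heap_overflow
 */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, i, n = 100;
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = (double *) malloc(n * sizeof(double));
    for (i = 0; i <= n; i++)      /* BUG: "<=" writes buf[n], one past the end */
        buf[i] = (double) i;

    if (rank == 0)
        MPI_Send(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    free(buf);        /* valgrind reports the invalid write; the crash    */
    MPI_Finalize();   /* ...may otherwise only surface somewhere in here  */
    return 0;
}

valgrind reports the invalid write at the line where it happens rather
than wherever the heap finally falls apart, which is usually the fastest
way to get to the bottom of this kind of crash.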

On Feb 20, 2008, at 11:48 AM, Adams, Samuel D AFRL/RHDR wrote:

> I noticed that my program was spitting this out on stderr:
>
> Is this an Open MPI problem?
>
> [prodnode31:26364] *** Process received signal ***
> [prodnode31:26364] Signal: Segmentation fault (11)
> [prodnode31:26364] Signal code: (128)
> [prodnode31:26364] Failing at address: (nil)
> [prodnode31:26364] [ 0] /lib64/libpthread.so.0 [0x35cea0dd40]
> [prodnode31:26364] [ 1] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(_int_free+0x18e) [0x2aaaaafcb99e]
> [prodnode31:26364] [ 2] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(free+0xbd) [0x2aaaaafcbd9d]
> [prodnode31:26364] [ 3] /usr/local/profiles/gcc-openmpi/lib/libmpi.so.0 [0x2aaaaaad4589]
> [prodnode31:26364] [ 4] /usr/local/profiles/gcc-openmpi/lib/openmpi/mca_btl_tcp.so(mca_btl_tcp_component_close+0x109) [0x2aaab0e341e9]
> [prodnode31:26364] [ 5] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(mca_base_components_close+0x83) [0x2aaaaafbbe53]
> [prodnode31:26364] [ 6] /usr/local/profiles/gcc-openmpi/lib/libmpi.so.0(mca_btl_base_close+0xb3) [0x2aaaaab1da13]
> [prodnode31:26364] [ 7] /usr/local/profiles/gcc-openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_component_close+0x35) [0x2aaab060fd55]
> [prodnode31:26364] [ 8] /usr/local/profiles/gcc-openmpi/lib/libopen-pal.so.0(mca_base_components_close+0x83) [0x2aaaaafbbe53]
> [prodnode31:26364] [ 9] /usr/local/profiles/gcc-openmpi/lib/libmpi.so.0(mca_pml_base_close+0x48) [0x2aaaaab23818]
> [prodnode31:26364] [10] /usr/local/profiles/gcc-openmpi/lib/libmpi.so.0(ompi_mpi_finalize+0x1a2) [0x2aaaaaaeda02]
> [prodnode31:26364] [11] /home/sam/code/fdtd/fdtd_0.4/fdtd(main+0x1b2) [0x4054f2]
> [prodnode31:26364] [12] /lib64/libc.so.6(__libc_start_main+0xf4) [0x35ce21d8a4]
> [prodnode31:26364] [13] /home/sam/code/fdtd/fdtd_0.4/fdtd [0x4035e9]
> [prodnode31:26364] *** End of error message ***
> mpirun noticed that job rank 0 with PID 26364 on node prodnode31.brooks.af.mil exited on signal 11 (Segmentation fault).
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
> -----Original Message-----
> From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Adams, Samuel D AFRL/RHDR
> Sent: Tuesday, February 19, 2008 3:02 PM
> To: Open MPI Users
> Subject: [OMPI users] processes aborting on MPI_Finalize
>
> This is probably some coding error on my part, but under some problem
> divisions I get processes aborting when I call MPI_Finalize(). Perhaps
> they are still incorrectly waiting to receive some message, or
> something like that. Sometimes it seems to work. Is there a good way
> to get to the bottom of this error?
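
One thing to check for that last point: every nonblocking request you
post (MPI_Isend / MPI_Irecv) must be completed before MPI_Finalize is
called; finalizing with requests still pending is erroneous and can lead
to exactly this kind of abort. A minimal sketch of the pattern, with
illustrative names only, not taken from your code:

#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nreq = 0;
    double sendval = 1.0, recvval = 0.0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Post some nonblocking communication... */
    if (rank == 0)
        MPI_Isend(&sendval, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &reqs[nreq++]);
    else if (rank == 1)
        MPI_Irecv(&recvval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &reqs[nreq++]);

    /* ...and complete every request before shutting down, so nothing */
    /* is still in flight when MPI_Finalize tears the library down.   */
    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}
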
>
>
> ----output-----
> 4 additional processes aborted (not shown)
>
> Sam Adams
> General Dynamics Information Technology
> Phone: 210.536.5945
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Jeff Squyres
Cisco Systems