Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Segmentation fault / Address not mapped (1) with 2-node job on Rocks 5.2
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-06-22 02:05:53


Sorry for the problem - the issue is a bug in the handling of the --pernode option in 1.4.2. This has been fixed and awaits release in 1.4.3.
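
Until 1.4.3 is out, one way to sidestep --pernode is to request the layout explicitly - either give an explicit process count and map by node, or use a hostfile that limits each node to a single slot. An untested sketch (the hostfile name is just an example):

  # one process per node via an explicit count plus round-robin node mapping
  mpirun --host idgc3grid01,compute-0-11 -np 2 --bynode hello_mpi

  # or: a hostfile that restricts each node to one slot
  cat > myhosts <<EOF
  idgc3grid01 slots=1
  compute-0-11 slots=1
  EOF
  mpirun --hostfile myhosts -np 2 hello_mpi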

On Jun 21, 2010, at 5:27 PM, Riccardo Murri wrote:

> Hello,
>
> I'm using OpenMPI 1.4.2 on a Rocks 5.2 cluster. I compiled it on my
> own to get a thread-enabled MPI (the OMPI that comes with Rocks 5.2
> apparently only supports MPI_THREAD_SINGLE), and installed it into ~/sw.
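
(As an aside, the thread support level a given build actually provides can be double-checked with the ompi_info binary from that installation - e.g., assuming ~/sw/bin comes first in your PATH:

  ompi_info | grep -i thread

The "Thread support" line should reflect what the build was configured for.)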
>
> To test the newly installed library I compiled a simple "hello world"
> that comes with Rocks::
>
> [murri_at_idgc3grid01 hello_mpi.d]$ cat hello_mpi.c
> #include <stdio.h>
> #include <sys/utsname.h>
>
> #include <mpi.h>
>
> int main(int argc, char **argv) {
>   int myrank;
>   struct utsname unam;
>
>   MPI_Init(&argc, &argv);
>
>   uname(&unam);
>   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>   printf("Hello from rank %d on host %s\n", myrank, unam.nodename);
>
>   MPI_Finalize();
> }
>
> The program runs fine as long as it only uses ranks on localhost::
>
> [murri_at_idgc3grid01 hello_mpi.d]$ mpirun --host localhost -np 2 hello_mpi
> Hello from rank 1 on host idgc3grid01.uzh.ch
> Hello from rank 0 on host idgc3grid01.uzh.ch
>
> However, as soon as I try to run on more than one host, I get a
> segfault::
>
> [murri_at_idgc3grid01 hello_mpi.d]$ mpirun --host idgc3grid01,compute-0-11 --pernode hello_mpi
> [idgc3grid01:13006] *** Process received signal ***
> [idgc3grid01:13006] Signal: Segmentation fault (11)
> [idgc3grid01:13006] Signal code: Address not mapped (1)
> [idgc3grid01:13006] Failing at address: 0x50
> [idgc3grid01:13006] [ 0] /lib64/libpthread.so.0 [0x359420e4c0]
> [idgc3grid01:13006] [ 1]
> /home/oci/murri/sw/lib/libopen-rte.so.0(orte_util_encode_pidmap+0xdb)
> [0x2b352d00265b]
> [idgc3grid01:13006] [ 2]
> /home/oci/murri/sw/lib/libopen-rte.so.0(orte_odls_base_default_get_add_procs_data+0x676)
> [0x2b352d00e0e6]
> [idgc3grid01:13006] [ 3]
> /home/oci/murri/sw/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0xb8)
> [0x2b352d015358]
> [idgc3grid01:13006] [ 4]
> /home/oci/murri/sw/lib/openmpi/mca_plm_rsh.so [0x2b352dcb9a80]
> [idgc3grid01:13006] [ 5] mpirun [0x40345a]
> [idgc3grid01:13006] [ 6] mpirun [0x402af3]
> [idgc3grid01:13006] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4)
> [0x359361d974]
> [idgc3grid01:13006] [ 8] mpirun [0x402a29]
> [idgc3grid01:13006] *** End of error message ***
> Segmentation fault
>
> I've already tried the suggestions posted in reply to similar messages on
> the list: "ldd" reports that the executable is linked against the libraries
> in my home directory, not the system-wide OMPI ones::
>
> [murri_at_idgc3grid01 hello_mpi.d]$ ldd hello_mpi
> libmpi.so.0 => /home/oci/murri/sw/lib/libmpi.so.0 (0x00002ad2bd6f2000)
> libopen-rte.so.0 => /home/oci/murri/sw/lib/libopen-rte.so.0
> (0x00002ad2bd997000)
> libopen-pal.so.0 => /home/oci/murri/sw/lib/libopen-pal.so.0
> (0x00002ad2bdbe3000)
> libdl.so.2 => /lib64/libdl.so.2 (0x0000003593e00000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003596a00000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00000035a1000000)
> libm.so.6 => /lib64/libm.so.6 (0x0000003593a00000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003594200000)
> libc.so.6 => /lib64/libc.so.6 (0x0000003593600000)
> /lib64/ld-linux-x86-64.so.2 (0x0000003593200000)
>
> I've also checked with "strace" that the "mpi.h" file used during
> compilation is the one in ~/sw/include and that all the OMPI ".so" files
> being loaded are the ones in ~/sw/lib. I can ssh to the target compute
> node without a password. The "mpirun" and "mpicc" are the correct ones::
>
> [murri_at_idgc3grid01 hello_mpi.d]$ which mpirun
> ~/sw/bin/mpirun
>
> [murri_at_idgc3grid01 hello_mpi.d]$ which mpicc
> ~/sw/bin/mpicc
>
>
> I'm pretty stuck now; can anybody give me a hint?
>
> Thanks a lot for any help!
>
> Best regards,
> Riccardo
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users