
Open MPI User's Mailing List Archives


Subject: [OMPI users] Segmentation fault / Address not mapped (1) with 2-node job on Rocks 5.2
From: Riccardo Murri (riccardo.murri_at_[hidden])
Date: 2010-06-21 19:27:48


I'm using Open MPI 1.4.2 on a Rocks 5.2 cluster. I compiled it myself
to get a thread-enabled MPI (the Open MPI that ships with Rocks 5.2
apparently supports only MPI_THREAD_SINGLE), and installed it into ~/sw.
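For reference, the build was configured roughly along these lines (a
sketch: the prefix is the real one, but the exact flags are from memory
and may differ; `--enable-mpi-threads` is the 1.4-series switch for
thread support):

```shell
# Build sketch for a thread-enabled Open MPI 1.4.x under $HOME/sw
# (--enable-mpi-threads is assumed here, not a verbatim record).
./configure --prefix=$HOME/sw --enable-mpi-threads
make all install
```

Afterwards "ompi_info | grep -i thread" can be used to confirm the
thread support that actually got compiled in.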

To test the newly installed library I compiled the simple "hello world"
program that comes with Rocks::

  [murri_at_idgc3grid01 hello_mpi.d]$ cat hello_mpi.c
  #include <stdio.h>
  #include <sys/utsname.h>

  #include <mpi.h>

  int main(int argc, char **argv) {
    int myrank;
    struct utsname unam;

    MPI_Init(&argc, &argv);

    uname(&unam);                 /* fill in unam.nodename */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("Hello from rank %d on host %s\n", myrank, unam.nodename);

    MPI_Finalize();
    return 0;
  }

The program runs fine as long as it only uses ranks on localhost::

  [murri_at_idgc3grid01 hello_mpi.d]$ mpirun --host localhost -np 2 hello_mpi
  Hello from rank 1 on host
  Hello from rank 0 on host

However, as soon as I try to run on more than one host, I get a
segmentation fault::

  [murri_at_idgc3grid01 hello_mpi.d]$ mpirun --host idgc3grid01,compute-0-11 --pernode hello_mpi
  [idgc3grid01:13006] *** Process received signal ***
  [idgc3grid01:13006] Signal: Segmentation fault (11)
  [idgc3grid01:13006] Signal code: Address not mapped (1)
  [idgc3grid01:13006] Failing at address: 0x50
  [idgc3grid01:13006] [ 0] /lib64/ [0x359420e4c0]
  [idgc3grid01:13006] [ 1]
  [idgc3grid01:13006] [ 2]
  [idgc3grid01:13006] [ 3]
  [idgc3grid01:13006] [ 4]
/home/oci/murri/sw/lib/openmpi/ [0x2b352dcb9a80]
  [idgc3grid01:13006] [ 5] mpirun [0x40345a]
  [idgc3grid01:13006] [ 6] mpirun [0x402af3]
  [idgc3grid01:13006] [ 7] /lib64/
  [idgc3grid01:13006] [ 8] mpirun [0x402a29]
  [idgc3grid01:13006] *** End of error message ***
  Segmentation fault

I've already tried the suggestions posted in replies to similar
messages on this list: "ldd" reports that the executable is linked
against the libraries in my home directory, not the system-wide OMPI::

  [murri_at_idgc3grid01 hello_mpi.d]$ ldd hello_mpi
 => /home/oci/murri/sw/lib/ (0x00002ad2bd6f2000)
 => /home/oci/murri/sw/lib/
 => /home/oci/murri/sw/lib/
 => /lib64/ (0x0000003593e00000)
 => /lib64/ (0x0000003596a00000)
 => /lib64/ (0x00000035a1000000)
 => /lib64/ (0x0000003593a00000)
 => /lib64/ (0x0000003594200000)
 => /lib64/ (0x0000003593600000)
          /lib64/ (0x0000003593200000)

I've also checked with "strace" that the "mpi.h" file used during
compile is the one in ~/sw/include and that all ".so" files being
loaded from OMPI are the ones in ~/sw/lib. I can ssh without password
to the target compute node. The "mpirun" and "mpicc" are the correct ones::

  [murri_at_idgc3grid01 hello_mpi.d]$ which mpirun

  [murri_at_idgc3grid01 hello_mpi.d]$ which mpicc

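For completeness, the remote side can be checked along these lines
(illustrative commands: compute-0-11 is one of my nodes, and the
binary path is a placeholder, not necessarily where the executable
actually lives):

```shell
# Non-interactive ssh shells can get a different PATH/LD_LIBRARY_PATH
# than a login shell, so check what the remote side actually resolves.
ssh compute-0-11 'which mpirun; echo $LD_LIBRARY_PATH'
ssh compute-0-11 'ldd ~/hello_mpi.d/hello_mpi'
```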
I'm pretty stuck now; can anybody give me a hint?

Thanks a lot for any help!

Best regards,