Subject: [OMPI users] Communications problems w/OpenMPI
From: deadchicken_at_[hidden]
Date: 2008-12-18 03:15:13


I've been trying to get Open MPI working on Amazon's EC2, but I keep
running into a communications problem. Here is the source (a typical
Hello, World):

> #include <stdio.h>
> #include "mpi.h"
>
> int main(int argc, char *argv[])
> {
>     int myid, numprocs;
>
>     MPI_Init(&argc, &argv);                   /* start the MPI runtime */
>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs); /* total number of ranks */
>     MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* this process's rank */
>
>     printf("%d of %d: Hello world!\n", myid, numprocs);
>
>     MPI_Finalize();
>     return 0;
> }
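
For what it's worth, I built it with the usual mpicc wrapper, something
like this (the source filename is just illustrative):

mpicc -o /mnt/mpihw mpihw.c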

I then copied the binary over to the other machine and tried running it
with:

mpirun -v --mca btl self,tcp -np 4 --machinefile machines /mnt/mpihw
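
The machinefile is just the two instance hostnames, one per line (the
same names that show up in the output below), i.e. something like:

domU-12-31-39-00-B2-23
domU-12-31-39-02-F5-13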

Running that produces:

--------------------------------------------------------------------------
Process 0.1.1 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

   PML add procs failed
   --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
--------------------------------------------------------------------------
Process 0.1.3 is unable to reach 0.1.0 for MPI communication.
If you specified the use of a BTL component, you may have
forgotten a component (such as "self") in the list of
usable components.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

   PML add procs failed
   --> Returned "Unreachable" (-12) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[domU-12-31-39-02-F5-13:03965] [0,0,0]-[0,1,0] mca_oob_tcp_msg_recv:
readv failed: Connection reset by peer (104)
[domU-12-31-39-02-F5-13:03965] [0,0,0]-[0,1,2] mca_oob_tcp_msg_recv:
readv failed: Connection reset by peer (104)
mpirun noticed that job rank 0 with PID 3653 on node
domU-12-31-39-00-B2-23 exited on signal 15 (Terminated).
1 additional process aborted (not shown)

As far as I can tell, the machines can reach each other on any port you
like; it's only MPI traffic that fails.
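
To be concrete about what I mean by "any port": my check amounts to
opening a raw TCP connection between the instances. A minimal probe
along these lines (host and port are placeholders; nothing here is
MPI-specific) connects fine from either machine to the other:

> /* tcpprobe.c -- minimal TCP reachability check, illustrative only.
>  * Usage: tcpprobe <host> <port>
>  */
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
> #include <netdb.h>
> #include <sys/socket.h>
>
> int main(int argc, char *argv[])
> {
>     struct addrinfo hints, *res;
>     int rc, fd;
>
>     if (argc != 3) {
>         fprintf(stderr, "usage: %s host port\n", argv[0]);
>         return 1;
>     }
>
>     memset(&hints, 0, sizeof(hints));
>     hints.ai_family = AF_INET;       /* plain IPv4 */
>     hints.ai_socktype = SOCK_STREAM; /* TCP */
>
>     rc = getaddrinfo(argv[1], argv[2], &hints, &res);
>     if (rc != 0) {
>         fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
>         return 1;
>     }
>
>     fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
>     if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) != 0) {
>         perror("connect");
>         return 1;
>     }
>
>     printf("connected to %s:%s\n", argv[1], argv[2]);
>     close(fd);
>     freeaddrinfo(res);
>     return 0;
> }

So plain TCP between the nodes looks healthy. Any idea what's wrong?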