Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Network Problem?
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-07-04 08:56:26


Open MPI does not currently support NAT; sorry. :-(
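
For anyone trying to confirm that NAT is in the path in a setup like the one quoted below: a rough hint is that a node's own interface address falls in a private (RFC 1918) range while the other nodes reach it through a different, translated address. Private addresses alone are normal on a LAN; it is the mismatch that matters. Below is a minimal sketch of the private-range test, assuming IPv4 only. `is_private_ipv4` is a hypothetical helper written for illustration and is not part of Open MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical helper (not an Open MPI function).
 * Returns 1 if the dotted-quad IPv4 string lies in an RFC 1918 private
 * range, 0 if it does not, and -1 if the string cannot be parsed. */
int is_private_ipv4(const char *addr)
{
    struct in_addr in;
    uint32_t ip;

    assert(addr != NULL);
    if (inet_pton(AF_INET, addr, &in) != 1)
        return -1;

    ip = ntohl(in.s_addr);  /* convert to host byte order before masking */

    if ((ip & 0xFF000000u) == 0x0A000000u) return 1;  /* 10.0.0.0/8     */
    if ((ip & 0xFFF00000u) == 0xAC100000u) return 1;  /* 172.16.0.0/12  */
    if ((ip & 0xFFFF0000u) == 0xC0A80000u) return 1;  /* 192.168.0.0/16 */
    return 0;
}
```

One way to use it: on the remote machine, pass in the address reported by its own network interface, then compare that with the address the LAN machines use to reach it. If the first is private and the two differ, address translation is happening somewhere between them.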

On Jun 30, 2009, at 2:49 PM, David Ronis wrote:

> (This may be a duplicate. An earlier post seems to have been lost).
>
> I'm using Open MPI (1.3.2) to run on 3 dual-processor machines (running
> Linux, Slackware 12.1, gcc-4.4.0). Two are directly on my LAN while
> the 3rd is connected to my LAN via VPN and NAT (I can communicate in
> either direction between any of the LAN machines and the remote machine
> using its NAT address).
>
> The program I'm trying to run is very simple in terms of MPI.
> Basically it is:
>
> int main(int argc, char **argv)
> {
>     [snip];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>     MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>
>     [snip];
>
>     /* Rank 0 (the root) sums into C in place; every other rank sends C. */
>     if (myrank == 0)
>         i = MPI_Reduce(MPI_IN_PLACE, C, N, MPI_DOUBLE,
>                        MPI_SUM, 0, MPI_COMM_WORLD);
>     else
>         /* The receive buffer argument is only significant at the root. */
>         i = MPI_Reduce(C, MPI_IN_PLACE, N, MPI_DOUBLE,
>                        MPI_SUM, 0, MPI_COMM_WORLD);
>
>     if (i != MPI_SUCCESS)
>     {
>         fprintf(stderr, "MPI_Reduce (C) fails on processor %d\n",
>                 myrank);
>         MPI_Finalize();
>         exit(1);
>     }
>     MPI_Barrier(MPI_COMM_WORLD);
>
>     [snip];
>
> }
>
> I run by invoking:
>
> mpirun -v -np ${NPROC} -hostfile ${HOSTFILE} --stdin none $* > /dev/null
>
> If I run on the 4 nodes that are physically on the LAN it works as
> expected. When I add the nodes on the remote machine things don't
> work properly:
>
> 1. If I start with NPROC=6 on one of the LAN machines all 6 nodes
> start (as shown by running ps), and all get to the MPI_Reduce
> calls. At that point things hang (I see no network traffic, which
> given the size of the array I'm trying to reduce is strange).
>
> 2. If I start on the remote with NPROC=6, then only the mpirun call
> shows up under ps on the remote, while nothing shows up on the other
> nodes. Killing the process gives messages like:
>
> hostname - daemon did not report back when launched
>
> 3. If I start on the remote with NPROC=2, the 2 processes start on
> the remote and finish properly.
>
> My suspicion is that there's some bad interaction with NAT and
> authentication.
>
> Any suggestions?
>
> David

-- 
Jeff Squyres
Cisco Systems