Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Mpirun only works when n< 3
From: Randolph Pullen (randolph_pullen_at_[hidden])
Date: 2011-07-11 13:52:05


I have discovered slightly more information:

When I replace node 'B' from the new cluster with node 'C' from the old cluster, I get similar behavior, but with an error message:

mpirun -H A,A,A,A,A,A,A  ring     (works from either node)
mpirun -H C,C,C  ring     (works from either node)
mpirun -H A,C  ring     (fails from either node:)

Process 0 sending 10 to 1, tag 201 (3 processes in ring)
[C:23465] *** An error occurred in MPI_Recv
[C:23465] *** on communicator MPI_COMM_WORLD
[C:23465] *** MPI_ERRORS_ARE_FATAL (your job will now abort)
Process 0 sent to 1
----------------------------------

Running this on either node A or C produces the same result.
Node C runs Open MPI 1.4.1 and is an ordinary dual core on FC10, not an i5 2400 like the others.
All the binaries are compiled on FC10 with gcc 4.3.2.
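
For readers who have not seen the ring demo, here is a minimal sketch of the message flow it appears to follow, simulated with Python threads instead of MPI. It assumes the program uses the usual token-passing pattern: rank 0 sends a counter to rank 1, each rank forwards it to the next, and rank 0 decrements it once per lap until it reaches zero. The source of `ring` is not shown in this thread, so that pattern is an assumption. The name `simulate_ring`, the queues, and the threads are illustrative stand-ins for ranks and blocking send/receive; none of them is an Open MPI API.

```python
import queue
import threading


def simulate_ring(num_ranks, start_value=10):
    """Pass a counter around a ring of threads until it reaches zero.

    Returns the values rank 0 held after each decrement, in order.
    Illustrative only: queues stand in for blocking MPI send/receive.
    """
    # One inbox per simulated rank; rank r receives from rank (r - 1) % num_ranks.
    inboxes = [queue.Queue() for _ in range(num_ranks)]
    decremented = []  # written only by the rank 0 thread

    def worker(rank):
        next_rank = (rank + 1) % num_ranks
        if rank == 0:
            # Rank 0 injects the first message, then waits like everyone else.
            inboxes[next_rank].put(start_value)
        while True:
            value = inboxes[rank].get()  # blocks until the previous rank sends
            if rank == 0:
                value -= 1               # one decrement per completed lap
                decremented.append(value)
            inboxes[next_rank].put(value)
            if value == 0:
                break
        if rank == 0:
            # The final zero travels all the way round; rank 0 consumes it.
            inboxes[rank].get()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decremented


if __name__ == "__main__":
    print(simulate_ring(3))
```

Under that assumption, the last line printed before the hang ("Process 0 sent to 1") is the point where rank 0 has finished its first send and is blocked in its receive, waiting for the message to come back from the last rank in the ring.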
--- On Tue, 12/7/11, Randolph Pullen <randolph_pullen_at_[hidden]> wrote:

From: Randolph Pullen <randolph_pullen_at_[hidden]>
Subject: Re: [OMPI users] Mpirun only works when n< 3
To: "Open MPI Users" <users_at_[hidden]>, "Jeff Squyres" <jsquyres_at_[hidden]>
Received: Tuesday, 12 July, 2011, 1:31 AM

There are no firewalls by default. I can ssh between both nodes without a password, so I assumed that all is good with the comms.
I can also get both nodes to participate in the ring program at the same time.
It's just that I am limited to only 2 processes if they are split between the nodes, i.e.:

mpirun -H A,B ring     (works)
mpirun -H A,A,A,A,A,A,A  ring     (works)
mpirun -H B,B,B,B ring     (works)
mpirun -H A,B,A  ring     (hangs)

--- On Tue, 12/7/11, Jeff Squyres <jsquyres_at_[hidden]> wrote:

From: Jeff Squyres <jsquyres_at_[hidden]>
Subject: Re: [OMPI users] Mpirun only works when n< 3
To: randolph_pullen_at_[hidden], "Open MPI Users" <users_at_[hidden]>
Received: Tuesday, 12 July, 2011, 12:21 AM

Have you disabled firewalls between your compute nodes?

On Jul 11, 2011, at 9:34 AM, Randolph Pullen wrote:

> This appears to be similar to the problem described in:
>
> https://svn.open-mpi.org/trac/ompi/ticket/2043
>
> However, those fixes do not work for me.
>
> I am running on an
>
> - i5 sandy bridge under Ubuntu 10.10  8 G RAM
>
> - Kernel 2.6.32.14 with OpenVZ tweaks
>
> - OpenMPI V 1.4.1
>
> I am trying to migrate existing software to a new cluster (A,B)
>
> Symptoms:
>
> I can run the ring demo on a single machine, either A or B with any number of processes.
>
> But when I combine the 2 machines I am limited to 2 processes, any more and MPI hangs.   It gets as far as:
>
>       Process 0 sending 10 to 1, tag 201 (3 processes in ring)
>
>       Process 0 sent to 1
>
> and there it stays...
>
> Any help greatly appreciated.
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/