
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI hangs on multiple nodes
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-09-20 08:11:34


On Sep 19, 2011, at 10:23 PM, Ole Nielsen wrote:

> Hi all - and sorry for the multiple postings, but I have more information.

+1 on Eugene's comments. The test program looks fine to me.

FWIW, you don't need -lmpi to compile your program; OMPI's wrapper compiler allows you to just:

    mpicc mpi_test.c -o mpi_test -Wall
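
If you're curious what the wrapper actually adds behind the scenes, the Open MPI wrapper compilers accept a --showme option (this just prints the underlying command line; it requires an Open MPI install):

```shell
# Print the full command line the wrapper would run, without running it
mpicc mpi_test.c -o mpi_test --showme

# Or just the extra link flags the wrapper supplies
mpicc --showme:link
```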

> 1: After a reboot of two nodes I ran again, and the inter-node freeze didn't happen until the third iteration. I take that to mean that the basic communication works, but that something is saturating. Is there some notion of buffer size somewhere in the MPI system that could explain this?

Hmm. This is not a good sign; it somewhat indicates a problem with your OS. Based on this email and your prior emails, I'm guessing you're using TCP for communication, and that the problem is based on inter-node communication (e.g., the problem would occur even if you only run 1 process per machine, but does not occur if you run all N processes on a single machine, per your #4, below).
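One way to test that guess directly (hostfile and node names here are placeholders for your setup) is to force exactly one process per machine, so all traffic is inter-node:

```shell
# -npernode 1 places a single process on each host in the hostfile,
# so every message must cross the network; if this hangs while the
# single-node run succeeds, the problem is in inter-node (TCP) traffic
mpirun -npernode 1 -np 2 --hostfile hostfile mpi_test
```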

> 2: The nodes have 4 ethernet cards each. Could the mapping be a problem?

Shouldn't be. If it runs at all, then it should run fine.

Do you have all your ethernet cards on a single subnet, or on multiple subnets? I have heard of problems when you have multiple ethernet cards on the same subnet -- I believe there's some non-determinism in that case about which wire/NIC a packet will actually go out, which can be problematic for OMPI.
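To see how your NICs are laid out, something like this on each node (assuming Linux with iproute2) lists every interface with its address and prefix; two NICs sharing the same network/prefix are on the same subnet:

```shell
# Column 1: interface name, column 2: IPv4 address with CIDR prefix
ip -o -4 addr show | awk '{print $2, $4}'
```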

> 3: The cpus are running at a 100% for all processes involved in the freeze

That's probably right. OMPI aggressively polls for progress as a way to decrease latency. So all processes are trying to make progress, and therefore are aggressively polling, eating up 100% of the CPU.
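As an aside: if the spinning itself is a concern (e.g., on oversubscribed nodes), OMPI has an MCA parameter that makes processes yield the CPU while waiting, at some latency cost. Note this only reduces CPU usage; it won't fix a hang:

```shell
# Yield the processor when idle instead of aggressively polling
mpirun --mca mpi_yield_when_idle 1 -np 16 --hostfile hostfile mpi_test
```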

> 4: The same test program (http://code.google.com/p/pypar/source/browse/source/mpi_test.c) works fine when run within one node so the problem must be with MPI and/or our network.

This helps identify the issue as the TCP communication, not the shared memory communication.
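You can narrow this down further by disabling the shared-memory BTL so that even same-node traffic goes through the TCP stack (a diagnostic sketch; "self" is the loopback-to-self component and must always be included):

```shell
# Force all MPI traffic through the TCP BTL, even within one node;
# if this also hangs on a single node, the TCP path itself is suspect
mpirun --mca btl tcp,self -np 4 mpi_test
```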

> 5: The network and ssh works otherwise fine.

Good.

> Again many thanks for any hint that can get us going again. The main thing we need is some diagnostics that may point to what causes this problem for MPI.

If you are running with multiple NICs on the same subnet, change them to multiple subnets and see if it starts working fine.

If they're on different subnets, try using the btl_tcp_if_include / btl_tcp_if_exclude MCA parameters to include or exclude specific networks and see which ones are problematic. Keep in mind that ..._include and ..._exclude are mutually exclusive; you should only specify one. And if you specify exclude, be sure to also exclude the loopback interface. E.g.:

  mpirun --mca btl_tcp_if_include eth0,eth1 -np 16 --hostfile hostfile mpi_test
or
  mpirun --mca btl_tcp_if_exclude lo,eth1 -np 16 --hostfile hostfile mpi_test

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/