Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Cluster : received unexpected process identifier
From: Jeffrey Squyres (jsquyres_at_[hidden])
Date: 2012-04-04 20:11:34


On Apr 4, 2012, at 8:04 PM, Rohan Deshpande wrote:

> Yes they are on same subnet. ips for example - 192.168.1.1, 192.168.1.2, 192.168.1.3

Putting multiple IP interfaces on the same subnet is generally considered a bad idea, not just for MPI, but for Linux in general. Google around about this. One reason is that there is no way to guarantee which IP interface outgoing traffic will actually be sent from. For example, if you open a socket to a peer IP address (e.g., 192.168.1.10), which local IP address will be used as the source for that socket -- .1, .2, or .3? There's no reliable way to know in advance.

(This is exactly the scenario that OMPI was complaining about: it got a socket connection from an IP address it did not expect, got confused, and basically said, "hey human, go figure this out.")
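
If you want to see which source address the kernel picks for a given peer, something like the following rough Python sketch may help. The function name local_address_for and the default port are my own choices for illustration, not anything from Open MPI; it only uses the standard socket module.

```python
import socket


def local_address_for(peer_ip, port=9):
    """Return the local IP address the kernel selects to reach peer_ip.

    Hypothetical helper for illustration; the port number is arbitrary.
    """
    # connect() on a UDP socket sends no packets. It only makes the kernel
    # choose a route, and with it a source address, for this destination.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.connect((peer_ip, port))
        # getsockname() reports the (address, port) the socket is bound to.
        return sock.getsockname()[0]
```

Calling local_address_for("192.168.1.10") on one of your nodes should print whichever of .1, .2, or .3 the kernel chose at that moment. The answer depends on the routing table, so it is not guaranteed to be the interface you intended.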

You need to put your IP interfaces on different IP subnets. E.g., have eth0 on 192.168.1.x/24, eth1 on 192.168.2.x/24, and eth2 on 192.168.3.x/24. How you implement this depends on how your networks are configured and what hardware you have. You can use switch-based VLANs (e.g., the ports that the 1.x wires go into are hard-wired to VLAN 10, the ports that the 2.x wires go into are hard-wired to VLAN 20, etc.), or multiple switches (e.g., each 1.x wire goes to switch A, each 2.x wire goes to switch B, etc.).
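
As a quick sanity check of an addressing plan, here is a minimal Python sketch that flags interfaces sharing a subnet. The function name shared_subnets and the interface-to-address dictionaries are made up for illustration, and you would have to fill in your real addresses by hand. It only catches interfaces on the identical network, not partially overlapping ranges with different prefix lengths.

```python
import ipaddress
from collections import defaultdict


def shared_subnets(interfaces):
    """Return {subnet: [interface names]} for subnets used by more than one interface.

    `interfaces` maps an interface name to its address in CIDR form,
    e.g. {"eth0": "192.168.1.1/24"}. Hypothetical helper for illustration.
    """
    by_network = defaultdict(list)
    for name, cidr in interfaces.items():
        # ip_interface() accepts a host address with a prefix length;
        # .network is the subnet that address belongs to.
        by_network[ipaddress.ip_interface(cidr).network].append(name)
    return {str(net): names for net, names in by_network.items() if len(names) > 1}


# The layout from the original question: all three interfaces on one subnet.
current = {"eth0": "192.168.1.1/24", "eth1": "192.168.1.2/24", "eth2": "192.168.1.3/24"}
# The suggested layout: one subnet per interface.
proposed = {"eth0": "192.168.1.1/24", "eth1": "192.168.2.1/24", "eth2": "192.168.3.1/24"}
```

With these inputs, shared_subnets(current) reports all three interfaces under 192.168.1.0/24, while shared_subnets(proposed) returns an empty dictionary.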

Make sense?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/