Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] how to set up the cluster of 5 nodes in openmpi
From: Gus Correa (gus_at_[hidden])
Date: 2009-09-30 11:31:28


Hi Ankur Pachauri

Besides what Jody already said, here are two questions and some suggestions.

1) Do these machines have Internet IP addresses
and, say, corresponding Ethernet interfaces that
you want to use for MPI?

2) Or are you planning to establish a private network
only for using OpenMPI?

I would suggest using #2.

Typically this can be done if you don't care about having
Internet connections on each machine, and will dedicate
"the" single Ethernet port on each machine to MPI,
or if you have two or more Ethernet ports on each machine,
so that you can use one of them only for MPI.
Many current motherboards come with two onboard Gigabit
Ethernet ports.
Check what you have.

In either case you need an Ethernet switch to connect the machines.
(Don't use an Ethernet hub, or the performance will be very poor.)

If you don't care much about high performance,
or if your budget is tight,
a cheap SOHO (unmanaged) type Ethernet switch will do.
I have a test/toy cluster here built with these switches
and it works fine, although it will never make it to Top500,
of course.
There are even 5-port and 8-port switches of this kind.

If you care about high performance you need a better switch.

You need to connect the "MPI" ports to the switch,
using Cat5e or Cat6 Ethernet cables.
For an unmanaged switch that is basically it.

***

You need to configure a private network (typically with
IP addresses like 192.168.1.1, 192.168.1.2, ...,
192.168.1.5).
On Fedora this is done by editing a configuration file
called /etc/sysconfig/network-scripts/ifcfg-ethX (where X is
0 or 1, depending on the port you are using for MPI).
Check the RedHat user's guide for details (Fedora is very similar).

Here is one example:

$ cat /etc/sysconfig/network-scripts/ifcfg-eth1
# name of your Ethernet controller
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
NETWORK=192.168.1.0
NETMASK=255.255.255.0
IPADDR=192.168.1.1
HWADDR=ab:cd:ef:12:34:56

The items that vary on each node are the IPADDR (IP address)
and the HWADDR (the MAC address of your Ethernet port dedicated
to MPI).
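
If you would rather not edit five files by hand, something like the
sketch below could print the file for each node. It is only an
illustration: the function name make_ifcfg is made up here, and it
assumes the eth1 device and the 192.168.1.x addresses from the example
above. You pass the node number and the MAC address of that node's
MPI port.

```shell
# Print an ifcfg file for one node (hypothetical helper, not a Fedora tool).
# $1 = node number (1..5), $2 = MAC address of that node's MPI port.
# Assumes the eth1 device and 192.168.1.0/24 subnet used in the example.
make_ifcfg() {
  node=$1
  mac=$2
  cat <<EOF
DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=none
NETWORK=192.168.1.0
NETMASK=255.255.255.0
IPADDR=192.168.1.$node
HWADDR=$mac
EOF
}
```

Review the output before copying it (as root) to
/etc/sysconfig/network-scripts/ifcfg-eth1 on the corresponding node.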

You need to create or extend a /etc/hosts file on *every* node,
with the name and IP address of all other nodes.
For instance:

$ cat /etc/hosts

... stuff that was already there (don't remove the loopback address!)

192.168.1.1 node01-mpi
192.168.1.2 node02-mpi
...
192.168.1.5 node05-mpi
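
The entries can also be generated rather than typed. The sketch below
is just one way to do it; the function name make_hosts is an assumption,
and it follows the nodeNN-mpi naming and 192.168.1.x numbering used in
this message. Note that in /etc/hosts the IP address comes first and
the name second.

```shell
# Print /etc/hosts entries for nodes 1..$1 (hypothetical helper).
# Each line has the IP address first, then the host name.
make_hosts() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    printf '192.168.1.%d node%02d-mpi\n' "$i" "$i"
    i=$((i + 1))
  done
}
```

Running "make_hosts 5" prints the five lines, which you can then append
to /etc/hosts on each node, below the entries that are already there.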

***

NOTE: You need to use the *same* names above (which are associated with
the Ethernet port on the node, not only with the node itself)
in your OpenMPI nodes file (or use the same names on the
mpiexec command line).
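
A nodes file (hostfile) using those names might be produced as sketched
below. The function name make_hostfile and the file name mpi_hosts are
assumptions for illustration, and the slots value should be whatever
number of processes you want to run on each node.

```shell
# Print an Open MPI hostfile for nodes 1..$1 with $2 slots per node.
# Hypothetical helper; the host names match the /etc/hosts entries.
make_hostfile() {
  n=$1
  slots=$2
  i=1
  while [ "$i" -le "$n" ]; do
    printf 'node%02d-mpi slots=%d\n' "$i" "$slots"
    i=$((i + 1))
  done
}
```

For example, "make_hostfile 5 1 > mpi_hosts" writes the file, and then
"mpiexec --hostfile mpi_hosts -np 5 ./hello" would start five processes,
where ./hello stands for your own MPI program.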

***

Reboot the machines and check if the private subnet is working.
Use ping across all pairs of machines.
For instance, log in to node01 and
run:

ping -I eth1 -R node02-mpi

The output should tell you if the "MPI" IP addresses
(and Ethernet ports) are being used.

Repeat the procedure on all pairs of nodes (with 5 nodes that is
5*4/2 = 10 pairs, or 20 tests if you check both directions).
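
To keep track of which pairs still need testing, a small sketch like
this one can list them. The function name list_pairs is made up for this
example, and it assumes the nodeNN-mpi names from /etc/hosts.

```shell
# Print every unordered pair of node names for nodes 1..$1,
# one pair per line (hypothetical helper).
list_pairs() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    j=$((i + 1))
    while [ "$j" -le "$n" ]; do
      printf 'node%02d-mpi node%02d-mpi\n' "$i" "$j"
      j=$((j + 1))
    done
    i=$((i + 1))
  done
}
```

For each line printed by "list_pairs 5", log in to the first node and
ping the second one with the ping command shown above.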

You need to establish passwordless connections between the nodes.
There are many ways to do this.

For a private subnet (like the 192.168.1.0 above)
a simple one is to generate a single RSA key (with ssh-keygen)
*without passphrase*.

Then copy the public key (.pub) file to /etc/ssh/ssh_known_hosts2.
Edit the ssh_known_hosts2 file, repeat the key for as many
nodes as you have (5 times in your case), and put the IP address
and MPI interface name of each node at the start of each line,
something like this:

192.168.1.1,node01-mpi ssh-rsa Your-public-RSA-Key-goes-here
192.168.1.2,node02-mpi ssh-rsa Your-public-RSA-Key-goes-here
...
192.168.1.5,node05-mpi ssh-rsa Your-public-RSA-Key-goes-here

Note, it is the *same* RSA public key on all lines.

Check if passwordless ssh is working.
This can be done by logging in to each node and then
using ssh to reach another node.

Say, from node01 try:

ssh node02-mpi

Repeat on all pairs of nodes.
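
The check can be scripted, roughly as sketched below. The function
name check_ssh and the SSH_CMD variable are assumptions made for this
example. BatchMode=yes makes ssh fail instead of prompting for a
password, so a node that still asks for one shows up as FAILED.

```shell
# Command used to connect; overridable, defaults to ssh.
SSH_CMD=${SSH_CMD:-ssh}

# Try a passwordless login to nodes 1..$1 from the node this runs on
# and report the result for each (hypothetical helper).
check_ssh() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    host=$(printf 'node%02d-mpi' "$i")
    # Run the trivial remote command "true"; never prompt for a password.
    if "$SSH_CMD" -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
        </dev/null >/dev/null 2>&1; then
      echo "$host ok"
    else
      echo "$host FAILED"
    fi
    i=$((i + 1))
  done
}
```

Run "check_ssh 5" on each node in turn. The first connection to a host
whose key is not yet known will also be reported as FAILED, because
BatchMode does not answer the host key question for you.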

***

Well, barring some mistake in the details above,
this should work.

I hope this helps.

Gus Correa

jody wrote:
> Hi
> All of your questions are answered in the FAQ...
>
> If you have a TCP/IP connection between your machines so that each
> machine can reach every other one,
> that will be ok.
>
> First make sure you can get access from each machine to every other
> one using ssh without a password.
> See the FAQ:
> http://www.open-mpi.org/faq/?category=rsh
>
> Make sure to set PATH and LD_LIBRARY_PATH as described in the FAQ:
> http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path
>
> Next, make sure your application is accessible by all of your
> machines. I use an NFS directory shared by all my machines,
> and that is where I put the application.
>
> To start your application, follow the instructions in the FAQ:
> http://www.open-mpi.org/faq/?category=running
>
> If you want to use host files, read about how to use them in the FAQ:
> http://www.open-mpi.org/faq/?category=running#mpirun-host
>
> Hope that helps
>
> Jody
>
> On Wed, Sep 30, 2009 at 11:00 AM, ankur pachauri
> <ankurpachauri_at_[hidden]> wrote:
>> Dear all,
>>
>> I have been able to install Open MPI on two independent machines running FC
>> 10. The simple hello world programs are running fine on the independent
>> machines. But can anyone please help me by letting me know how to connect
>> the two machines and run a common program between the two? How do we do
>> a lamboot -v lamhosts in the case of Open MPI?
>> How do we get Open MPI running on the two computers simultaneously and
>> execute a common program on the two machines?
>>
>> Thanks in advance
>>
>>
>> On Wed, Sep 30, 2009 at 12:24 PM, jody <jody.xha_at_[hidden]> wrote:
>>> Hi
>>> Have a look at the Open MPI FAQ:
>>>
>>> http://www.open-mpi.org/faq/
>>>
>>> It gives you all the information you need to start working with your
>>> cluster.
>>>
>>> Jody
>>>
>>>
>>> On Wed, Sep 30, 2009 at 8:25 AM, ankur pachauri <ankurpachauri_at_[hidden]>
>>> wrote:
>>>> Dear all,
>>>>
>>>> I am new to Open MPI. All I need is to set up a cluster of around 5
>>>> nodes in my lab; I am using Fedora 7 in the lab. I would be thankful
>>>> if you could let me know the steps or the procedure to set up the
>>>> cluster (as in the case of LAM/MPI: passwordless ssh, NFS mount, and
>>>> so on).
>>>>
>>>> regards,
>>>>
>>>> --
>>>> Ankur Pachauri.
>>>> 09927590910
>>>>
>>>> Research Scholar,
>>>> software engineering.
>>>> Department of Mathematics
>>>> Dayalbagh Educational Institute
>>>> Dayalbagh,
>>>> AGRA
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>
>>
>>
>> --
>> Ankur Pachauri.
>> 09927590910
>>
>> Research Scholar,
>> software engineering.
>> Department of Mathematics
>> Dayalbagh Educational Institute
>> Dayalbagh,
>> AGRA
>>