
Open MPI Development Mailing List Archives


From: Orion Poplawski (orion_at_[hidden])
Date: 2006-11-02 15:38:35


Pak Lui wrote:
> Orion Poplawski wrote:
>>
>>
>> In our setup (which I don't believe is very unique) the nodes are
>> connected by two networks: an "admin" network which allows for
>> connections from outside the cluster and an "MPI" network that is a
>> private GigE network connecting the nodes for MPI traffic:
>>
>> +---------admin net (192.168.0.X)--------+
>>       |                |            |
>> +-----------+     +--------+   +--------+
>> | SGE Master|     | coop00 |   | coop01 |
>> |           |     | coop00x|   | coop01x|
>> +-----------+     +--------+   +--------+
>>                        |            |
>>                        +------------+
>>                               |
>>                  MPI net (192.168.1.X)
>>
>> So the "x" suffix names are the addresses on the MPI network.
>>
>> Currently (loose integration), we create machines files like:
>>
>> coop00x.cora.nwra.com cpu=2
>> coop01x.cora.nwra.com cpu=2
>>
>> which makes the MPI traffic travel over the MPI network. I'm trying
>> to duplicate this under "tight" integration.

Well, this is what we did with LAM, and I naively assumed that since
Open MPI uses the same machines file format, it would work the same way
there. But once I finally read the FAQ (specifically:
<http://www.open-mpi.org/faq/?category=tcp#tcp-selection>) I see that it
works quite differently.

So, the default setup with gridengine integration works, and I just have:

btl_tcp_if_include = eth1

in my /etc/openmpi-mca-params.conf file.
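For anyone else hitting this: the same interface restriction can also be
applied per-run on the mpirun command line instead of globally in the
params file (the process count and executable name below are just
placeholders):

```shell
# Restrict Open MPI's TCP BTL to the eth1 interface for one run.
# "./my_mpi_prog" and "-np 4" are placeholder values.
mpirun --mca btl_tcp_if_include eth1 -np 4 ./my_mpi_prog
```

The file-based setting in /etc/openmpi-mca-params.conf applies
cluster-wide by default; the command-line form overrides it for a
single job.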

Sorry for all the confusion.

-- 
Orion Poplawski
System Administrator                  303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion_at_[hidden]
Boulder, CO 80301              http://www.cora.nwra.com