
Open MPI Development Mailing List Archives


From: Orion Poplawski (orion_at_[hidden])
Date: 2006-10-20 11:45:25


Reuti wrote:
> Hi,
>
> On 20.10.2006 at 01:08, Orion Poplawski wrote:
>
>> I'm starting to test out OpenMPI 1.2 tight integration with SGE and
>> have run into the following issue. Currently, my startmpi script
>> massages the hostnames in the machines file created from the SGE
>> pe_hostfile to add an "x" suffix to machines that are connected to a
>> separate GigE network dedicated to MPI traffic.
>>
>> With tight integration, openmpi uses the SGE pe_hostfile directly, e.g.:
>>
>> coop00.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
>> coop01.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
>>
>> Now, how (if at all) can I modify this so that MPI traffic goes to
>> coop00x and coop01x? One immediate problem I'm running into is that
>> the startmpi script from the SGE PE runs as the job's user, so it
>> can't modify pe_hostfile.
>
> Is the name of the pe_hostfile hardcoded to point to the one in the
> node's spool directory, or does OpenMPI use $PE_HOSTFILE, which you
> could reset to point to a modified copy? Another issue might be the
> back-channel of the communication, where sometimes the sender's
> `hostname` is simply used to send the reply.

(I'm sending this to the openmpi-devel list as well to see what insight
they may have. This seems like a common use case.)

It uses $PE_HOSTFILE, so I made a startup script that created a new
pe_hostfile. This requires something like the following in my job script:

setenv PE_HOSTFILE $TMPDIR/pe_hostfile
orterun -np $NSLOTS $*

It is unfortunate that this can't be handled automatically somehow.

First I tried:

coop01x.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
coop00x.cora.nwra.com 2 coop.q_at_[hidden] <NULL>

Which yielded:

error: commlib error: access denied (client IP resolved to host name
"coop01x.cora.nwra.com". This is not identical to clients host name
"coop01.cora.nwra.com")
error: executing task of job 41354 failed: failed sending task to
execd_at_[hidden]: can't find connection
[coop01:27468] ERROR: A daemon on node coop00x.cora.nwra.com failed to
start as expected.
[coop01:27468] ERROR: There may be more information available from
[coop01:27468] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[coop01:27468] ERROR: If the problem persists, please restart the
[coop01:27468] ERROR: Grid Engine PE job
[coop01:27468] ERROR: The daemon exited unexpectedly with status 1.
error: commlib error: access denied (client IP resolved to host name
"coop01x.cora.nwra.com". This is not identical to clients host name
"coop01.cora.nwra.com")
error: executing task of job 41354 failed: failed sending task to
execd_at_[hidden]: can't find connection

Then:

coop01x.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
coop00x.cora.nwra.com 2 coop.q_at_[hidden] <NULL>

which yielded:

error: commlib error: access denied (client IP resolved to host name
"coop01x.cora.nwra.com". This is not identical to clients host name
"coop01.cora.nwra.com")
error: executing task of job 41356 failed: failed sending task to
execd_at_[hidden]: can't find connection
error: commlib error: access denied (client IP resolved to host name
"coop01x.cora.nwra.com". This is not identical to clients host name
"coop01.cora.nwra.com")
[coop01:27945] ERROR: A daemon on node coop01x.cora.nwra.com failed to
start as expected.
[coop01:27945] ERROR: There may be more information available from
[coop01:27945] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[coop01:27945] ERROR: If the problem persists, please restart the
[coop01:27945] ERROR: Grid Engine PE job
[coop01:27945] ERROR: The daemon exited unexpectedly with status 1.
error: executing task of job 41356 failed: failed sending task to
execd_at_[hidden]: can't find connection

Now, looking at the OpenMPI gridengine code, it appears to take the
node name from the first field of each pe_hostfile line and never really
uses the queue name for anything:

         ptr = strtok_r(buf, " \n", &tok);
         num = strtok_r(NULL, " \n", &tok);
         queue = strtok_r(NULL, " \n", &tok);
         arch = strtok_r(NULL, " \n", &tok);
...
         node->node_name = strdup(ptr);
         node->node_arch = strdup(arch);

Perhaps it could be modified to use the hostname from the queue name
when making SGE/qrsh calls, but the first hostname for MPI
communication. I'm not really sure what the intent of the two hostname
fields in SGE's pe_hostfile is, or whether OpenMPI can handle the idea
of two hostnames for different purposes.

-- 
Orion Poplawski
System Administrator                  303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion_at_[hidden]
Boulder, CO 80301              http://www.cora.nwra.com