Open MPI Development Mailing List Archives

From: Pak Lui (Pak.Lui_at_[hidden])
Date: 2006-10-20 14:13:30


Hi Orion and Reuti,

Let me see if I can understand the issues by breaking them down first:

(1) First, I am curious to know why you would need to create a
PE_HOSTFILE yourself, because that file is generated by SGE/N1GE when
you specify you are running a parallel job under SGE/N1GE, by doing
something like this with qsub/qsh/qrsh, etc:

% qsub -pe name_of_my_pe 4

Normally I wouldn't expect users running a parallel job to need to
create or modify that file, or to set any environment variables
manually.

You can also find an example of a simple use case of N1GE/SGE with
OMPI/ORTE here:

http://www.open-mpi.org/faq/?category=running#run-n1ge-or-sge

===

(2) As for the following error message:

> error: commlib error: access denied (client IP resolved to host name
> "coop01x.cora.nwra.com". This is not identical to clients host name
> "coop01.cora.nwra.com")

As you mentioned, each node in your setup has two interfaces. This
message is an SGE error: the client's IP address resolved to
"coop01x.cora.nwra.com", which does not match the host name SGE has for
that client ("coop01.cora.nwra.com").

Could you run the SGE gethostbyname utility to check whether the two
different hostnames resolve to the same host?

# Check that SGE can resolve all the hostnames correctly:

# cd /gridware/sge/utilbin/solaris64
# ./gethostbyname -aname sun-1
sun-1-grid
# ./gethostbyname -aname sun-1-grid
sun-1-grid
#

If this doesn't work, you may need to tell SGE about the multiple
interfaces by creating a host_aliases file in the
$SGE_ROOT/$SGE_CELL/common directory, as described in this
sunsource.net document:

http://gridengine.sunsource.net/howto/multi_intrfcs.html
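
To illustrate the idea behind host_aliases, here is a minimal C sketch.
It assumes each line of the file lists a primary host name followed by
its aliases, separated by whitespace; please check that against the
document above before relying on it. The function name
alias_line_lookup is made up for this example and is not part of SGE.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Look up `name` in one host_aliases-style line.
 * ASSUMED line format: "<primary> <alias1> <alias2> ...".
 * If `name` matches the primary name or any alias, copy the primary
 * name into `out` and return 1. Otherwise return 0.
 * Hypothetical helper, not an SGE API. */
int alias_line_lookup(const char *line, const char *name,
                      char *out, size_t outlen)
{
    char buf[512];
    char *save = NULL;
    char *primary;
    char *tok;

    if (line == NULL || name == NULL || out == NULL || outlen == 0)
        return 0;

    /* strtok_r modifies its input, so tokenize a local copy. */
    snprintf(buf, sizeof(buf), "%s", line);

    primary = strtok_r(buf, " \t\n", &save);
    if (primary == NULL)
        return 0; /* blank line */

    for (tok = primary; tok != NULL; tok = strtok_r(NULL, " \t\n", &save)) {
        if (strcmp(tok, name) == 0) {
            snprintf(out, outlen, "%s", primary);
            return 1;
        }
    }
    return 0;
}
```

With a line such as "sun-1-grid sun-1", looking up either name gives
back "sun-1-grid", which matches the gethostbyname -aname output shown
above.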

===

(3) The integration uses the hostnames in the PE_HOSTFILE for
launching the grid engine tasks. You can see the actual qrsh command
that is used to launch the tasks by setting the MCA parameter:

-mca pls_gridengine_debug 1
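
For reference, here is a rough C sketch of reading one PE_HOSTFILE line
into its first three fields (hostname, slot count, queue). It is not
the Open MPI code; the struct pe_entry and the function parse_pe_line
are hypothetical names, and the sample line in the usage note is made
up. Unlike a bare strtok_r sequence, it checks each token for NULL
before using it.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one PE_HOSTFILE line. */
struct pe_entry {
    char host[256];
    int  slots;
    char queue[256];
};

/* Parse "<host> <slots> <queue> ..." into *e.
 * Any fields after the queue are ignored here.
 * Returns 0 on success, -1 if a field is missing or the slot count
 * is not a positive integer. */
int parse_pe_line(const char *line, struct pe_entry *e)
{
    char buf[1024];
    char *save = NULL;
    char *host, *num, *queue, *end;
    long slots;

    if (line == NULL || e == NULL)
        return -1;

    /* strtok_r modifies its input, so tokenize a local copy. */
    snprintf(buf, sizeof(buf), "%s", line);

    host  = strtok_r(buf,  " \t\n", &save);
    num   = strtok_r(NULL, " \t\n", &save);
    queue = strtok_r(NULL, " \t\n", &save);
    if (host == NULL || num == NULL || queue == NULL)
        return -1;

    slots = strtol(num, &end, 10);
    if (*end != '\0' || slots <= 0 || slots > INT_MAX)
        return -1;

    snprintf(e->host, sizeof(e->host), "%s", host);
    snprintf(e->queue, sizeof(e->queue), "%s", queue);
    e->slots = (int)slots;
    return 0;
}
```

For a made-up line such as "node1 2 new2.q@node1 <NULL>", this fills in
host "node1", slots 2 and queue "new2.q@node1".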

===

(4) As for what you have mentioned here:

> Now, looking at the OpenMPI gridengine code, it looks like it gets the
> node name from the first entry in the pe_hostfile, and never really uses
> the queue name for anything.
>
> ptr = strtok_r(buf, " \n", &tok);
> num = strtok_r(NULL, " \n", &tok);
> queue = strtok_r(NULL, " \n", &tok);
> arch = strtok_r(NULL, " \n", &tok);
> ...
> node->node_name = strdup(ptr);
> node->node_arch = strdup(arch);
>
> Perhaps it can be modified it uses the queue name hostname when doing
> SGE/qrsh calls, but the first hostname when doing MPI communication.
> Not really sure what the intent of the two fields in SGE's pe_hostfile
> is, or if OpenMPI can handle the idea of two hostnames for different
> purposes.
>

Once it is inside an SGE parallel environment (e.g. when you have
started a parallel job with "qsh/qsub/qrsh -pe name_of_pe"), ORTE uses
the -inherit flag of qrsh to tell qrsh to start a task in an already
scheduled parallel job. We therefore cannot assign another queue to the
job: SGE wouldn't allow us to do that, and I don't believe it would be
the right thing to do. N1GE/SGE would return an error like this:

% /opt/sge/bin/sol-sparc64/qrsh -q new2.q -inherit -V node1 sleep 10
error: Unknown option -q
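
To make the shape of that launch command concrete, here is a small C
sketch that formats a "qrsh -inherit -V <host> <command>" string. It is
only an illustration and not the ORTE launcher code: the function name
build_qrsh_inherit is hypothetical, and a real launcher would more
likely build an argv array than a single string.

```c
#include <stdio.h>

/* Format "qrsh -inherit -V <host> <command>" into `out`.
 * No queue option is added, since qrsh rejects -q together with
 * -inherit as shown in the error above.
 * Returns 0 on success, -1 on bad arguments or if `out` is too small.
 * Hypothetical helper for illustration only. */
int build_qrsh_inherit(char *out, size_t outlen,
                       const char *host, const char *command)
{
    int n;

    if (out == NULL || outlen == 0 || host == NULL || command == NULL)
        return -1;
    if (host[0] == '\0')
        return -1;

    n = snprintf(out, outlen, "qrsh -inherit -V %s %s", host, command);
    if (n < 0 || (size_t)n >= outlen)
        return -1; /* encoding error or truncated output */

    return 0;
}
```

Called with host "node1" and command "sleep 10", it produces
"qrsh -inherit -V node1 sleep 10", i.e. the command above without the
rejected -q option and without the installation path.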

===

I think item (2) is the reason you are running into the error. Let me
know if that works for you.

Orion Poplawski wrote:
> Reuti wrote:
>> Hi,
>>
>> Am 20.10.2006 um 01:08 schrieb Orion Poplawski:
>>
>>> I'm starting to test out OpenMPI 1.2 tight integration with SGE and
>>> have run into the following issue. Currently, my startmpi script
>>> massages the hostnames in the machines file created from the SGE
>>> pe_hostfile add an "x" suffix on machines that are connected with a
>>> separate GigE network dedicated for MPI traffic.
>>>
>>> With tight integration, openmpi uses the SGE pe_hostfile directly, e.g.:
>>>
>>> coop00.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
>>> coop01.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
>>>
>>> Now, how/can I modify this so that MPI traffic speaks to coop00x and
>>> coop01x? One immediate problem that I'm running into is that the
>>> startmpi script from the SGE PE runs as the user of the job so it
>>> can't modify pe_hostfile.
>> is the name of the pe_hostfile hardcoded, to point to the one in the
>> nodes spool directory, or is OpenMPI using the $PE_HOSTFILE, which you
>> could reset to a new name to point to a modified one? Another issue
>> might be the back-channel of the communication, where sometimes simply
>> the `hostname` of the sender is taken to answer.
>
> (Sending this to the openmpi-devel list as well I see what insight they
> may have. This seems like a common use case.)
>
> It uses $PE_HOSTFILE, so I made a startup script that created a new
> pe_hostfile. This requires something like the following in my job script:
>
> setenv PE_HOSTFILE $TMPDIR/pe_hostfile
> orterun -np $NSLOTS $*
>
> which is unfortunate that it can't be handled automatically somehow.
>
> First tried:
>
> coop01x.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
> coop00x.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
>
> Which yielded:
>
> error: commlib error: access denied (client IP resolved to host name
> "coop01x.cora.nwra.com". This is not identical to clients host name
> "coop01.cora.nwra.com")
> error: executing task of job 41354 failed: failed sending task to
> execd_at_[hidden]: can't find connection
> [coop01:27468] ERROR: A daemon on node coop00x.cora.nwra.com failed to
> start as expected.
> [coop01:27468] ERROR: There may be more information available from
> [coop01:27468] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> [coop01:27468] ERROR: If the problem persists, please restart the
> [coop01:27468] ERROR: Grid Engine PE job
> [coop01:27468] ERROR: The daemon exited unexpectedly with status 1.
> error: commlib error: access denied (client IP resolved to host name
> "coop01x.cora.nwra.com". This is not identical to clients host name
> "coop01.cora.nwra.com")
> error: executing task of job 41354 failed: failed sending task to
> execd_at_[hidden]: can't find connection
>
> Then:
>
> coop01x.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
> coop00x.cora.nwra.com 2 coop.q_at_[hidden] <NULL>
>
> which yields:
>
> error: commlib error: access denied (client IP resolved to host name
> "coop01x.cora.nwra.com". This is not identical to clients host name
> "coop01.cora.nwra.com")
> error: executing task of job 41356 failed: failed sending task to
> execd_at_[hidden]: can't find connection
> error: commlib error: access denied (client IP resolved to host name
> "coop01x.cora.nwra.com". This is not identical to clients host name
> "coop01.cora.nwra.com")
> [coop01:27945] ERROR: A daemon on node coop01x.cora.nwra.com failed to
> start as expected.
> [coop01:27945] ERROR: There may be more information available from
> [coop01:27945] ERROR: the 'qstat -t' command on the Grid Engine tasks.
> [coop01:27945] ERROR: If the problem persists, please restart the
> [coop01:27945] ERROR: Grid Engine PE job
> [coop01:27945] ERROR: The daemon exited unexpectedly with status 1.
> error: executing task of job 41356 failed: failed sending task to
> execd_at_[hidden]: can't find connection
>
>
> Now, looking at the OpenMPI gridengine code, it looks like it gets the
> node name from the first entry in the pe_hostfile, and never really uses
> the queue name for anything.
>
> ptr = strtok_r(buf, " \n", &tok);
> num = strtok_r(NULL, " \n", &tok);
> queue = strtok_r(NULL, " \n", &tok);
> arch = strtok_r(NULL, " \n", &tok);
> ...
> node->node_name = strdup(ptr);
> node->node_arch = strdup(arch);
>
> Perhaps it can be modified it uses the queue name hostname when doing
> SGE/qrsh calls, but the first hostname when doing MPI communication.
> Not really sure what the intent of the two fields in SGE's pe_hostfile
> is, or if OpenMPI can handle the idea of two hostnames for different
> purposes.
>

-- 
Thanks,
- Pak Lui
pak.lui_at_[hidden]