Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Does Oracle Cluster Tools aka Sun's MPI work with LDAP?
From: Paul Kapinos (kapinos_at_[hidden])
Date: 2011-07-20 07:52:35

Hi Terry, Reuti,

Good news: we've solved (or rather worked around) the problem with
CT/8.2.1c :o)

The "fix" was easy: we used the 64-bit version of 'mpiexec' instead of
the 32-bit version, which was previously our default. The 64-bit version
now works with both the NIS and LDAP authentication modes. The 32-bit
version works only in the NIS-authenticated part of our cluster.
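
For anyone who wants to check which flavour of 'mpiexec' they are
actually running, here is a minimal sketch in Python. It only reads the
ELF header (byte 4, EI_CLASS: 1 means 32-bit, 2 means 64-bit). The
function name elf_class is made up for this example, and the paths you
pass on the command line depend on your own installation.

```python
import sys


def elf_class(path):
    """Report whether the file at `path` is a 32-bit or 64-bit ELF binary."""
    with open(path, "rb") as fh:
        header = fh.read(5)
    # An ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return "not ELF"
    # Byte 4 (EI_CLASS) encodes the word size: 1 = 32-bit, 2 = 64-bit.
    return {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown")


if __name__ == "__main__":
    # Pass one or more binaries, e.g. the mpiexec found first in your PATH.
    for path in sys.argv[1:]:
        print(f"{path}: {elf_class(path)}")
```

If 'mpiexec' turns out to be a wrapper script or a symlink, run the
check on the binary it finally points to.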

Thanks for your help!

Best wishes
Paul Kapinos

Reuti wrote:
> Hi,
> Am 15.07.2011 um 21:14 schrieb Terry Dontje:
>> On 7/15/2011 1:46 PM, Paul Kapinos wrote:
>>> Hi OpenMPI folks (and Oracle/Sun experts),
>>> we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of our cluster. In the part of the cluster where LDAP is activated, mpiexec does not try to spawn tasks on remote nodes at all, but exits with an error message like the one below. If we run mpiexec under 'strace -f', no exec of "ssh" can be found at all. Strangely, mpiexec tries to look into /etc/passwd (where the user is not listed, because we use LDAP!).
>> Note this is an area that should be no different from stock Open MPI.

"Should not", but it is :o)
However, I am comparing CT/8.2.1c with a self-compiled OpenMPI/1.4.3,
and these are quite different releases. They definitely behave
differently: in the self-compiled OpenMPI, both the 32-bit and the
64-bit mpiexec work with NIS and with LDAP, whereas the 32-bit mpiexec
of CT/8.2.1c works with NIS only.

>> I would suspect that the message might be coming from ssh. I wouldn't expect mpiexec to look into /etc/passwd at all; why would it need to?
> The output you listed is titled "[unknown-user]". Maybe the reference to the password file is an oversimplification. The check is also performed on the master node of the parallel job, by an ordinary `getpwuid`. Is /etc/nsswitch.conf fine on the `mpiexec` machine?
> Is the user known on this node too? Can they log in because they have no passphrase, because they have an agent running, or did you set up host-based authentication?

My user is known on each node and is allowed to log in (without a
password) from any node to any other node. In /etc/passwd there is no
password for my user; all auth things are done by NIS or LDAP. (Sorry, I
cannot tell you more because this is admin stuff, but as said: "ssh"
works from any node to any other node without a password.)
/etc/nsswitch.conf seems to be fine (it works now with the 64-bit
version of mpiexec :o)
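
To see the difference between "known to the system" and "listed in
/etc/passwd" on a given node, a small Python sketch like the one below
may help. It does the same kind of `getpwuid` lookup that Reuti
mentions, then checks the local password file and the 'passwd:' line of
/etc/nsswitch.conf separately. The helper names local_passwd_names and
passwd_sources are made up for this example. Note that the lookup runs
with the word size of the Python interpreter, so a 64-bit Python will
not necessarily reproduce what a 32-bit mpiexec sees.

```python
import os
import pwd


def local_passwd_names(passwd_text):
    """Return the login names listed in passwd-format text."""
    names = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The login name is the first colon-separated field.
        names.add(line.split(":", 1)[0])
    return names


def passwd_sources(nsswitch_text):
    """Return the sources on the 'passwd:' line of nsswitch.conf text."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("passwd:"):
            return line[len("passwd:"):].split()
    return []


def read_text(path):
    """Read a file, returning an empty string if it cannot be read."""
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
    except OSError:
        return ""


if __name__ == "__main__":
    uid = os.getuid()
    try:
        # Goes through the name service switch (files, NIS, LDAP, ...).
        name = pwd.getpwuid(uid).pw_name
        print(f"name service lookup: uid {uid} is {name}")
    except KeyError:
        name = None
        print(f"name service lookup: uid {uid} is unknown")

    if name is not None:
        listed = name in local_passwd_names(read_text("/etc/passwd"))
        print(f"listed in /etc/passwd: {listed}")
    print("passwd sources:", passwd_sources(read_text("/etc/nsswitch.conf")))
```

A user who is resolved by the lookup but not listed in /etc/passwd is
coming from one of the other sources on the 'passwd:' line.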

>> It should just be using ssh. Can you manually ssh to the same node?
>>> On the old part of the cluster, where NIS is used as the authentication method, Sun MPI runs just fine.
>>> So, is Sun's MPI compatible with the LDAP authentication method at all?
>> Insofar as whatever launcher you use is compatible with LDAP.
>>> Best wishes,
>>> Paul
>>> P.S. In both parts of the cluster, I (login marked as xxxxx here) can log in to any node by ssh without needing to type the password.
>> From the head node of the cluster to a node, or also between nodes?
> -- Reuti
>>> --------------------------------------------------------------------------
>>> The user (xxxxx) is unknown to the system (i.e. there is no corresponding
>>> entry in the password file). Please contact your system administrator
>>> for a fix.
>>> --------------------------------------------------------------------------
>>> [cluster-beta.rz.RWTH-Aachen.DE:31535] [[57885,0],0] ORTE_ERROR_LOG: Fatal in file plm_rsh_module.c at line 1058
>>> --------------------------------------------------------------------------

Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915