
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] a problem about mpirun and SSH when using Open MPI 1.7rc8
From: Reuti (reuti_at_[hidden])
Date: 2013-03-14 05:13:25


Hi,

On 14.03.2013 at 09:20, yumenlj wrote:

> Hi, all
>
> I encountered a problem with mpirun and SSH when using Open MPI 1.7rc8.
>
> I have a 4-node cluster. This is the hostfile:
>
> [mpiuser@testnode11 openmpi-1.6.4]$ cat ~/work/hostfile
> testnode11
> testnode12
> testnode13
> testnode14
>
> I configured SSH by copying ".ssh/id_rsa.pub" on testnode11 to ".ssh/authorized_keys" on all 4 nodes,
> so that I can log in to all 4 nodes from testnode11 without a password.
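
A quick way to confirm this setup is to attempt a non-interactive login to every host in the hostfile from testnode11. The following is a minimal Python sketch, not part of the original post; the helper names are hypothetical, and it assumes an OpenSSH client, whose `BatchMode=yes` option makes ssh fail instead of prompting for a password.

```python
import subprocess
import sys


def parse_hostfile(lines):
    """Return host names from the lines of an Open MPI-style hostfile.

    Blank lines and '#' comments are skipped; only the first token of each
    line is kept, so entries such as 'testnode12 slots=2' also work.
    """
    hosts = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def ssh_check_cmd(host):
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a missing or rejected key shows up as a non-zero exit status.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def check_hosts(hosts):
    """Map each host to True if a passwordless login to it succeeds."""
    results = {}
    for host in hosts:
        proc = subprocess.run(ssh_check_cmd(host),
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        results[host] = proc.returncode == 0
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        hosts = parse_hostfile(f)
    for host, ok in check_hosts(hosts).items():
        print(f"{host}: {'ok' if ok else 'FAILED'}")
```

Run it as, e.g., `python3 check_ssh.py ~/work/hostfile` (the script name is hypothetical). A clean result only shows that testnode11 can reach every node; it says nothing about logins between the other nodes.
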
>
> The following test worked well with Open MPI 1.6.4.
>
> [mpiuser@testnode11 openmpi-1.6.4]$ mpirun -hostfile ~/work/hostfile -np 8 ~/src/openmpi-1.6.4/examples/ring_c
> Process 0 sending 10 to 1, tag 201 (8 processes in ring)
> Process 0 sent to 1
> Process 0 decremented value: 9
> Process 0 decremented value: 8
> Process 0 decremented value: 7
> Process 0 decremented value: 6
> Process 0 decremented value: 5
> Process 0 decremented value: 4
> Process 0 decremented value: 3
> Process 0 decremented value: 2
> Process 0 decremented value: 1
> Process 0 decremented value: 0
> Process 0 exiting
> Process 4 exiting
> Process 2 exiting
> Process 3 exiting
> Process 1 exiting
> Process 6 exiting
> Process 7 exiting
> Process 5 exiting
>
> However, when I switched to Open MPI 1.7rc8, the same test did not work.
>
> [mpiuser@testnode11 openmpi-1.7rc8]$ mpirun -hostfile ~/work/hostfile -np 8 ~/src/openmpi-1.7rc8/examples/ring_c
> Permission denied, please try again.
> Permission denied, please try again.
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
> [testnode12:06990] [[50636,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 362
> [testnode12:06990] [[50636,0],1] attempted to send to [[50636,0],3]: tag 15
> [testnode12:06990] [[50636,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/grpcomm_base_xcast.c at line 166
>
> I checked the SSH logs and found the direct cause: an SSH request from testnode12 to testnode14 was denied.
>
> [mpiuser@testnode11 openmpi-1.7rc8]$ ssh root@testnode14 tail -f /var/log/secure
> ...
> Mar 14 15:39:01 testnode14 sshd[31610]: Connection closed by testnode12
> Mar 14 15:39:04 testnode14 sshd[31611]: Failed password for mpiuser from testnode12 port 55964 ssh2
> Mar 14 15:39:04 testnode14 sshd[31611]: Failed password for mpiuser from testnode12 port 55964 ssh2
> Mar 14 15:39:04 testnode14 sshd[31612]: Connection closed by testnode12
> ...
>
> So I am puzzled. I launched mpirun on testnode11, and I do not understand why testnode12 would send an SSH request to testnode14.
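
The error log offers a clue: the daemon [[50636,0],1] on testnode12 was trying to reach [[50636,0],3]. One plausible explanation, assuming Open MPI 1.7 starts its remote daemons along a tree rather than opening every SSH connection from the mpirun node, is that daemon 1 is responsible for launching daemon 3. The sketch below illustrates that idea only. The binomial tree is an assumption chosen because it reproduces the log, Open MPI's actual launch logic may differ in detail, and it further assumes that mpirun is vpid 0 and the daemon vpids follow hostfile order. The helper names `binomial_children` and `launch_edges` are hypothetical.

```python
from __future__ import annotations


def binomial_children(vpid: int, num_daemons: int) -> list[int]:
    """Children of `vpid` in a binomial tree rooted at vpid 0.

    A node's children are vpid + 2**k for every power of two larger than
    vpid, as long as the result is still a valid vpid.
    """
    children = []
    step = 1
    while step <= vpid:  # skip powers of two that are not larger than vpid
        step <<= 1
    while vpid + step < num_daemons:
        children.append(vpid + step)
        step <<= 1
    return children


def launch_edges(hosts: list[str]) -> list[tuple[str, str]]:
    """(launching host, launched host) pairs, assuming hosts[i] runs vpid i."""
    return [(hosts[parent], hosts[child])
            for parent in range(len(hosts))
            for child in binomial_children(parent, len(hosts))]


if __name__ == "__main__":
    nodes = ["testnode11", "testnode12", "testnode13", "testnode14"]
    for src, dst in launch_edges(nodes):
        print(f"{src} ssh -> {dst}")
```

With the four nodes from the hostfile, it prints testnode11 -> testnode12, testnode11 -> testnode13 and testnode12 -> testnode14; the last edge is exactly the connection that sshd on testnode14 rejected.
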
> One solution is to copy ".ssh/id_rsa.pub" from all the nodes to ".ssh/authorized_keys"

If every node has its own private key without a passphrase, this would work. On the other hand, copying the private key of testnode11 to all other nodes should also do the job.
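
Either variant is meant to let every node log in to every other node without a password. A minimal sketch for verifying that from testnode11, assuming the head node can already reach each host non-interactively: it hops to each source node and, from there, attempts a non-interactive SSH login to each destination. The helper names are hypothetical.

```python
import itertools
import shlex
import subprocess
import sys


def pair_check_cmd(src, dst):
    """Command to run locally: log in to `src` non-interactively, then
    from there try a non-interactive SSH login to `dst`."""
    inner = f"ssh -o BatchMode=yes -o ConnectTimeout=5 {shlex.quote(dst)} true"
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", src, inner]


def failing_pairs(hosts):
    """Return every ordered (src, dst) pair for which the check failed.

    A failure of the first hop (local node -> src) is reported as a
    failed pair as well.
    """
    failures = []
    for src, dst in itertools.permutations(hosts, 2):
        proc = subprocess.run(pair_check_cmd(src, dst),
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL)
        if proc.returncode != 0:
            failures.append((src, dst))
    return failures


if __name__ == "__main__" and len(sys.argv) > 2:
    bad = failing_pairs(sys.argv[1:])
    for src, dst in bad:
        print(f"{src} -> {dst}: FAILED")
    sys.exit(1 if bad else 0)
```

Usage: `python3 check_pairs.py testnode11 testnode12 testnode13 testnode14` (the script name is hypothetical). With only testnode11's public key distributed, pairs such as testnode12 -> testnode14 would be expected to fail.
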

> on all the nodes, but that is not what I want.
> Is there any way to ensure that all SSH connections to the other nodes originate from the node where mpirun is executed?
> I checked all the ORTE parameters but found no answer. Any suggestions would be welcome.

Depending on the number of nodes, and if, like me, you don't like passphrase-less SSH keys at all, setting up host-based authentication could help:

http://arc.liv.ac.uk/SGE/howto/hostbased-ssh.html
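
The pieces of OpenSSH host-based authentication can be summarized as follows. This is a hedged sketch, not the contents of the linked howto: it assumes stock OpenSSH with its configuration under /etc/ssh on every node, and the helper names are hypothetical. It only prints text and changes nothing on the system. In addition, `ssh-keysign` must be installed setuid root on the client side (many distributions ship it that way), and sshd has to be reloaded after its configuration is edited.

```python
# Server side (/etc/ssh/sshd_config on every node).
SSHD_CONFIG = ["HostbasedAuthentication yes"]

# Client side (/etc/ssh/ssh_config on every node, outside host-specific blocks).
SSH_CONFIG = ["HostbasedAuthentication yes",
              "EnableSSHKeysign yes"]


def shosts_equiv(hosts):
    """Contents of /etc/ssh/shosts.equiv: one trusted client host per line.

    The names must match what sshd resolves for the connecting host, so
    fully qualified names may be needed.
    """
    return "".join(f"{host}\n" for host in hosts)


def keyscan_cmd(hosts):
    """Command whose output can be stored as /etc/ssh/ssh_known_hosts."""
    return ["ssh-keyscan", *hosts]


if __name__ == "__main__":
    nodes = ["testnode11", "testnode12", "testnode13", "testnode14"]
    print("# set in /etc/ssh/sshd_config on every node")
    print("\n".join(SSHD_CONFIG))
    print("# set in /etc/ssh/ssh_config on every node")
    print("\n".join(SSH_CONFIG))
    print("# /etc/ssh/shosts.equiv on every node")
    print(shosts_equiv(nodes), end="")
    print("# run, then store the output as /etc/ssh/ssh_known_hosts on every node")
    print(" ".join(keyscan_cmd(nodes)))
```

Note that OpenSSH does not honor shosts.equiv for root logins, which is not an issue for an unprivileged account such as mpiuser.
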

-- Reuti

> Thanks!
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users