Open MPI User's Mailing List Archives

Subject: [OMPI users] a problem about mpirun and SSH when using Open MPI 1.7rc8
From: yumenlj (yumenlj_at_[hidden])
Date: 2013-03-14 04:20:48


Hi all,

I have run into a problem with mpirun and SSH when using Open MPI 1.7rc8.

I have a 4-node cluster. This is the hostfile:

[mpiuser@testnode11 openmpi-1.6.4]$ cat ~/work/hostfile
testnode11
testnode12
testnode13
testnode14

I configured SSH by copying ".ssh/id_rsa.pub" from testnode11 into ".ssh/authorized_keys" on all 4 nodes, so that I can log in to all 4 nodes from testnode11 without a password.
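For reference, the one-way key setup described above can be sketched as a small bash function. This is an illustration only, not anything shipped with Open MPI: the function name `push_key_to_hosts` is hypothetical, and it assumes the hostfile format shown earlier (one host per line, optionally followed by options such as `slots=N`) and the OpenSSH `ssh-copy-id` tool. Note that it only authorizes logins *from* the node where it is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Open MPI): install the local public key
# into ~/.ssh/authorized_keys on every host named in a hostfile. This gives
# passwordless logins FROM this node only, matching the setup described above.
push_key_to_hosts() {
    local hostfile=$1
    local pubkey=${2:-$HOME/.ssh/id_rsa.pub}
    local hosts=() host rest

    # Collect the first field of each non-empty, non-comment line; the
    # "|| [[ -n $host ]]" keeps a final line that lacks a trailing newline.
    while read -r host rest || [[ -n $host ]]; do
        [[ -z $host || $host == \#* ]] && continue
        hosts+=("$host")
    done < "$hostfile"

    for host in "${hosts[@]}"; do
        # ssh-copy-id appends the key on the remote side and may prompt
        # for that host's password once.
        ssh-copy-id -i "$pubkey" "$host"
    done
}
```

Run on testnode11 as `push_key_to_hosts ~/work/hostfile`. Reading the hostfile into an array before calling `ssh-copy-id` keeps the remote login from consuming the rest of the hostfile on stdin.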

The following test worked well with Open MPI 1.6.4.

[mpiuser@testnode11 openmpi-1.6.4]$ mpirun -hostfile ~/work/hostfile -np 8 ~/src/openmpi-1.6.4/examples/ring_c
Process 0 sending 10 to 1, tag 201 (8 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
Process 0 decremented value: 7
Process 0 decremented value: 6
Process 0 decremented value: 5
Process 0 decremented value: 4
Process 0 decremented value: 3
Process 0 decremented value: 2
Process 0 decremented value: 1
Process 0 decremented value: 0
Process 0 exiting
Process 4 exiting
Process 2 exiting
Process 3 exiting
Process 1 exiting
Process 6 exiting
Process 7 exiting
Process 5 exiting

However, when I switched to Open MPI 1.7rc8, the same test did not work.

[mpiuser@testnode11 openmpi-1.7rc8]$ mpirun -hostfile ~/work/hostfile -np 8 ~/src/openmpi-1.7rc8/examples/ring_c
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[testnode12:06990] [[50636,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 362
[testnode12:06990] [[50636,0],1] attempted to send to [[50636,0],3]: tag 15
[testnode12:06990] [[50636,0],1] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/grpcomm_base_xcast.c at line 166

I checked the SSH logs and found the immediate cause: an SSH login from testnode12 to testnode14 was denied.

[mpiuser@testnode11 openmpi-1.7rc8]$ ssh root@testnode14 tail -f /var/log/secure
...
Mar 14 15:39:01 testnode14 sshd[31610]: Connection closed by testnode12
Mar 14 15:39:04 testnode14 sshd[31611]: Failed password for mpiuser from testnode12 port 55964 ssh2
Mar 14 15:39:04 testnode14 sshd[31611]: Failed password for mpiuser from testnode12 port 55964 ssh2
Mar 14 15:39:04 testnode14 sshd[31612]: Connection closed by testnode12
...
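A failure like this can be reproduced without mpirun. The bash sketch below is a hypothetical diagnostic, not an Open MPI tool: for every ordered pair of hosts in the hostfile it logs in to the source host from the current node and, from there, attempts a non-interactive login to the destination. `-o BatchMode=yes` is a standard OpenSSH option that makes ssh fail instead of prompting for a password, so with the one-way key setup described above a pair such as testnode12 -> testnode14 would be reported as FAIL.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic (not part of Open MPI): test passwordless SSH for
# every ordered pair of hosts listed in a hostfile.
check_ssh_mesh() {
    local hostfile=$1
    local hosts=() host rest src dst

    # Collect the first field of each non-empty, non-comment line.
    while read -r host rest || [[ -n $host ]]; do
        [[ -z $host || $host == \#* ]] && continue
        hosts+=("$host")
    done < "$hostfile"

    for src in "${hosts[@]}"; do
        for dst in "${hosts[@]}"; do
            [[ $src == "$dst" ]] && continue
            # Hop to $src from this node, then try $dst from $src.
            # BatchMode=yes fails fast instead of prompting for a password;
            # -n keeps ssh from reading this script's stdin.
            if ssh -n -o BatchMode=yes "$src" \
                   "ssh -n -o BatchMode=yes $dst true" >/dev/null 2>&1; then
                echo "OK   $src -> $dst"
            else
                echo "FAIL $src -> $dst"
            fi
        done
    done
}
```

Running `check_ssh_mesh ~/work/hostfile` on testnode11 lists which node-to-node logins still need a key; any FAIL line between two compute nodes corresponds to the kind of denial shown in the log above.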

So I am puzzled. I launched mpirun on testnode11, so I do not understand why testnode12 would open an SSH connection to testnode14.
One workaround is to copy ".ssh/id_rsa.pub" from every node into ".ssh/authorized_keys" on every node, but I would rather not do that.
Is there a way to make all SSH connections originate from the node where mpirun is executed?
I have looked through all the ORTE parameters and found nothing. Any suggestions would be appreciated.

Thanks!