Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with running openMPI program
From: Ankush Kaul (ankush.rkaul_at_[hidden])
Date: 2009-04-18 04:59:56


Let me explain in detail,

When we had only 2 nodes, 1 master (192.168.67.18) + 1 compute node
(192.168.45.65), my openmpi-default-hostfile looked like this:

192.168.67.18 slots=2
192.168.45.65 slots=2

After this, running the command mpirun /work/Pi on the master node gave

# root_at_192.168.45.65's password:

and after entering the password the program ran on both nodes.

Now, after connecting a second compute node and editing the hostfile to

192.168.67.18 slots=2
192.168.45.65 slots=2
192.168.67.241 slots=2

and then running the command mpirun /work/Pi on the master node, we get

# root_at_192.168.45.65's password: root_at_192.168.67.241's password:

which does not accept the password.

We are trying to set up a passwordless cluster, so I would like to
know why this problem is occurring.
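
For reference, the passwordless setup Gus describes below boils down to
something like the following on the master node (a minimal sketch, assuming
the root account and the node addresses from the prompts above, and default
DSA key locations; if ssh-copy-id is not available, the public key can be
appended to each node's ~/.ssh/authorized_keys by hand):

  # generate a DSA key pair once on the master; press Enter at the
  # passphrase prompt so later logins need no interactive input
  # (use -t rsa for RSA keys)
  ssh-keygen -t dsa

  # copy the public key to each compute node, typing the password one last time
  ssh-copy-id -i ~/.ssh/id_dsa.pub root@192.168.45.65
  ssh-copy-id -i ~/.ssh/id_dsa.pub root@192.168.67.241

  # verify that each node can now be reached without a password prompt
  ssh root@192.168.45.65 hostname
  ssh root@192.168.67.241 hostname

Once both logins succeed without prompting, mpirun /work/Pi should be able to
start its daemons on all three hosts without asking for passwords.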

On Sat, Apr 18, 2009 at 3:40 AM, Gus Correa <gus_at_[hidden]> wrote:

> Ankush
>
> You need to set up passwordless connections with ssh to the node you just
> added. You (or somebody else) probably did this already on the first
> compute node, otherwise the MPI programs wouldn't run
> across the network.
>
> See the very last sentence on this FAQ:
>
> http://www.open-mpi.org/faq/?category=running#run-prereqs
>
> And try this recipe (if you use RSA keys instead of DSA, replace all "dsa"
> by "rsa"):
>
>
> http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3
>
> I hope this helps.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
> Ankush Kaul wrote:
>
>> Thank you, I am reading up on the tools you suggested.
>>
>> I am facing another problem: my cluster works fine with 2 hosts (1
>> master + 1 compute node), but when I tried to add another node (1 master + 2
>> compute nodes) it stopped working. It works fine when I give the command
>> mpirun -host <hostname> /work/Pi
>>
>> but when I try to run
>> mpirun /work/Pi it gives the following error:
>>
>> root_at_192.168.45.65's password:
>> root_at_192.168.67.241's password:
>>
>> Permission denied, please try again. <The password I provide is correct>
>>
>> root_at_192.168.45.65's password:
>>
>> Permission denied, please try again.
>>
>> root_at_192.168.45.65's password:
>>
>> Permission denied (publickey,gssapi-with-mic,password).
>>
>>
>> Permission denied, please try again.
>>
>> root_at_192.168.67.241's password:
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>>
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1166
>>
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> errmgr_hnp.c at line 90
>>
>> [ccomp1.cluster:03503] ERROR: A daemon on node 192.168.45.65 failed to
>> start as expected.
>>
>> [ccomp1.cluster:03503] ERROR: There may be more information available from
>>
>> [ccomp1.cluster:03503] ERROR: the remote shell (see above).
>>
>> [ccomp1.cluster:03503] ERROR: The daemon exited unexpectedly with status
>> 255.
>>
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>>
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1198
>>
>>
>> --------------------------------------------------------------------------
>>
>> mpirun was unable to cleanly terminate the daemons for this job. Returned
>> value Timeout instead of ORTE_SUCCESS
>>
>> What is the problem here?
>>
>>
>> On Tue, Apr 14, 2009 at 7:15 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>>
>> Ankush Kaul wrote:
>>
>> Finally, after mentioning the hostfiles the cluster is working
>> fine. We downloaded a few benchmarking tools, but I would like
>> to know if there is any GUI-based benchmarking software, so that
>> it is easier to demonstrate the working of our cluster while
>> displaying it.
>>
>>
>> I'm not sure exactly what you're looking for here, but thought I'd venture a
>> suggestion.
>>
>> There are GUI-based performance analysis and tracing tools. E.g.,
>> run a program, [[semi-]automatically] collect performance data, run
>> a GUI-based analysis tool on the data, visualize what happened on
>> your cluster. Would this suit your purposes?
>>
>> If so, there are a variety of tools out there you could try. Some
>> are platform-specific or cost money. Some are widely/freely
>> available. Examples of these tools include Intel Trace Analyzer,
>> Jumpshot, Vampir, TAU, etc. I do know that Sun Studio (Performance
>> Analyzer) is available via free download on x86 and SPARC and Linux
>> and Solaris and works with OMPI. Possibly the same with Jumpshot.
>> VampirTrace instrumentation is already in OMPI, but then you need
>> to figure out the analysis-tool part. (I think the Vampir GUI tool
>> requires a license, but I'm not sure. Maybe you can convert to TAU,
>> which is probably available for free download.)
>>
>> Anyhow, I don't even know if that sort of thing fits your
>> requirements. Just an idea.