Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with running openMPI program
From: Ankush Kaul (ankush.rkaul_at_[hidden])
Date: 2009-04-19 06:26:13


Also, how can I find out where my MPI libraries and include directories are?
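
For reference, Open MPI's wrapper compilers can report this directly,
assuming they are on the PATH (mpicc here stands for whichever wrapper
is in use):

  # show the full compile and link lines the wrapper would use
  mpicc --showme
  # or list just the include and library directories
  mpicc --showme:incdirs
  mpicc --showme:libdirs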

On Sat, Apr 18, 2009 at 2:29 PM, Ankush Kaul <ankush.rkaul_at_[hidden]> wrote:

> Let me explain in detail.
>
> When we had only 2 nodes, 1 master (192.168.67.18) + 1 compute node
> (192.168.45.65), my openmpi-default-hostfile looked like:
>
> 192.168.67.18 slots=2
> 192.168.45.65 slots=2
>
> After this, on running the command mpirun /work/Pi on the master node, we got:
>
> root_at_192.168.45.65's password:
>
> After entering the password, the program ran on both nodes.
>
> Now, after connecting a second compute node and editing the hostfile to:
>
> 192.168.67.18 slots=2
> 192.168.45.65 slots=2
> 192.168.67.241 slots=2
>
> and then running the command mpirun /work/Pi on the master node, we got:
>
> root_at_192.168.45.65's password: root_at_192.168.67.241's password:
>
> which does not accept the password.
>
> Although we are eventually going to set up a passwordless cluster, I would
> like to know why this problem is occurring.
>
>
> On Sat, Apr 18, 2009 at 3:40 AM, Gus Correa <gus_at_[hidden]> wrote:
>
>> Ankush
>>
>> You need to set up passwordless connections with ssh to the node you just
>> added. You (or somebody else) probably did this already on the first
>> compute node, otherwise the MPI programs wouldn't run
>> across the network.
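>>
>> A quick way to verify this, using the node addresses from your
>> hostfile: run these from the master node; each should print the
>> remote hostname without asking for a password:
>>
>>   # no password prompt means the ssh keys are set up correctly
>>   ssh 192.168.45.65 hostname
>>   ssh 192.168.67.241 hostname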
>>
>> See the very last sentence on this FAQ:
>>
>> http://www.open-mpi.org/faq/?category=running#run-prereqs
>>
>> And try this recipe (if you use RSA keys instead of DSA, replace all "dsa"
>> by "rsa"):
>>
>>
>> http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3
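>>
>> The recipe boils down to something like the following sketch (it
>> assumes OpenSSH and the root account you are already using; if
>> ssh-copy-id is not available, append the contents of
>> ~/.ssh/id_dsa.pub to ~/.ssh/authorized_keys on each node by hand):
>>
>>   # on the master node: generate a DSA key pair (press Enter for an
>>   # empty passphrase to get fully passwordless logins)
>>   ssh-keygen -t dsa
>>   # install the public key on each compute node
>>   ssh-copy-id -i ~/.ssh/id_dsa.pub root@192.168.45.65
>>   ssh-copy-id -i ~/.ssh/id_dsa.pub root@192.168.67.241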
>>
>> I hope this helps.
>>
>> Gus Correa
>> ---------------------------------------------------------------------
>> Gustavo Correa
>> Lamont-Doherty Earth Observatory - Columbia University
>> Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>>
>> Ankush Kaul wrote:
>>
>>> Thank you, I am reading up on the tools you suggested.
>>>
>>> I am facing another problem. My cluster is working fine with 2 hosts (1
>>> master + 1 compute node), but when I tried to add another node (1 master +
>>> 2 compute nodes), it is not working. It works fine when I give the command
>>> mpirun -host <hostname> /work/Pi
>>>
>>> but when I try to run
>>> mpirun /work/Pi it gives the following error:
>>>
>>> root_at_192.168.45.65's password:
>>> root_at_192.168.67.241's password:
>>>
>>> Permission denied, please try again. <The password I provide is correct>
>>>
>>> root_at_192.168.45.65's password:
>>>
>>> Permission denied, please try again.
>>>
>>> root_at_192.168.45.65's password:
>>>
>>> Permission denied (publickey,gssapi-with-mic,password).
>>>
>>> Permission denied, please try again.
>>>
>>> root_at_192.168.67.241's password:
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> base/pls_base_orted_cmds.c at line 275
>>>
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> pls_rsh_module.c at line 1166
>>>
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> errmgr_hnp.c at line 90
>>>
>>> [ccomp1.cluster:03503] ERROR: A daemon on node 192.168.45.65 failed to
>>> start as expected.
>>>
>>> [ccomp1.cluster:03503] ERROR: There may be more information available
>>> from
>>>
>>> [ccomp1.cluster:03503] ERROR: the remote shell (see above).
>>>
>>> [ccomp1.cluster:03503] ERROR: The daemon exited unexpectedly with status
>>> 255.
>>>
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> base/pls_base_orted_cmds.c at line 188
>>>
>>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>>> pls_rsh_module.c at line 1198
>>>
>>>
>>> --------------------------------------------------------------------------
>>>
>>> mpirun was unable to cleanly terminate the daemons for this job. Returned
>>> value Timeout instead of ORTE_SUCCESS
>>>
>>> What is the problem here?
>>>
>>>
>>> On Tue, Apr 14, 2009 at 7:15 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:
>>>
>>> Ankush Kaul wrote:
>>>
>>> Finally, after mentioning the hostfiles, the cluster is working
>>> fine. We downloaded a few benchmarking packages, but I would like
>>> to know if there is any GUI-based benchmarking software, so that
>>> it is easier to demonstrate the working of our cluster while
>>> presenting it.
>>>
>>>
>>> I'm not sure what you're looking for here, but thought I'd venture a
>>> suggestion.
>>>
>>> There are GUI-based performance analysis and tracing tools. E.g.,
>>> run a program, [[semi-]automatically] collect performance data, run
>>> a GUI-based analysis tool on the data, visualize what happened on
>>> your cluster. Would this suit your purposes?
>>>
>>> If so, there are a variety of tools out there you could try. Some
>>> are platform-specific or cost money. Some are widely/freely
>>> available. Examples of these tools include Intel Trace Analyzer,
>>> Jumpshot, Vampir, TAU, etc. I do know that Sun Studio (Performance
>>> Analyzer) is available as a free download for x86 and SPARC, on Linux
>>> and Solaris, and works with OMPI. Possibly the same with Jumpshot.
>>> VampirTrace instrumentation is already in OMPI, but then you need
>>> to figure out the analysis-tool part. (I think the Vampir GUI tool
>>> requires a license, but I'm not sure. Maybe you can convert to TAU,
>>> which is probably available for free download.)
>>>
>>> Anyhow, I don't even know if that sort of thing fits your
>>> requirements. Just an idea.
>>>