Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with running openMPI program
From: Ankush Kaul (ankush.rkaul_at_[hidden])
Date: 2009-04-06 10:33:57


Thank you, Sir. The problem was with the paths of the 'bin' and 'lib' folders, so I
used the *mpirun --prefix* command. I now want to run a program, 'pi', on
the cluster. Where do I place the file on the master and the compute nodes?
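
For reference, a minimal sketch of such an invocation. It assumes Open MPI is installed under /usr/local/openmpi on both nodes (a hypothetical path, not stated in this thread) and that the master's /tmp is still NFS-mounted on the slave as described in the quoted message, so /tmp/pi resolves to the same file on both nodes. Open MPI does not copy the executable to the remote node by default, so the binary has to be reachable under the same path everywhere.

```shell
#!/bin/sh
# Assumption (not from the thread): Open MPI lives under this prefix on BOTH nodes.
PREFIX="/usr/local/openmpi"
# Master and slave addresses as given in the quoted message.
HOSTS="192.168.67.18,192.168.45.65"
# Same path on both nodes, assuming the NFS-mounted /tmp is still in place.
PROG="/tmp/pi"

# --prefix makes mpirun set PATH and LD_LIBRARY_PATH on the remote node,
# which is what lets the remote shell find orted.
CMD="mpirun --prefix $PREFIX -np 2 --host $HOSTS $PROG"

# Dry run: print the command line. To launch the job, run it directly.
echo "$CMD"
```

If no shared directory is available, copying the binary to an identical path on each node should work equally well.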

Also, how can I tell that the program is using the resources of both
nodes?
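
One simple way to check this, sketched under assumptions: launch the ordinary `hostname` command through mpirun, so that each started process prints the name of the node it runs on, and count the distinct names. The helper `count_hosts` and the sample hostname `node2` are hypothetical, used only for illustration; `ccomp.cluster` is the name that appears in the error log quoted below.

```shell
#!/bin/sh
# Count how many distinct hostnames appear on stdin (one name per line).
count_hosts() {
    sort -u | wc -l | tr -d ' '
}

# On a working cluster (addresses from the thread; --prefix omitted for brevity):
#   mpirun -np 2 --host 192.168.67.18,192.168.45.65 hostname | count_hosts
#
# Offline illustration with made-up output from two processes on two nodes:
printf 'ccomp.cluster\nnode2\n' | count_hosts
```

A result equal to the number of nodes suggests that processes were started on each of them. Inside the MPI program itself, MPI_Get_processor_name can be called by every rank to report the same information.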

On Sat, Apr 4, 2009 at 7:05 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:

> It might be best to:
>
> 1. Set up a non-root user to run MPI applications
> 2. Set up SSH keys between the hosts for this non-root user so that you can
> "ssh <otherhost> uptime" and not be prompted for a password/passphrase
>
> This should help.
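
The two steps above can be sketched as follows. This is an illustration under assumptions, not a tested recipe for this cluster: the user name `mpiuser` and the helper `can_ssh_without_prompt` are hypothetical, and the address is the slave node from the quoted message.

```shell
#!/bin/sh
# One-time setup, run as the non-root MPI user on the master:
#   ssh-keygen -t rsa                    # create a key pair (accept the default file)
#   ssh-copy-id mpiuser@192.168.45.65    # install the public key on the other node

# Return 0 if "ssh <host> uptime" succeeds without any prompt.
# BatchMode=yes makes ssh fail instead of asking for a password or passphrase.
can_ssh_without_prompt() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" uptime >/dev/null 2>&1
}

# Usage:
#   can_ssh_without_prompt 192.168.45.65 && echo "ready for mpirun"
```

If the check fails, mpirun would be expected to stop at the same password prompt shown in the original report.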
>
>
>
> On Apr 4, 2009, at 5:51 AM, Ankush Kaul wrote:
>
>> I followed the steps given here to set up an Open MPI cluster:
>> http://www.ps3cluster.umassd.edu/step3mpi.html
>>
>> My cluster consists of two nodes, master (192.168.67.18) and
>> slave (192.168.45.65), connected directly through a crossover cable.
>>
>> After setting up the cluster and configuring the master node, I mounted
>> the /tmp folder of the master node on the slave node (I had some problems
>> with NFS at first, but I worked my way out of them).
>>
>> Then I copied the 'pi.c' program into the /tmp folder and successfully
>> compiled it, giving me a binary file 'pi'.
>>
>> Now when I try to run the binary file using the following command
>>
>> #mpirun -np 2 ./Pi
>>
>> root_at_192.168.45.65's password:
>> <it asks for the password>
>>
>> After I enter the password, it gives the following error:
>>
>> bash: orted: command not found
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1166
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c
>> at line 90
>> [ccomp.cluster:18963] ERROR: A daemon on node 192.168.45.65 failed to
>> start as expected.
>> [ccomp.cluster:18963] ERROR: There may be more information available from
>> [ccomp.cluster:18963] ERROR: the remote shell (see above).
>> [ccomp.cluster:18963] ERROR: The daemon exited unexpectedly with status
>> 127.
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1198
>> --------------------------------------------------------------------------
>> mpirun was unable to cleanly terminate the daemons for this job. Returned
>> value Timeout instead of ORTE_SUCCESS.
>> --------------------------------------------------------------------------
>>
>> I am totally lost now, as this is the first time I am working on a cluster
>> project, and I need some help.
>>
>> Thank you
>> Ankush
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>