
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem with running openMPI program
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-04-04 09:35:16


It might be best to:

1. Set up a non-root user to run MPI applications.
2. Set up SSH keys between the hosts for this non-root user so that you
can run "ssh <otherhost> uptime" without being prompted for a
password or passphrase.

This should help.
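
The two steps above can be sketched roughly as follows, run on the master node as the non-root user. This is only a sketch under stated assumptions: the account name "mpiuser" is hypothetical, the host address is the slave node from the original post, and an empty key passphrase is just one way to avoid the prompt (ssh-agent is another). Because the original error includes "bash: orted: command not found", the sketch also checks that orted is visible to a non-interactive SSH shell; whether that is the actual cause here is an assumption.

```shell
#!/bin/sh
# "mpiuser" is a hypothetical account name, not taken from the thread.
MPI_USER="${MPI_USER:-mpiuser}"
# Slave node address as given in the original post.
REMOTE_HOST="${REMOTE_HOST:-192.168.45.65}"

# Print the user@host string used by the ssh commands below.
remote_target() {
    printf '%s@%s\n' "$MPI_USER" "$REMOTE_HOST"
}

# Create a key pair if needed, install the public key on the remote
# host, then confirm that ssh no longer prompts for a password.
setup_passwordless_ssh() {
    # Assumption: an empty passphrase is acceptable on this private cluster.
    [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
    # Asks for the remote password one last time.
    ssh-copy-id "$(remote_target)"
    # Should print the remote uptime with no prompt.
    ssh "$(remote_target)" uptime
}

# Check that the Open MPI daemon is on the PATH of a non-interactive
# remote shell, which is how mpirun starts it over ssh.
check_orted() {
    ssh "$(remote_target)" 'which orted'
}
```

After sourcing the file, run setup_passwordless_ssh once, then check_orted. If check_orted prints nothing, the Open MPI bin directory is probably not on the remote PATH for non-interactive logins.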

On Apr 4, 2009, at 5:51 AM, Ankush Kaul wrote:

> I followed the steps given here to set up an Open MPI cluster: http://www.ps3cluster.umassd.edu/step3mpi.html
>
> My cluster consists of two nodes, master (192.168.67.18) and
> slave (192.168.45.65), connected directly through a crossover cable.
>
> After setting up the cluster and configuring the master node, I
> mounted the /tmp folder of the master node on the slave node (I had
> some problems with NFS at first, but I worked my way out of them).
>
> Then I copied the 'pi.c' program into the /tmp folder and successfully
> compiled it, giving me a binary file 'pi'.
>
> Now when I try to run the binary file using the following command:
>
> #mpirun -np 2 ./Pi
>
> root_at_192.168.45.65's password:
> <it asks for the password>
>
> After entering the password, it gives the following error:
>
> bash: orted: command not found
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/
> pls_base_orted_cmds.c at line 275
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> pls_rsh_module.c at line 1166
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> errmgr_hnp.c at line 90
> [ccomp.cluster:18963] ERROR: A daemon on node 192.168.45.65 failed
> to start as expected.
> [ccomp.cluster:18963] ERROR: There may be more information available
> from
> [ccomp.cluster:18963] ERROR: the remote shell (see above).
> [ccomp.cluster:18963] ERROR: The daemon exited unexpectedly with
> status 127.
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/
> pls_base_orted_cmds.c at line 188
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file
> pls_rsh_module.c at line 1198
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons for this job.
> Returned value Timeout instead of ORTE_SUCCESS.
> --------------------------------------------------------------------------
>
> I am totally lost now, as this is the first time I am working on a
> cluster project, and I need some help.
>
> Thank you
> Ankush

-- 
Jeff Squyres
Cisco Systems