Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with running openMPI program
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-04-04 09:35:16


It might be best to:

1. Set up a non-root user to run MPI applications
2. Set up SSH keys between the hosts for this non-root user so that you
can "ssh <otherhost> uptime" and not be prompted for a password/passphrase

This should help.
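Step 2 can be sketched in shell as follows. The sketch assumes OpenSSH's `ssh-keygen` and `ssh-copy-id` are installed. It also assumes you run it while logged in as the non-root user from step 1, for example an account that root created with `useradd -m mpiuser`, where `mpiuser` is a hypothetical name. The function names are illustrative helpers, not part of Open MPI. `check_passwordless` runs the "ssh <otherhost> uptime" test with `BatchMode=yes`, so ssh fails instead of prompting.

```shell
# Source this file as the non-root MPI user (not as root).
# Assumes OpenSSH tools; function names are hypothetical helpers.

# Create an RSA key pair with an empty passphrase, unless one already exists.
make_mpi_key() {
    local key="${1:-$HOME/.ssh/id_rsa}"
    if [ ! -f "$key" ]; then
        mkdir -p "$(dirname "$key")"
        chmod 700 "$(dirname "$key")"
        ssh-keygen -t rsa -N "" -f "$key" -q
    fi
}

# Copy the public key to the other host (asks for the password one last time).
push_mpi_key() {
    local host="$1"
    local key="${2:-$HOME/.ssh/id_rsa}"
    ssh-copy-id -i "$key.pub" "$host"
}

# The test from the reply: "ssh <otherhost> uptime" must not prompt.
# BatchMode=yes makes ssh fail rather than ask for a password/passphrase.
check_passwordless() {
    local host="$1"
    if ssh -o BatchMode=yes "$host" uptime >/dev/null 2>&1; then
        echo "ok: $host"
    else
        echo "FAIL: $host still asks for a password" >&2
        return 1
    fi
}
```

On the master, for example, run `make_mpi_key`, then `push_mpi_key 192.168.45.65`, then `check_passwordless 192.168.45.65`. An empty passphrase is the simplest choice. If you want to keep a passphrase on the key, load the key into ssh-agent instead.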

On Apr 4, 2009, at 5:51 AM, Ankush Kaul wrote:

> I followed the steps given here to set up an Open MPI cluster: http://www.ps3cluster.umassd.edu/step3mpi.html
>
> My cluster consists of two nodes, master (192.168.67.18) and
> slave (192.168.45.65), connected directly through a crossover cable.
>
> After setting up the cluster and configuring the master node, I
> mounted the /tmp folder of the master node on the slave node (I had some
> problems with NFS at first, but I worked my way out of them).
>
> Then I copied the 'pi.c' program into the /tmp folder and successfully
> compiled it, giving me a binary file 'pi'.
>
> Now when I try to run the binary file using the following command
>
> # mpirun -np 2 ./pi
>
> root@192.168.45.65's password:
> <it asks for the password>
>
> After entering the password, it gives the following error:
>
> bash: orted: command not found
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file errmgr_hnp.c at line 90
> [ccomp.cluster:18963] ERROR: A daemon on node 192.168.45.65 failed to start as expected.
> [ccomp.cluster:18963] ERROR: There may be more information available from
> [ccomp.cluster:18963] ERROR: the remote shell (see above).
> [ccomp.cluster:18963] ERROR: The daemon exited unexpectedly with status 127.
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 188
> [ccomp.cluster:18963] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1198
> --------------------------------------------------------------------------
> mpirun was unable to cleanly terminate the daemons for this job.
> Returned value Timeout instead of ORTE_SUCCESS.
> --------------------------------------------------------------------------
>
> I am totally lost now, as this is the first time I am working on a
> cluster project, and I need some help.
>
> Thank you
> Ankush
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
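The first line of the quoted log is `bash: orted: command not found`. Together with exit status 127, the shell's "command not found" code, it indicates that the shell ssh started on 192.168.45.65 could not find Open MPI's `orted` daemon. mpirun's rsh/ssh launcher (the `pls_rsh_module.c` in the log) starts `orted` through a non-interactive ssh command. A minimal check can therefore run `command -v orted` the same way. `check_remote_orted` is a hypothetical helper name. The check also fails when ssh itself cannot log in without a password, so it is most useful once passwordless SSH works.

```shell
# Source this file; check_remote_orted is a hypothetical helper.
# It asks the remote node's non-interactive shell whether orted is on PATH,
# which is how mpirun's ssh launcher will try to start it.
check_remote_orted() {
    local host="$1"
    # BatchMode=yes: fail instead of prompting for a password.
    if ssh -o BatchMode=yes "$host" command -v orted >/dev/null 2>&1; then
        echo "orted found on $host"
    else
        echo "orted NOT found in the non-interactive PATH on $host (or ssh login failed)" >&2
        return 1
    fi
}
```

If it reports NOT found, there are two usual remedies. One is to add Open MPI's `bin` directory to PATH, and its `lib` directory to LD_LIBRARY_PATH, in a shell startup file that non-interactive logins read on the remote node. The other is to give mpirun the installation directory with `--prefix`. The install location is not stated in the message, so it stays a placeholder here: `mpirun --prefix <openmpi-install-dir> -np 2 ./pi`.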

-- 
Jeff Squyres
Cisco Systems