
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Error using hostfile
From: Mohan, Ashwin (ashmohan_at_[hidden])
Date: 2011-07-08 12:52:00


Hi,

I am following up on a previously posted error. Based on the earlier
recommendation, I set up passwordless SSH logins.

 

I created a hostfile of 4 nodes (each node has 4 slots). I tried to run
my job on those nodes but get no output, so I end up killing the job. I
am trying to run a simple MPI program on 4 nodes and cannot figure out
what the issue is. What could I check to ensure that I can run jobs on
all 4 nodes (each node has 4 slots)?

 

Here is the simple MPI program I am trying to execute on the 4 nodes:

**************************

#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    char message[100];
    int my_rank, p, source, dest = 0, tag = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (my_rank != 0) {
        sprintf(message, "Greetings from the process %d!", my_rank);
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    MPI_Finalize();
    return 0;
}


****************************

My hostfile looks like this:

 

[amohan_at_myocyte48 ~]$ cat hostfile

myocyte46

myocyte47

myocyte48

myocyte49

*******************************
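
Since each node has 4 slots, it may also be worth noting that a bare
list of hostnames tells Open MPI nothing about slot counts; the
hostfile syntax accepts an explicit slots=N per line. A sketch of the
same hostfile with the slot counts declared (not the cause of the hang
described here, but it makes the intent explicit):

```shell
# Regenerate the hostfile with explicit slot counts
# (standard Open MPI hostfile syntax: "hostname slots=N").
cat > hostfile <<'EOF'
myocyte46 slots=4
myocyte47 slots=4
myocyte48 slots=4
myocyte49 slots=4
EOF
cat hostfile
```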

 

I use the following run command: mpirun --hostfile hostfile -np 4
new46

The screen then stays blank, so I have to kill the job.
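
Before digging into the program itself, one way to isolate the problem
is to confirm the launch path works at all: first check that every node
accepts key-based SSH without prompting, then try launching a trivial
non-MPI command through mpirun. A rough sketch, using the hostnames
from the post above:

```shell
# Verify non-interactive SSH to each node; BatchMode=yes makes ssh
# fail immediately instead of falling back to a password prompt.
for h in myocyte46 myocyte47 myocyte48 myocyte49; do
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null \
    && echo "$h: ssh ok" \
    || echo "$h: ssh FAILED (fix keys before retrying mpirun)"
done

# Once every node reports ok, a plain command run through mpirun
# (no MPI calls involved) separates launch problems from code problems:
#   mpirun --hostfile hostfile -np 4 hostname
```

If the ssh loop reports failures, the hang is a launch problem, not an
application problem; if `mpirun ... hostname` also hangs, firewalls
blocking the daemons' TCP connections back to mpirun are a common
culprit.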

 

OUTPUT ON KILLING JOB:

mpirun: killing job...

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        myocyte46 - daemon did not report back when launched
        myocyte47 - daemon did not report back when launched
        myocyte49 - daemon did not report back when launched
 
Thanks,
Ashwin.
 
 
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On
Behalf Of Ralph Castain
Sent: Wednesday, July 06, 2011 6:46 PM
To: Open MPI Users
Subject: Re: [OMPI users] Error using hostfile
 
Please see http://www.open-mpi.org/faq/?category=rsh#ssh-keys
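
For reference, the key setup the FAQ describes comes down to roughly
the following (a sketch assuming OpenSSH defaults; adjust the key type
and path to local policy):

```shell
# Generate a passphrase-less key once, if one does not already exist,
# then install the public key on each compute node with ssh-copy-id.
[ -f ~/.ssh/id_rsa ] || { mkdir -p ~/.ssh && ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa -q; }
for h in myocyte46 myocyte47 myocyte48 myocyte49; do
  ssh-copy-id "$h" 2>/dev/null || echo "could not install key on $h"
done
```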
 
 
On Jul 6, 2011, at 5:09 PM, Mohan, Ashwin wrote:
Hi,
 
I use the following command (mpirun --prefix /usr/local/openmpi1.4.3 -np
4 hello) to successfully execute a simple hello world command on a
single node.  Each node has 4 slots.  Following the successful execution
on one node, I wish to employ 4 nodes and for this purpose wrote a
hostfile. I submitted my job using the following command:
 
mpirun --prefix /usr/local/openmpi1.4.3 -np 4 --hostfile hostfile hello
 
Copied below is the output. How do I go about fixing this issue?
 
**********************************************************************
 
amohan_at_myocyte48's password: amohan_at_myocyte47's password:
Permission denied, please try again.
amohan_at_myocyte48's password:
Permission denied, please try again.
amohan_at_myocyte47's password:
Permission denied, please try again.
amohan_at_myocyte47's password:
Permission denied, please try again.
amohan_at_myocyte48's password:
 
Permission denied (publickey,gssapi-with-mic,password).
--------------------------------------------------------------------------
A daemon (pid 22085) died unexpectedly with status 255 while attempting
to launch so we are aborting.
 
There may be more information reported by the environment (see above).
 
This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have
the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun was unable to cleanly terminate the daemons on the nodes shown
below. Additional manual cleanup may be required - please refer to
the "orte-clean" tool for assistance.
--------------------------------------------------------------------------
        myocyte47 - daemon did not report back when launched
        myocyte48 - daemon did not report back when launched
 
**********************************************************************
 
Thanks,
Ashwin.
_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users