Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh
From: Jaison Paul (jmulerik_at_[hidden])
Date: 2011-11-30 15:54:58


Jeff Squyres <jsquyres <at> cisco.com> writes:

>
> On Nov 30, 2011, at 6:03 AM, Jaison Paul wrote:
>
> > Yes, we have set up .ssh file on remote EC2 hosts. Is there anything else
> > that we should be taking care of when dealing with EC2?
>
> I have heard that Open MPI's TCP latency on EC2 is horrid. I actually talked
> with some Amazon / EC2 folks about it at SC'11 a few weeks ago; we set a date
> to dive into it a bit deeper in December.
>
> No promises on when/if the TCP latency will improve, but it's definitely
> something that we're looking at. My first *guess* is that it might have
> something to do with specifying btl_tcp_if_include / oob_tcp_if_include
> improperly (or not at all) -- but that's a SWAG.
>

I have tried a little bit more:

I have set the MCA parameters as follows:
mpirun -np 1 --mca btl tcp,self --mca btl_tcp_if_exclude lo,eth0 -hostfile hostinfo nbs-client -bynode

But it still failed with the following error:

Permission denied (publickey).
--------------------------------------------------------------------------
A daemon (pid 24744) died unexpectedly with status 255 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
mpirun: clean termination accomplished

I don't understand the "Permission denied (publickey)" error. I can access the
EC2 instance using password-less ssh as follows:

ssh ubuntu_at_ec2-67-202-**-***.compute-1.amazonaws.com

So, what went wrong?
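
One thing that may be worth checking (an assumption, not something confirmed in this thread): mpirun starts its remote daemon through a non-interactive ssh session, so ssh has to pick the right user name and key on its own, with no prompt and no extra command-line options. A minimal sketch of a ~/.ssh/config entry that would pin both for EC2 hosts follows. The key file name is a hypothetical placeholder; the user name and domain are taken from the ssh command above.

```
# ~/.ssh/config on the machine where mpirun is invoked
# Applies to any host in the EC2 domain used above.
Host *.compute-1.amazonaws.com
    # Remote login name, as in the manual ssh command
    User ubuntu
    # Placeholder path: substitute the private key actually used for EC2
    IdentityFile ~/.ssh/my-ec2-key.pem
    # Offer only the key named above instead of every key the agent holds
    IdentitiesOnly yes
```

With an entry like this in place, running ssh with `-o BatchMode=yes` against the EC2 host name (without the `ubuntu@` prefix) should log in without any prompt. If that also reports "Permission denied (publickey)", the problem is likely in the ssh setup rather than in Open MPI.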

The hostinfo file is:

[jmulerik_at_jaison Client]$ cat hostinfo
localhost
ubuntu_at_[hidden]

Jaison