
Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Accessing OpenMPI processes over Internet using ssh
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-11-25 09:28:50


On Nov 25, 2011, at 3:42 AM, Reuti wrote:

> Hi Ralph,
>
> Am 25.11.2011 um 03:47 schrieb Ralph Castain:
>
>>
>> On Nov 24, 2011, at 2:00 AM, Reuti wrote:
>>
>>> Hi,
>>>
>>> Am 24.11.2011 um 05:26 schrieb Jaison Paul:
>>>
>>>> I am trying to access OpenMPI processes over the Internet using ssh and have not quite succeeded yet. I believe that I should be able to do it.
>>>>
>>>> I have to run one process on my PC and the rest on a remote cluster over the Internet. I have set up the public keys (in .ssh/authorized_keys) so I can access the remote nodes without a password.
>>>>
>>>> I use a hostfile to run MPI. It reads something like:
>>>> -----------------------------
>>>> localhost
>>>> user_at_[hidden]
>>>
>>> This is not valid syntax for Open MPI.
>>
>> This isn't correct
>
> I'm very sorry about this; it wasn't my intention to mislead anyone.

Not a problem at all!

> But this syntax isn't something I would have expected to work, nor is it documented in `man mpiexec` AFAICS. I suggest adding it there or to http://www.open-mpi.org/faq/?category=running. Or maybe a completely new man page for "hostfile", where slots= and max_slots= are also explained in one place.

Yeah, our documentation is somewhat out of date in that area. The best explanation is on the wiki:

https://svn.open-mpi.org/trac/ompi/wiki/HostFilePlan

That was the design document I used when I wrote the code.
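Since the man pages lag behind, here is a rough sketch of the line-level hostfile syntax touched on in this thread: an optional `user@` prefix on the hostname, `slots=` and `max_slots=` keywords, and a leading `^` to exclude a host. It is written in Python purely for illustration and is not the parser in orte/util/hostfile/hostfile.c; `HostEntry` and `parse_hostfile_line` are hypothetical names, and the handling of `#` comments and of unrecognized keywords (silently skipped here) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HostEntry:
    """One parsed hostfile line (hypothetical structure, for illustration)."""
    host: str
    user: Optional[str] = None
    slots: Optional[int] = None
    max_slots: Optional[int] = None
    exclude: bool = False


def parse_hostfile_line(line: str) -> Optional[HostEntry]:
    """Split one hostfile line into fields; returns None for blank/comment lines."""
    line = line.split("#", 1)[0].strip()  # assumption: '#' starts a comment
    if not line:
        return None
    tokens = line.split()
    name = tokens[0]
    exclude = name.startswith("^")  # leading '^' marks a host to exclude
    if exclude:
        name = name[1:]
    user = None
    if "@" in name:  # optional "user@host" form, e.g. a different remote login
        user, name = name.split("@", 1)
    entry = HostEntry(host=name, user=user, exclude=exclude)
    for tok in tokens[1:]:
        key, sep, value = tok.partition("=")
        if not sep:
            continue  # assumption: tokens without '=' are ignored
        if key == "slots":
            entry.slots = int(value)
        elif key == "max_slots":
            entry.max_slots = int(value)
    return entry
```

For example, `parse_hostfile_line("jdoe@node01.example.com slots=4 max_slots=8")` yields user `jdoe`, host `node01.example.com`, 4 slots and at most 8.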

>
> NB: Checking orte/util/hostfile/hostfile.c, I see that even ^ to exclude hosts is supported, but from which initial list are the hosts excluded? In `man orte_hosts` I find --default-hostfile, which could be the initial list, but --default-hostfile isn't in mpirun's man page.

The "initial list" is whatever hostfile you provided - either the default hostfile or one specified on the cmd line. Remember, we use a progression here:

1. if a default hostfile exists, get our allocation from it. Any other hostfiles specified on the cmd line are then used to filter hosts from the default hostfile - i.e., we will ignore any hostname given in the cmd line hostfile if it wasn't included in the default hostfile. The "exclude" option applies in both cases. Any exclude directive in the default hostfile will ensure that host isn't included in the allocation. An excluded host in the cmd line hostfile will ensure that host is removed from the final allocation, should it have been present in the default hostfile.

2. if a default hostfile doesn't exist, then cycle across all hostfiles given on the cmd line and use the aggregate list as the allocation. I believe any exclude option here would apply only to the individual hostfile - i.e., if one hostfile includes a node and another excludes it, I suspect the node will wind up in the allocation.
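The two cases above can be sketched roughly as follows. This is an illustrative Python model, not Open MPI's code: each hostfile is reduced to a list of included hosts plus a set of `^`-excluded hosts, and `build_allocation` is a hypothetical name. Several behaviours are assumptions: "filter" in case 1 is read as keeping only default hosts that the command-line hostfiles name; if those hostfiles name no hosts at all (only exclusions), the default allocation is kept and only the exclusions apply; and case 2 follows the belief stated above that an exclusion only affects the hostfile it appears in.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical model of one hostfile: (included hosts in order, excluded hosts).
Hostfile = Tuple[List[str], Set[str]]


def build_allocation(default: Optional[Hostfile],
                     cmdline: List[Hostfile]) -> List[str]:
    """Sketch of the two-case progression; not Open MPI's implementation."""
    if default is not None:
        # Case 1: the default hostfile defines the allocation, minus its excludes.
        inc, exc = default
        alloc = [h for h in inc if h not in exc]
        named: Set[str] = set()
        removed: Set[str] = set()
        for c_inc, c_exc in cmdline:
            named.update(c_inc)
            removed.update(c_exc)
        if named:
            # Cmd-line hostfiles act as a filter: hosts they name that are not
            # in the default allocation are ignored (assumption: default hosts
            # they do not name are dropped).
            alloc = [h for h in alloc if h in named]
        # A host excluded on the cmd line is removed from the final allocation.
        return [h for h in alloc if h not in removed]

    # Case 2: no default hostfile, so aggregate the cmd-line hostfiles.
    result: List[str] = []
    for inc, exc in cmdline:
        for h in inc:
            # Each exclude applies only within its own hostfile (assumption).
            if h not in exc and h not in result:
                result.append(h)
    return result
```

For example, with a default hostfile listing n1 to n4 and excluding n4, a command-line hostfile naming n1, n2 and n9 yields `["n1", "n2"]`: n9 is ignored because the default hostfile never offered it.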

Once we have that global allocation, the nodes used for launch of each app_context are filtered from that global allocation using the hostfile specified for that app_context. So any exclude in that hostfile will impact only the associated app_context.
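The per-app_context step can be pictured as a second filter applied on top of the global allocation. Again this is only a hypothetical Python sketch (`hosts_for_app_context` is an invented name). It assumes that an app_context hostfile listing no included hosts leaves the whole global allocation as the starting point, and that its exclusions shrink the node list for that one app_context only.

```python
from typing import List, Set


def hosts_for_app_context(global_alloc: List[str],
                          app_include: List[str],
                          app_exclude: Set[str]) -> List[str]:
    """Pick the nodes one app_context may use (illustrative sketch only)."""
    wanted = set(app_include)
    # Hosts named in the app_context hostfile but absent from the global
    # allocation are ignored; an include-free hostfile keeps every node
    # (assumption).
    candidates = [h for h in global_alloc if not wanted or h in wanted]
    # Exclusions affect only this app_context; global_alloc is not modified.
    return [h for h in candidates if h not in app_exclude]
```

Because the function builds new lists, the global allocation passed in stays intact for the next app_context.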

Confusing and complex, I know - unfortunately, that is what I was told the community would want. :-/

HTH
Ralph

>
> -- Reuti
>
>
>> - we have long supported that syntax in a hostfile, and there is no issue with having a different user name at each node.
>>
>> Jaison: are you sure your nodes are set up for password-less ssh? In other words, have you set up your .ssh files on the remote nodes so they will allow us to ssh a process onto them without providing a password? This is the typical problem we see.
>>
>>
>>>
>>>
>>>> -----------------------------
>>>> But it fails.
>>>>
>>>> The issue seems to be the user! That is, the username on my PC is different from the username on the remote hosts. That's my assumption.
>>>>
>>>> Is this the problem? Is there any workaround? Do I need to have the same username on all nodes to solve this issue?
>>>
>>> You can define nicknames for an ssh connection in a file ~/.ssh/config like:
>>>
>>> Host foobar
>>> User baz
>>> Hostname the.remote.server.demo
>>> Port 1234
>>>
>>> While this works with any nickname for an ssh connection, in your case the nickname must match the hostname specified in the hostfile, as Open MPI itself won't use this lookup file:
>>>
>>> Host remotehost.com
>>> User user
>>>
>>> ssh should then use the entries therein to initiate the connection. For details you can have a look at `man ssh_config`.
>>>
>>> -- Reuti
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>