Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] How openmpi-1.6.3 using nodes which not LSF dispatch?
From: HM Li (lihm0_at_[hidden])
Date: 2013-02-04 23:39:55


Thanks Ralph and Jeff, I understand.
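
A minimal sketch of the approach described in the replies quoted below, assuming the hostfile written by LSF is left unmodified (node1, node2) and the network is chosen through the "btl" MCA parameter instead of through host names. The function name build_mpirun_cmd, the labels "ib" and "tcp", and ./my_app are hypothetical and used only for illustration; the function prints the command line rather than running it.

```shell
#!/bin/sh
# Print (do not run) an mpirun command line that selects the transport
# with "-mca btl ..." rather than by rewriting host names.
# build_mpirun_cmd and the "ib"/"tcp" labels are invented for this sketch.
build_mpirun_cmd() {
    network=$1      # "ib" or "tcp"
    hostfile=$2     # the unmodified hostfile provided by LSF
    shift 2         # remaining arguments: the application and its options
    case "$network" in
        ib)  btl="openib,sm,self" ;;   # InfiniBand plus shared memory
        tcp) btl="tcp,sm,self" ;;      # TCP plus shared memory
        *)   echo "unknown network: $network" >&2; return 1 ;;
    esac
    echo "mpirun -mca btl $btl -machinefile $hostfile $*"
}

# Example: show the command for an InfiniBand run with a file named "hosts.txt".
build_mpirun_cmd ib hosts.txt ./my_app
```

In a job script, the printed line could be inspected first and then executed once it looks right; the hostfile keeps the names LSF dispatched, so the allocation check should no longer reject it.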

On 2013-02-05 03:34, Jeff Squyres (jsquyres) wrote:
> To be clear: this is a common misconception.
>
> Open MPI does not determine which network to use for MPI communication by the hostname(s) you use to launch your application. Specifically: the hostnames that you list in the hostfile, command line, or whatever your resource manager provides are *not* used to determine which networks to use for MPI communication.
>
> Open MPI only uses hostnames to identify unique servers (so that we can launch processes on them). We use different controls -- outlined by Ralph -- to determine which network(s) to use for MPI communication.
>
> Hope that helps.
>
>
> On Feb 2, 2013, at 6:43 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> I'm afraid this doesn't make much sense to me. LSF has dispatched node1 and node2 - correct? It sounds like you have also given those names aliases that refer to their IB ports - generally a very bad practice, but let's set that aside for now.
>>
>> If they are the same physical nodes, then the node name makes no difference - OMPI will see both TCP and IB on the node and use them. You can control which interfaces get used by simply telling OMPI on its command line:
>>
>> mpirun -mca btl tcp,sm,self ... will use shared memory and TCP
>>
>> mpirun -mca btl openib,sm,self ... will use IB and shared memory
>>
>> Using host names to try and control which network gets used isn't going to work - the software is too smart to be fooled that way.
>>
>>
>> On Feb 2, 2013, at 6:33 AM, HM Li <lihm0_at_[hidden]> wrote:
>>
>>> Can you help me?
>>>
>>> bnode1,bnode2 and node1,node2 are hostnames of the same nodes, on the InfiniBand and Ethernet networks respectively.
>>> node1,node2 are the nodes declared in lsf.cluster.name.
>>> To use the IB network, I modified the LSF mpijob script so that the HOSTFILE listing the nodes LSF dispatched uses the bnode names instead of the node names.
>>> I then run my jobs with "mpiexec -machinefile $HOSTFILE $COMMANDLINE".
>>> But the job exits and shows:
>>> -------------------------------------------------------------
>>> A hostfile was provided that contains at least one node not
>>> present in the allocation:
>>>
>>> hostfile: /home/nic/hmli/.lsbatch/bhost23263.node1
>>> node: bnode2
>>>
>>> If you are operating in a resource-managed environment, then only
>>> nodes that are in the allocation can be used in the hostfile. You
>>> may find relative node syntax to be a useful alternative to
>>> specifying absolute node names see the orte_hosts man page for
>>> further information.
>>> -------------------------------------------------------------
>>>
>>> I don't want to change the hostname from node to bnode in lsf.cluster.name.
>>>
>>> Thank you very much.
>>>
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users