I'm afraid this doesn't make much sense to me. LSF has dispatched node1 and node2 - correct? It sounds like you have also given those names aliases that refer to their IB ports - generally a very bad practice, but let's set that aside for now.

If they are the same physical nodes, then the node name makes no difference - OMPI will see both TCP and IB on the node and use them. You can control which interfaces get used by simply telling OMPI on its command line:

mpirun -mca btl tcp,sm,self ...  will use shared memory and TCP

mpirun -mca btl openib,sm,self ...  will use IB and shared memory

Using host names to try and control which network gets used isn't going to work - the software is too smart to be fooled that way.
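
In case it helps, here is a rough sketch of how the LSF job script could pick the transport while leaving the hostfile exactly as LSF wrote it. The function names btl_for_network and launch_line are hypothetical helpers of my own, not part of LSF or Open MPI; HOSTFILE and COMMANDLINE are the variables from your script. It only prints the mpirun line so you can check it before running anything.

```shell
#!/bin/sh
# Hypothetical helper (not part of LSF or Open MPI): map a network choice
# to the BTL list passed to "mpirun -mca btl".
btl_for_network() {
    case "$1" in
        ib)  echo "openib,sm,self" ;;   # InfiniBand + shared memory
        tcp) echo "tcp,sm,self" ;;      # TCP + shared memory
        *)   echo "unknown network: $1" >&2; return 1 ;;
    esac
}

# Print the launch line instead of running it, so it can be checked first.
# $1 = network, $2 = hostfile exactly as LSF wrote it (node1, node2, ...);
# the remaining arguments are the application command line.
launch_line() {
    net="$1"; hostfile="$2"; shift 2
    btl="$(btl_for_network "$net")" || return 1
    echo "mpirun -mca btl $btl -machinefile $hostfile $*"
}
```

Calling launch_line ib "$HOSTFILE" $COMMANDLINE in the job script would print the IB variant; swap ib for tcp to get the TCP one. The point is that the node names in the hostfile stay as node1 and node2, so they still match the LSF allocation.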

On Feb 2, 2013, at 6:33 AM, HM Li <lihm0@163.com> wrote:

Can you help me? 

bnode1,bnode2 and node1,node2 are hostnames of the same nodes, corresponding to the InfiniBand and Ethernet networks respectively.
node1,node2 are the nodes declared in lsf.cluster.name.
In order to use the IB network, I modified the LSF mpijob script so that, in the HOSTFILE containing the nodes LSF dispatched, each name is changed from node to bnode.
I then use "mpiexec -machinefile $HOSTFILE $COMMANDLINE" to run my jobs.
But the job exits and shows:
A hostfile was provided that contains at least one node not
present in the allocation:

  hostfile:  /home/nic/hmli/.lsbatch/bhost23263.node1
  node:      bnode2

If you are operating in a resource-managed environment, then only
nodes that are in the allocation can be used in the hostfile. You
may find relative node syntax to be a useful alternative to
specifying absolute node names see the orte_hosts man page for
further information.

I don't want to change the hostname from node to bnode in lsf.cluster.name.

Thank you very much.
