On Jan 22, 2009, at 11:26 PM, Sangamesh B wrote:
> We've a cluster with 23 nodes connected to IB switch and 8 nodes
> have connected to ethernet switch. Master node is also connected to IB
> switch. SGE(with tight integration, -pe orte) is used for
> parallel/serial job submission.
> Open MPI-1.3 is installed on master node with IB support
> (--with-openib=/usr). The same folder is copied to the remaining 23 IB
> nodes.
> Now what shall I do for remaining 8 ethernet nodes:
> (1) Copy the same folder(IB) to these nodes
> (2) Install Open MPI on one of the 8 ethernet nodes. Copy the
> same to 7 nodes.
> (3) Install an ethernet version of Open MPI on master node and copy
> to 8 nodes.
Either 1 or 2 is your best bet.
Do you have OFED installed on all nodes (either explicitly, or
included in your Linux distro)?
If so, I believe that at least some users with configurations like
this install OMPI with OFED support (--with-openib=/usr, as you
mentioned above) on all nodes. OMPI will notice that there is no
OpenFabrics-capable hardware on the ethernet-only nodes and will
simply not use the openib BTL plugin.
Note that OMPI v1.3 is better at staying quiet when the openib BTL is
present but no OpenFabrics devices are found (OMPI v1.2 issued a
warning in this case).
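If you would rather make the transport choice explicit than rely on
that automatic fallback, you can name the BTLs on the mpirun command
line with "--mca btl ...". Here is a minimal sketch of a helper that
builds such a command line. The helper names (btl_list, mpirun_argv)
and the program name are hypothetical, and the sketch assumes every
node in the job has the same kind of network; it is an illustration,
not something OMPI requires.

```python
def btl_list(use_ib):
    """Return the BTLs to request: openib for IB-only jobs, tcp otherwise.

    "sm" (shared memory) and "self" (loopback) are kept in both cases so
    that ranks on the same node can still talk to each other.
    """
    network = "openib" if use_ib else "tcp"
    return [network, "sm", "self"]


def mpirun_argv(nprocs, program, use_ib):
    """Build an mpirun argument list that pins the BTL selection."""
    return [
        "mpirun",
        "--mca", "btl", ",".join(btl_list(use_ib)),
        "-np", str(nprocs),
        program,
    ]


if __name__ == "__main__":
    # "./my_mpi_app" is a placeholder program name.
    print(" ".join(mpirun_argv(8, "./my_mpi_app", use_ib=False)))
```

For a job that spans both kinds of nodes you would pass use_ib=False
(or simply leave the BTL selection to OMPI), since forcing openib on a
node without OpenFabrics hardware leaves that node with no usable
network transport.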
How you intend to use this setup is up to you; you may want to
restrict jobs to 100% IB or 100% ethernet via SGE, or you may want to
let them mix, realizing that the overall parallel job may be slowed
down to the speed of the slowest network (e.g., ethernet).
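As a rough sketch of the "100% IB or 100% ethernet" option: one way to
do it is to have your SGE admin define one queue per network and then
pick the queue at submission time with qsub's -q option. The queue
names below ("ib.q", "eth.q") and the script name are made up for
illustration; they assume queues that contain only the IB nodes or
only the ethernet nodes, which you would have to create yourself.

```python
# Hypothetical queue names; each is assumed to contain only one kind of node.
QUEUES = {"ib": "ib.q", "ethernet": "eth.q"}


def qsub_argv(script, slots, network, pe="orte"):
    """Build a qsub argument list that keeps a job on a single network.

    -pe requests the parallel environment and slot count;
    -q restricts the job to the queue for the chosen network.
    """
    if network not in QUEUES:
        raise ValueError("unknown network: %r" % (network,))
    return ["qsub", "-pe", pe, str(slots), "-q", QUEUES[network], script]


if __name__ == "__main__":
    # "job.sh" is a placeholder job script.
    print(" ".join(qsub_argv("job.sh", 16, "ib")))
```

Leaving off the -q restriction is the "let them mix" case, with the
slowdown caveat mentioned above.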