
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] use additional interface for openmpi
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-10-03 07:23:10

On Sep 29, 2009, at 9:58 AM, <worldeb_at_[hidden]> <worldeb_at_[hidden]> wrote:

> > Open MPI should just "figure it out" and do the Right Thing at
> > run-time -- is that not happening?
> You are right, it should.
> But I want to keep all other traffic (NFS, traffic from other jobs,
> and so on) away from Open MPI communications,
> and use only a dedicated Ethernet interface for Open MPI.
> I have Open MPI 1.3.3 installed in the same directory on all
> compute nodes and on the head node.
> The OS is the same across the cluster: Debian 5.0.
> On the compute nodes I have two interfaces: eth0 for NFS and so on,
> and eth1 for Open MPI.
> On the head node I have 5 interfaces: eth0 for NFS, eth4 for Open MPI.
> The networks are:
> 1) head node eth0 + nodes eth0:
> 2) head node eth4 + nodes eth1:
> So how can I configure Open MPI to use only network 2) for this
> purpose?

Try using "--mca btl_tcp_if_exclude eth0 --mca oob_tcp_if_exclude eth0".
This tells Open MPI on all machines not to use eth0. The only other
network available is the one on eth4 (head node) and eth1 (compute
nodes), so Open MPI should do the Right Thing.
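If you would rather not repeat these flags on every mpirun command
line, the same parameters can live in Open MPI's per-user MCA
parameter file, $HOME/.openmpi/mca-params.conf, which takes one
"name = value" pair per line. The sketch below appends the two
settings there. It assumes $HOME is on the shared NFS mount, so every
node reads the same file. It also adds lo to the list, on the
assumption that a user-supplied exclude list replaces the default one
(which normally keeps the BTL off the loopback interface).

```bash
# Sketch: persist the interface exclusions in the per-user MCA parameter file.
# Assumption: $HOME is shared over NFS, so all nodes see this same file.
# "lo" is listed because setting an exclude list replaces the default list,
# which normally keeps TCP traffic off the loopback interface.
conf_dir="$HOME/.openmpi"
mkdir -p "$conf_dir"

# Append (not overwrite) so any existing settings in the file are kept.
cat >> "$conf_dir/mca-params.conf" <<'EOF'
# Keep MPI (BTL) and out-of-band (OOB) TCP traffic off eth0
btl_tcp_if_exclude = lo,eth0
oob_tcp_if_exclude = lo,eth0
EOF
```

With the file in place, a plain "mpirun -np 2 -host n10,n11 cpi"
should pick up both settings; "ompi_info --param btl tcp" can be used
to check which value Open MPI actually sees.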

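Note that Open MPI has *two* TCP subsystems: one used for MPI
communications and one used for "out of band" communications (control
traffic such as that between mpirun and the orted daemons). The BTL
(Byte Transfer Layer) is the MPI communication subsystem; "oob" is the
out-of-band communication subsystem. Each has its own *_if_exclude
parameter, which is why both are set above.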
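Because the two subsystems are configured independently, it is easy to
restrict one and forget the other. As an illustration, here is a
hypothetical shell helper (mca_if_exclude_flags is not part of Open
MPI) that emits one exclude list for both subsystems, so the two
settings cannot drift apart.

```bash
# Hypothetical helper, not part of Open MPI: print the --mca flags that apply
# a single interface exclude list to both TCP subsystems, one word per line.
#   btl_tcp_if_exclude -> MPI traffic (BTL)
#   oob_tcp_if_exclude -> out-of-band traffic (mpirun <-> orted)
mca_if_exclude_flags() {
    # $1: comma-separated interface names, e.g. "lo,eth0"
    printf '%s\n' \
        --mca btl_tcp_if_exclude "$1" \
        --mca oob_tcp_if_exclude "$1"
}

# Dry run: show the resulting command instead of launching it.
# The unquoted $(...) is split on newlines into separate arguments; this is
# safe here because interface lists contain no spaces or glob characters.
echo mpirun $(mca_if_exclude_flags lo,eth0) -np 2 -host n10,n11 cpi
```

The dry run prints "mpirun --mca btl_tcp_if_exclude lo,eth0 --mca
oob_tcp_if_exclude lo,eth0 -np 2 -host n10,n11 cpi"; drop the echo to
actually launch the job.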

> Another problem:
> I tried to run some examples, but unfortunately they do not work.
> Maybe the network is not configured correctly.
> I can only run jobs on the same host I submit them from.
> When I submit from the head node to other nodes, for example, the
> job hangs without any messages,
> and on the node where I want to compute I see that an orted daemon
> has been started.
> (I use the default config files.)
> Examples below:
> mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_include eth0 -np 2 -host n10,n11 cpi
> no output, no calculations, only orted daemons on the nodes
> mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 2 -host n10,n11 cpi
> the same as above
> mpirun -v --mca btl self,sm,tcp --mca btl_base_verbose 30 -np 2 -host n00,n00 cpi
> n00 is the head node; this works and produces output.

It sounds like OMPI is getting confused by the non-uniform networks
(eth4 on the head node versus eth1 on the compute nodes). I have heard
reports of OMPI not liking networks with different interface names,
but it's not immediately obvious why the interface names matter to
OMPI's selection criteria (and the earlier reports did not include
enough details to tell).

Try the *_if_exclude parameters above and see if that works for you.
If not, let us know.

Jeff Squyres