
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem with running MPI programs on machines with multiple interfaces
From: amalik_at_[hidden]
Date: 2011-01-24 10:40:41


        I forgot to mention that I am running Open MPI v1.5.1.


-----Original Message-----
From: Avinash Malik <amalik_at_[hidden]>
Sender: users-bounces_at_[hidden]
Date: Mon, 24 Jan 2011 15:22:39
To: Open MPI Users<users_at_[hidden]>
Reply-To: amalik_at_[hidden], Open MPI Users <users_at_[hidden]>
Subject: [OMPI users] Problem with running MPI programs on machines with
        multiple interfaces

        I have two machines, each with three live interfaces: lo, eth0
        (intranet), and usb0 (internet). eth0 on one machine cannot reach
        usb0 on the other (and vice versa). When I try to run an MPI
        program across these two hosts, I get no output at all; even
        --mca btl_base_verbose 30 prints nothing. If I set the hostfile
        to contain only localhost, everything runs fine.

        I tried the same code and hostfile on two other machines that
        have only two interfaces, lo and eth1, and whose eth1 interfaces
        can reach each other. The program runs fine on these machines.

        Next, I set btl_tcp_if_exclude to lo,usb0 (on the first pair of
        machines), and also tried the IP address/mask form, but this
        does not work either. When I run the program on one machine and
        do "ps aux | grep mpi" on the other, I can see --hnp-uri being
        set to usb0's IP address, which it should not be, because I have
        listed usb0 in btl_tcp_if_exclude. So, what exactly am I doing
        wrong here?
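
        For reference, the invocation described above would look roughly
        like the sketch below. The hostfile name "hosts" and the binary
        "./mpi_prog" are placeholders, not the real names; the two --mca
        parameters are the ones mentioned in this message. The script
        only assembles and prints the command line, so it can be
        inspected before being run.

```shell
#!/bin/sh
# Placeholder names (assumptions, not from the original setup):
HOSTFILE=hosts        # hostfile listing the two machines
PROG=./mpi_prog       # the MPI program being launched

# Exclude loopback and usb0 from the TCP BTL, and turn up BTL verbosity.
CMD="mpirun --hostfile $HOSTFILE --mca btl_tcp_if_exclude lo,usb0 --mca btl_base_verbose 30 $PROG"

# Print the assembled command line instead of executing it.
echo "$CMD"

# To actually launch, uncomment the next line:
# $CMD
```

        Note that btl_tcp_if_exclude applies only to the TCP BTL, i.e.
        the MPI point-to-point traffic. The address in --hnp-uri is
        chosen by the runtime's out-of-band channel, which may have its
        own interface parameters (possibly named oob_tcp_if_include and
        oob_tcp_if_exclude; this is an assumption that should be checked
        against the "ompi_info --param oob tcp" output for v1.5.1).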

        I read the optimization FAQ and saw how Open MPI builds the
        bipartite graphs for connections. But, as I said before, eth0
        cannot reach usb0's IP address and vice versa. How can I get rid
        of the usb0 IP address showing up in --hnp-uri? This is the only
        difference between the working and the non-working setups.


Avinash Malik