Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] tcp communication problems with 1.4.3 and 1.4.4 rc2 on FreeBSD
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2011-07-08 14:19:27


On Jul 8, 2011, at 1:31 PM, Steve Kargl wrote:

> It seems that openmpi-1.4.4 compiled code is trying to use the
> wrong nic. My /etc/hosts file has
>
> 10.208.78.111 hpc.apl.washington.edu hpc
> 192.168.0.10 node10.cimu.org node10 n10 master
> 192.168.0.11 node11.cimu.org node11 n11
> 192.168.0.12 node12.cimu.org node12 n12
> ... down to ...
> 192.168.0.21 node21.cimu.org node21 n21
>
> Note, node10 and hpc are the same system (2 different NICs).

Don't confuse the machinefile with the NICs that OMPI will try to use. The machinefile only lists the hosts on which OMPI will launch processes; it does not influence which NICs OMPI uses for MPI communications.

> hpc:kargl[252] /usr/local/openmpi-1.4.4/bin/mpif90 -o z -g -O ring_f90.f90
> hpc:kargl[253] cat > mf1
> node10 slots=1
> node11 slots=1
> node12 slots=1
> hpc:kargl[254] /usr/local/openmpi-1.4.4/bin/mpiexec -machinefile mf1 ./z
> Process 0 sending 10 to 1 tag 201 ( 3 processes in ring)
>
> in another xterm if I attach to the process on node10, I see
> with gdb.
>
> (gdb) bt
> #0 0x00000003c10f9b9c in kevent () from /lib/libc.so.7
> #1 0x000000000052ca18 in kq_dispatch ()
> #2 0x000000000052ba93 in opal_event_base_loop ()
> #3 0x000000000052549b in opal_progress ()
> #4 0x000000000048fcfc in mca_pml_ob1_send ()
> #5 0x0000000000428873 in PMPI_Send ()
> #6 0x000000000041a890 in pmpi_send__ ()
> #7 0x000000000041a3f0 in ring () at ring_f90.f90:34
> #8 0x000000000041a640 in main (argc=<value optimized out>,
> argv=<value optimized out>) at ring_f90.f90:10
> #9 0x000000000041a1cc in _start ()
> (gdb) quit
>
> Now, eliminating node10 from the machine file, I see:
>
> hpc:kargl[255] cat > mf2
> node11 slots=1
> node12 slots=1
> node13 slots=1
> hpc:kargl[256] /usr/local/openmpi-1.4.4/bin/mpiexec -machinefile mf2 ./z
> Process 0 sending 10 to 1 tag 201 ( 3 processes in ring)
> Process 0 sent to 1
> Process 0 decremented value: 9
> Process 0 decremented value: 8
> Process 0 decremented value: 7
> Process 0 decremented value: 6
> Process 0 decremented value: 5
> Process 0 decremented value: 4
> Process 0 decremented value: 3
> Process 0 decremented value: 2
> Process 0 decremented value: 1
> Process 0 decremented value: 0
> Process 0 exiting
> Process 1 exiting
> Process 2 exiting
>
> I also have a simple mpi test program netmpi.c from Argonne.
> It shows
>
> hpc:kargl[263] /usr/local/openmpi-1.4.4/bin/mpicc -o z -g -O GetOpt.c netmpi.c
> hpc:kargl[264] cat mf_ompi_3
> node11.cimu.org slots=1
> node16.cimu.org slots=1
> hpc:kargl[265] /usr/local/openmpi-1.4.4/bin/mpiexec -machinefile mf_ompi_3 ./z
> 1: node16.cimu.org
> 0: node11.cimu.org
> Latency: 0.000073617
> Sync Time: 0.000147234
> Now starting main loop
> 0: 0 bytes 16384 times --> 0.00 Mbps in 0.000073612 sec
> 1: 1 bytes 16384 times --> 0.10 Mbps in 0.000073612 sec
> 2: 2 bytes 3396 times --> 0.21 Mbps in 0.000073611 sec
> 3: 3 bytes 1698 times --> 0.31 Mbps in 0.000073609 sec
> 4: 5 bytes 2264 times --> 0.52 Mbps in 0.000073610 sec
> 5: 7 bytes 1358 times --> 0.73 Mbps in 0.000073608 sec
>
>
> hpc:kargl[268] cat mf_ompi_1
> node10.cimu.org slots=1
> node16.cimu.org slots=1
> hpc:kargl[267] /usr/local/openmpi-1.4.4/bin/mpiexec -machinefile mf_ompi_1 ./z
> 0: hpc.apl.washington.edu
> 1: node16.cimu.org

What function is netmpi.c using to get the hostname that is printed? It might be using MPI_Get_processor_name() or gethostname() -- both of which may return whatever hostname(1) returns.

Again -- this is not an indicator of which NIC Open MPI is using.
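To illustrate why the printed name says nothing about the NIC: a minimal sketch, assuming netmpi.c obtains its name through something like gethostname() (the source of netmpi.c is not shown here, so this is a guess). The helper name system_hostname is hypothetical. The call takes no interface argument, so it reports the one system-wide hostname no matter which NIC later carries the MPI traffic.

```c
/* Ask for POSIX declarations when compiling in a strict ISO C mode. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <stddef.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the system-wide hostname into buf.
 * Returns 0 on success, -1 on failure.
 * Note that there is no NIC or address parameter anywhere: the name is
 * a property of the host, not of any particular network interface.
 */
int system_hostname(char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;
    if (gethostname(buf, len) != 0)
        return -1;
    /* gethostname() may leave the buffer unterminated if the name was truncated. */
    buf[len - 1] = '\0';
    return 0;
}
```

On a host set up like the one described above, a call such as system_hostname(name, sizeof name) would presumably yield the same string that hostname(1) prints, whether the job later talks over the 10.x or the 192.168.0.x interface.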

> (gdb) bt
> #0 0x00000003c0bedb9c in kevent () from /lib/libc.so.7
> #1 0x000000000052d648 in kq_dispatch ()
> #2 0x000000000052c6c3 in opal_event_base_loop ()
> #3 0x00000000005260cb in opal_progress ()
> #4 0x0000000000491d1c in mca_pml_ob1_send ()
> #5 0x000000000043c753 in PMPI_Send ()
> #6 0x000000000041a112 in Sync (p=0x7fffffffd4d0) at netmpi.c:573
> #7 0x000000000041a3cf in DetermineLatencyReps (p=0x3) at netmpi.c:593
> #8 0x000000000041a4fe in TestLatency (p=0x3) at netmpi.c:630
> #9 0x000000000041a958 in main (argc=1, argv=0x7fffffffd6a0) at netmpi.c:213
> (gdb) quit

The easiest way to fix this is likely to use the btl_tcp_if_include or btl_tcp_if_exclude MCA parameters -- i.e., tell OMPI exactly which interfaces to use:

    http://www.open-mpi.org/faq/?category=tcp#tcp-selection

Hypothetically, however, OMPI should be able to determine that 192.168.0.x is not reachable from the 10.x network (assuming your netmasks are set right), and automatically not use the 10.x network to reach any of the non-node10 machines. It's curious that this is not happening; I wonder if this is some kind of quirk of OMPI's reachability algorithms (http://www.open-mpi.org/faq/?category=tcp#tcp-routability) on FreeBSD...?
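As a rough illustration of the address-plus-netmask comparison that paragraph alludes to, here is a minimal sketch. It is not Open MPI's actual reachability code; the function name same_subnet is hypothetical, and the 255.255.255.0 netmask used in the example below is an assumption, since the real netmasks are not given in this thread.

```c
/* Ask for POSIX declarations when compiling in a strict ISO C mode. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: decide whether two dotted-quad IPv4 addresses
 * fall in the same subnet under the given dotted-quad netmask.
 * Returns 1 if they do, 0 if they do not, -1 if any string fails to parse.
 */
int same_subnet(const char *a, const char *b, const char *mask)
{
    struct in_addr ia, ib, im;

    if (inet_pton(AF_INET, a, &ia) != 1 ||
        inet_pton(AF_INET, b, &ib) != 1 ||
        inet_pton(AF_INET, mask, &im) != 1)
        return -1;

    /* All three values are in network byte order, so a bitwise AND
       followed by an equality test compares the network prefixes. */
    return (ia.s_addr & im.s_addr) == (ib.s_addr & im.s_addr);
}
```

With the addresses from the /etc/hosts file quoted above and the assumed netmask, same_subnet("192.168.0.10", "192.168.0.11", "255.255.255.0") reports a match, while same_subnet("10.208.78.111", "192.168.0.11", "255.255.255.0") does not. A wrong netmask on either interface would change those answers, which is why the netmask settings are worth checking first.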

-- 
Jeff Squyres
jsquyres_at_[hidden]