
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] mpi_init waits 64 seconds if vpn is connected
From: David A. Boger (dab143_at_[hidden])
Date: 2013-03-22 07:37:21


Thanks Ralph. I have a Mac OS X 10.6.8 laptop where I can run
open-mpi 1.2.8 and open-mpi 1.6.4 with the vpn connected without any problem,
even without having to exclude the vpn interface, so you're probably right --
the existence of the vpn interface alone doesn't explain the problem.
Nevertheless, disconnecting the vpn on my ubuntu box definitely resolves the
problem, so I think it's tied in somehow.

Do you think the process is
hanging looking for a specific TCP connection, or just any TCP
connection? If it's a specific one, is there a way to find out which or
to test using something outside of mpirun that would show the same
delay?
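One way to approach that question outside mpirun is a small standalone probe that times a plain TCP connection to each address advertised in contact.txt while mpirun is still waiting. The sketch below is not part of Open MPI. It assumes IPv4 addresses and uses a hypothetical helper, `probe_tcp_connect`. It issues a non-blocking `connect()` and then waits in `poll()`, the call the gdb trace shows the stalled process sitting in, for at most a chosen timeout, and it reports how long the attempt took.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of Open MPI): attempt a TCP connection to
 * ip:port, waiting at most timeout_ms for it to complete. Returns 0 if the
 * connection succeeded, -1 otherwise (errno describes the failure). Once the
 * attempt has started, the wall-clock time spent is stored in *elapsed_ms. */
int probe_tcp_connect(const char *ip, unsigned short port, int timeout_ms,
                      double *elapsed_ms)
{
    struct sockaddr_in addr;
    struct timespec t0, t1;
    int fd, rc = -1, err = 0, saved_errno;
    socklen_t len = sizeof err;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;          /* not a dotted-quad IPv4 address */
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* A non-blocking connect lets poll() enforce our own timeout instead
     * of waiting for the kernel's own connect timeout. */
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        rc = 0;                  /* connected immediately */
    } else if (errno == EINPROGRESS) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int n = poll(&pfd, 1, timeout_ms);
        if (n == 0) {
            errno = ETIMEDOUT;   /* no answer within timeout_ms */
        } else if (n == 1 &&
                   getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0) {
            if (err == 0)
                rc = 0;          /* handshake completed */
            else
                errno = err;     /* e.g. ECONNREFUSED, EHOSTUNREACH */
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (elapsed_ms != NULL)
        *elapsed_ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
                      (t1.tv_nsec - t0.tv_nsec) / 1e6;

    saved_errno = errno;         /* close() must not clobber the result */
    close(fd);
    errno = saved_errno;
    return rc;
}
```

During the pause, one could call it once for 192.168.1.3 and once for 10.248.17.27, using the port from the contact.txt of the current run (for example with a 5000 ms timeout), and compare the elapsed times and errno values. If one address stalls and the other does not, that would point at a specific connection rather than TCP in general.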

Thanks again,
David

On Fri, Mar 22, 2013 12:25 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>The process is hanging trying to open a TCP connection back to mpirun. I would
>have thought that excluding the vpn interface would help, but it could be that
>there is still some interference from the vpn software itself - as you probably
>know, vpn generally tries to restrict connections.
>
>I don't recall seeing this behavior with my laptop (which also runs with a
>Cisco vpn), but I'll check it again in the morning and let you know.
>
>
>On Mar 21, 2013, at 6:52 PM, David A. Boger <dab143_at_[hidden]> wrote:
>
>> I am having a problem on my linux desktop where mpi_init hangs for
>> approximately 64 seconds if I have my vpn client connected but runs immediately
>> if I disconnect the vpn. I've picked through the FAQ and Google but have failed
>> to come up with a solution.
>>
>> Some potentially relevant information: I am using Open MPI 1.4.3 under
>> Ubuntu 12.04.1 and the Cisco AnyConnect VPN Client. (I have also downloaded
>> Open MPI 1.6.4 and built it from source, but I believe it behaves the same
>> way.)
>>
>> Some potentially irrelevant information: I believe SSH tunneling is
>> disabled by the vpn. While the vpn is connected, ifconfig shows an extra
>> interface (cscotun0, with inet addr:10.248.17.27) that shows up in the
>> contact.txt file:
>>
>> wt217:~/wrk/mpi> cat /tmp/openmpi-sessions-dab143_at_wt217_0/29142/contact.txt
>> 1909850112.0;tcp://192.168.1.3:48237;tcp://10.248.17.27:48237
>> 22001
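The first line of contact.txt packs several endpoints into one ';'-separated string. The sketch below extracts each address/port pair so it can be checked independently. It assumes only the format visible in this one file: a leading field that is not a URI, followed by `tcp://ADDRESS:PORT` entries. The helper names `parse_tcp_uri` and `list_contact_endpoints` are hypothetical. Open MPI's own URI handling is not shown here and may accept forms this sketch rejects.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "tcp://A.B.C.D:PORT" into address and port.
 * Returns 0 on success, -1 if the string does not have that shape. */
int parse_tcp_uri(const char *uri, char *ip, size_t iplen, unsigned short *port)
{
    const char *prefix = "tcp://";
    const char *host, *colon;
    char *end;
    unsigned long p;
    size_t n;

    if (strncmp(uri, prefix, strlen(prefix)) != 0)
        return -1;                        /* not a tcp:// URI */
    host = uri + strlen(prefix);
    colon = strrchr(host, ':');           /* last ':' separates the port */
    if (colon == NULL || colon == host)
        return -1;
    n = (size_t)(colon - host);
    if (n >= iplen)
        return -1;                        /* address too long for buffer */
    p = strtoul(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || p == 0 || p > 65535)
        return -1;                        /* missing or invalid port */
    memcpy(ip, host, n);
    ip[n] = '\0';
    *port = (unsigned short)p;
    return 0;
}

/* Walk the ';'-separated fields of a contact.txt line, print every
 * tcp:// endpoint, and return how many were found. Fields that are not
 * tcp:// URIs (such as the leading numeric field) are skipped. */
int list_contact_endpoints(const char *line)
{
    char buf[1024], ip[64], *field, *save = NULL;
    unsigned short port;
    int count = 0;

    snprintf(buf, sizeof buf, "%s", line); /* strtok_r modifies its input */
    for (field = strtok_r(buf, ";\n", &save); field != NULL;
         field = strtok_r(NULL, ";\n", &save)) {
        if (parse_tcp_uri(field, ip, sizeof ip, &port) == 0) {
            printf("%s port %hu\n", ip, port);
            count++;
        }
    }
    return count;
}
```

Applied to the first line shown above, `list_contact_endpoints` would print the 192.168.1.3 and 10.248.17.27 entries, both on port 48237, and return 2.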
>>
>> The code is simply
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main(int argc, char** argv)
>> {
>>     MPI_Init(&argc, &argv);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> I compile it using "mpicc -g mpi_hello.c -o mpi_hello" and
>> execute it using "mpirun -d -v ./mpi_hello". (The problem occurs
>> whether or not I ask for more than one processor.) With verbosity on, I
>> get the following output:
>>
>> wt217:~/wrk/mpi> mpirun -d -v ./mpi_hello
>> [wt217:22015] procdir: /tmp/openmpi-sessions-dab143_at_wt217_0/29144/0/0
>> [wt217:22015] jobdir: /tmp/openmpi-sessions-dab143_at_wt217_0/29144/0
>> [wt217:22015] top: openmpi-sessions-dab143_at_wt217_0
>> [wt217:22015] tmp: /tmp
>> [wt217:22015] [[29144,0],0] node[0].name wt217 daemon 0 arch ffc91200
>> [wt217:22015] Info: Setting up debugger process table for applications
>> MPIR_being_debugged = 0
>> MPIR_debug_state = 1
>> MPIR_partial_attach_ok = 1
>> MPIR_i_am_starter = 0
>> MPIR_proctable_size = 1
>> MPIR_proctable:
>> (i, host, exe, pid) = (0, wt217, /home/dab143/wrk/mpi/./mpi_hello, 22016)
>> [wt217:22016] procdir: /tmp/openmpi-sessions-dab143_at_wt217_0/29144/1/0
>> [wt217:22016] jobdir: /tmp/openmpi-sessions-dab143_at_wt217_0/29144/1
>> [wt217:22016] top: openmpi-sessions-dab143_at_wt217_0
>> [wt217:22016] tmp: /tmp
>> <hangs for approximately 64 seconds>
>> [wt217:22016] [[29144,1],0] node[0].name wt217 daemon 0 arch ffc91200
>> [wt217:22016] sess_dir_finalize: proc session dir not empty - leaving
>> [wt217:22015] sess_dir_finalize: proc session dir not empty - leaving
>> [wt217:22015] sess_dir_finalize: job session dir not empty - leaving
>> [wt217:22015] sess_dir_finalize: proc session dir not empty - leaving
>> orterun: exiting with status 0
>>
>> The code hangs for approximately 64 seconds after the line that reads
>> "tmp: /tmp".
>>
>> If I attach gdb to the process during this time, the stack trace
>> (attached) shows that the pause is in __GI___poll in
>> /sysdeps/unix/sysv/linux/poll.c:83.
>>
>> If I add "-mca oob_tcp_if_exclude cscotun0", then the
>> corresponding address for that vpn interface no longer shows up in contact.txt,
>> but the problem remains. I also tried adding "-mca btl ^cscotun0 -mca
>> btl_tcp_if_exclude cscotun0", with no effect.
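The `oob_tcp_if_exclude` and `btl_tcp_if_exclude` parameters take interface names, so it can help to confirm exactly which names and IPv4 addresses the system reports while the VPN is connected. The sketch below prints each IPv4 address with its interface name and can optionally skip one name such as cscotun0. It assumes a system that provides `getifaddrs()` (Linux and the BSDs do), and `list_ipv4_interfaces` is a hypothetical helper. It does not reproduce Open MPI's own interface-selection logic.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: print every IPv4 interface address to `out`,
 * skipping the interface named `exclude` (pass NULL to keep all).
 * Returns the number of addresses printed, or -1 if getifaddrs() fails. */
int list_ipv4_interfaces(FILE *out, const char *exclude)
{
    struct ifaddrs *ifs, *ifa;
    char ip[INET_ADDRSTRLEN];
    int count = 0;

    if (getifaddrs(&ifs) != 0)
        return -1;
    for (ifa = ifs; ifa != NULL; ifa = ifa->ifa_next) {
        /* Only IPv4 entries carry a struct sockaddr_in. */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (exclude != NULL && strcmp(ifa->ifa_name, exclude) == 0)
            continue;
        struct sockaddr_in *sin = (struct sockaddr_in *)ifa->ifa_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, ip, sizeof ip) == NULL)
            continue;
        fprintf(out, "%-12s %s\n", ifa->ifa_name, ip);
        count++;
    }
    freeifaddrs(ifs);                 /* release the list getifaddrs built */
    return count;
}
```

With the VPN connected, calling it with `NULL` and then with `"cscotun0"` would show whether 10.248.17.27 is the only address that disappears. The names printed are what the `*_if_exclude` options would need to match.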
>>
>> Any idea what is hanging this up or how I can get more information as to
>> what is going on during the pause? I assume connecting the vpn has caused
>> mpi_init to look for something that isn't available and that eventually times
>> out, but I don't know what.
>>
>> Output from ompi_info and the gdb stack trace is attached.
>>
>> Thanks,
>> David
>>
>>
>> <stack.txt.bz2><ompi_info.txt.bz2>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>