
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] network timeout
From: marco atzeri (marco.atzeri_at_[hidden])
Date: 2012-12-13 13:34:38


On 11/24/2012 4:02 PM, Ralph Castain wrote:
> Try limiting the interfaces we use to see if that's really the problem. I forget if cygwin has "ifconfig" or not, but use a tool to report the networks, and then start excluding them by adding
>
> -mca oob_tcp_if_exclude foo,bar
>
> to your cmd line until you find the one that is causing the hang. That will (a) confirm that it is a network timeout issue, and (b) show which network is causing the problem.
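A minimal bash sketch of the procedure Ralph describes. It assumes `mpirun` is on the PATH and that the program under test is `./hello_c.exe`, as in the runs in this thread. The function names `run_excluding` and `bisect_interfaces` are hypothetical, and the interface names you pass in are placeholders for whatever your system reports. The sketch adds one interface at a time to the `oob_tcp_if_exclude` list and prints how long each run took. The first run whose time drops from about a minute to a few seconds points at the interface added last. On Cygwin, the Windows GUID-style interface names may not be accepted by this parameter (see the reply that follows).

```shell
#!/usr/bin/env bash
# Sketch only: time an Open MPI job repeatedly, excluding a growing list of
# interfaces from the out-of-band TCP layer via the oob_tcp_if_exclude MCA
# parameter. Assumes mpirun is on PATH and ./hello_c.exe exists; interface
# names are placeholders supplied by the caller.

# Run one job with the given comma-separated exclude list and report the
# elapsed wall-clock time in whole seconds (bash's SECONDS builtin).
run_excluding() {
  local exclude_list=$1
  local start=$SECONDS
  mpirun -mca oob_tcp_if_exclude "$exclude_list" -n 4 ./hello_c.exe > /dev/null
  echo "excluding ${exclude_list}: $((SECONDS - start))s"
}

# Add interfaces to the exclude list one at a time; when a run suddenly
# becomes fast, the interface added last is the likely culprit.
bisect_interfaces() {
  local list=""
  local ifname
  for ifname in "$@"; do
    list=${list:+$list,}$ifname   # append with a comma separator
    run_excluding "$list"
  done
}
```

For example, `bisect_interfaces eth0 eth1 wlan0` runs three jobs, excluding `eth0`, then `eth0,eth1`, then `eth0,eth1,wlan0`.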

Ralph,
I was unable to exclude the interface this way using any of the
several "strange" names Windows uses for its interfaces:

   {258B6C87-9B24-477D-A5D1-97AE07FEABAB}
   NPF_{258B6C87-9B24-477D-A5D1-97AE07FEABAB}

But I found the root cause: The driver of the Vodafone USB InternetKey.

So, for the next person hitting the same or a similar issue:
in theory the interface was disabled, but it seems that, when queried,
the driver tries to contact the Vodafone servers through any active interface.
Thanks to Wireshark I was able to observe the driver's polling behaviour.

After removing all versions of the driver (following [1]),
the delay disappeared.

$ time orterun -n 4 ./hello_c.exe
Hello, world, I am 0 of 4
Hello, world, I am 2 of 4
Hello, world, I am 1 of 4
Hello, world, I am 3 of 4

real 0m2.552s
user 0m0.933s
sys 0m1.774s

[1] http://www.petri.co.il/removing-old-drivers-from-vista-and-windows7.htm

Regards
Marco

>
>
> On Nov 24, 2012, at 1:00 AM, marco atzeri <marco.atzeri_at_[hidden]> wrote:
>
>> On Cygwin, running on localhost on a standalone computer, I noticed
>> a large difference in run time depending on whether or not the
>> computer is connected to the network.
>>
>> Physically connected:
>>
>> marco_at_MARCOATZERI /pub/devel/openmpi/examples
>> $ time mpirun -n 4 ./hello_c.exe
>> Hello, world, I am 0 of 4
>> Hello, world, I am 1 of 4
>> Hello, world, I am 2 of 4
>> Hello, world, I am 3 of 4
>>
>> real 1m14.568s
>> user 0m1.496s
>> sys 0m2.602s
>>
>> Not connected (all interfaces down):
>>
>> $ time mpirun -n 4 ./hello_c.exe
>> Hello, world, I am 0 of 4
>> Hello, world, I am 2 of 4
>> Hello, world, I am 1 of 4
>> Hello, world, I am 3 of 4
>>
>> real 0m3.323s
>> user 0m1.480s
>> sys 0m2.118s
>>
>>
>> I guess the extra minute is due to some kind of timeout.
>> Is such a delay present on any other platform?
>> Is there any workaround to remove it?
>>
>> Regards
>> Marco
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>