
Open MPI Development Mailing List Archives


From: Terry D. Dontje (Terry.Dontje_at_[hidden])
Date: 2007-08-31 14:11:03


Scott Atchley wrote:

>Terry,
>
>Are you testing on Linux? If so, which kernel?
>
No, I am running into issues on Solaris, but Ollie's run of the test code
on Linux seems to work fine.

--td

>See the patch to iperf to handle kernel 2.6.21 and the issue that
>they had with usleep(0):
>
>http://dast.nlanr.net/Projects/Iperf2.0/patch-iperf-linux-2.6.21.txt
>
>Scott
>
>On Aug 31, 2007, at 1:36 PM, Terry D. Dontje wrote:
>
>>Ok, I have an update on this issue. I believe there is an
>>implementation difference in sched_yield between Linux and Solaris. If
>>I change the sched_yield in opal_progress to a usleep(500), then my
>>program completes quite quickly. I have sent a few questions to a
>>Solaris engineer and hopefully will get some useful information.
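Roughly, the change being described amounts to something like the
following sketch of a polling progress loop (this is not the actual
opal_progress() code; try_progress() is a hypothetical callback, and the
500-microsecond backoff is simply the usleep(500) value mentioned above):

    #include <sched.h>
    #include <unistd.h>

    /* Sketch of a polling wait loop: instead of sched_yield(), which on
     * Solaris can put the caller straight back on the run queue, back
     * off with a short usleep() so other processes get the CPU. */
    static void progress_wait_sketch(int (*try_progress)(void))
    {
        while (try_progress() == 0) {   /* 0 == nothing completed */
    #ifdef USE_USLEEP_BACKOFF
            usleep(500);                /* sleep briefly before re-polling */
    #else
            sched_yield();              /* original behaviour */
    #endif
        }
    }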
>>
>>That being said, CT-6's implementation also used yield calls (note that
>>this is actually what sched_yield reduces down to in Solaris), and we
>>did not see the same degradation issue as with Open MPI. I believe the
>>reason is that CT-6's SM implementation is not calling CT-6's version
>>of progress recursively and forcing all the unexpected messages to be
>>read in before continuing. CT-6 also has natural flow control in its
>>implementation (i.e., it has a fixed-size FIFO for eager messages).
>>
>>I believe both of these characteristics keep CT-6 from being
>>completely killed by the yield differences.
>>
>>--td
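The flow-control point can be sketched as follows: with a fixed-size
FIFO for eager messages, a sender that finds the queue full has to stop
and drain incoming traffic before it can post more, which bounds how
many unexpected messages can pile up. This is only an illustration under
that assumption, not CT-6's or Open MPI's actual shared-memory code; the
names and the slot count are made up:

    #include <stdbool.h>
    #include <stddef.h>

    #define EAGER_FIFO_SLOTS 128        /* fixed capacity: the back-pressure */

    /* Hypothetical fixed-size FIFO of eager-message fragments. */
    struct eager_fifo {
        void  *slot[EAGER_FIFO_SLOTS];
        size_t head, tail, count;
    };

    /* Returns false when the FIFO is full; the caller must then make
     * progress (drain its own receives) before retrying, so unexpected
     * messages cannot grow without bound. */
    static bool eager_fifo_push(struct eager_fifo *f, void *frag)
    {
        if (f->count == EAGER_FIFO_SLOTS)
            return false;               /* queue full: back-pressure */
        f->slot[f->tail] = frag;
        f->tail = (f->tail + 1) % EAGER_FIFO_SLOTS;
        f->count++;
        return true;
    }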
>>
>>Li-Ta Lo wrote:
>>
>>>On Thu, 2007-08-30 at 12:45 -0400, Terry.Dontje_at_[hidden] wrote:
>>>
>>>>Li-Ta Lo wrote:
>>>>
>>>>>On Thu, 2007-08-30 at 12:25 -0400, Terry.Dontje_at_[hidden] wrote:
>>>>>
>>>>>>Li-Ta Lo wrote:
>>>>>>
>>>>>>>On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
>>>>>>>
>>>>>>>>hmmm, interesting since my version doesn't abort at all.
>>>>>>>>
>>>>>>>Some problem with the Fortran compiler/language binding? My C
>>>>>>>translation doesn't have any problem.
>>>>>>>
>>>>>>>[ollie_at_exponential ~]$ mpirun -np 4 a.out 10
>>>>>>>Target duration (seconds): 10.000000, #of msgs: 50331, usec per msg: 198.684707
>>>>>>>
>>>>>>Did you oversubscribe? I found np=10 on an 8-core system clogged
>>>>>>things up sufficiently.
>>>>>>
>>>>>Yeah, I used np 10 on a 2-processor, 2-hyperthread system (4 threads
>>>>>total).
>>>>>
>>>>Is this using Linux?
>>>>
>>>Yes.
>>>
>>>Ollie