Open MPI Development Mailing List Archives

From: Terry D. Dontje (Terry.Dontje_at_[hidden])
Date: 2007-08-31 14:11:03


Scott Atchley wrote:

>Terry,
>
>Are you testing on Linux? If so, which kernel?
>
No, I am running into issues on Solaris, but Ollie's run of the test code
on Linux seems to work fine.

--td

>See the patch to iperf to handle kernel 2.6.21 and the issue that
>they had with usleep(0):
>
>http://dast.nlanr.net/Projects/Iperf2.0/patch-iperf-linux-2.6.21.txt
>
>Scott
>
>On Aug 31, 2007, at 1:36 PM, Terry D. Dontje wrote:
>
>>Ok, I have an update on this issue. I believe there is an
>>implementation difference in sched_yield between Linux and Solaris.
>>If I change the sched_yield in opal_progress to a usleep(500), then
>>my program completes quite quickly. I have sent a few questions to a
>>Solaris engineer and hopefully will get some useful information.
>>
>>That being said, CT-6's implementation also used yield calls (note
>>that this is actually what sched_yield reduces down to on Solaris),
>>and we did not see the same degradation issue as with Open MPI. I
>>believe the reason is that CT-6's SM implementation is not calling
>>CT-6's version of progress recursively and forcing all the unexpected
>>messages to be read in before continuing. CT-6 also has natural flow
>>control in its implementation (i.e., it has a fixed-size FIFO for
>>eager messages).
>>
>>I believe both of these characteristics keep CT-6 from being
>>completely killed by the yield differences.
>>
>>--td
>>
>>
>>Li-Ta Lo wrote:
>>
>>>On Thu, 2007-08-30 at 12:45 -0400, Terry.Dontje_at_[hidden] wrote:
>>>
>>>>Li-Ta Lo wrote:
>>>>
>>>>>On Thu, 2007-08-30 at 12:25 -0400, Terry.Dontje_at_[hidden] wrote:
>>>>>
>>>>>>Li-Ta Lo wrote:
>>>>>>
>>>>>>>On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
>>>>>>>
>>>>>>>>hmmm, interesting since my version doesn't abort at all.
>>>>>>>>
>>>>>>>Some problem with the Fortran compiler/language binding? My C
>>>>>>>translation doesn't have any problem.
>>>>>>>
>>>>>>>[ollie_at_exponential ~]$ mpirun -np 4 a.out 10
>>>>>>>Target duration (seconds): 10.000000, #of msgs: 50331, usec per msg: 198.684707
>>>>>>>
>>>>>>Did you oversubscribe? I found np=10 on an 8-core system clogged
>>>>>>things up sufficiently.
>>>>>>
>>>>>Yeah, I used np 10 on a 2-processor, 2-hyperthread system (4
>>>>>threads total).
>>>>>
>>>>Is this using Linux?
>>>>
>>>Yes.
>>>
>>>Ollie
>
>_______________________________________________
>devel mailing list
>devel_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/devel