
Open MPI User's Mailing List Archives


From: Ralph Castain (rhc_at_[hidden])
Date: 2007-03-22 12:19:05


Just for clarification: ompi_info only shows the *default* value of the MCA
parameter. In this case, mpi_yield_when_idle defaults to aggressive, but
that value is reset internally if the system sees an "oversubscribed"
condition.

The issue here isn't how many cores are on the node, but rather how many
were specifically allocated to this job. If the allocation wasn't at least 2
(in your example), then we would automatically reset mpi_yield_when_idle to
be non-aggressive, regardless of how many cores are actually on the node.
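
As a rough illustration of the rule described above, here is a minimal Python sketch. It is not Open MPI's actual code: the function name and arguments are hypothetical, and it assumes "oversubscribed" simply means more local processes than slots allocated to the job on that node.

```python
def effective_yield_when_idle(default_yield: int,
                              allocated_slots: int,
                              local_procs: int) -> int:
    """Return the value mpi_yield_when_idle would effectively take.

    0 = aggressive (spin while idle), 1 = non-aggressive (yield the CPU).
    Hypothetical model: the default reported by ompi_info is used unless
    the job runs more processes on the node than slots were allocated.
    """
    # Assumption: the physical core count plays no role in this check.
    oversubscribed = local_procs > allocated_slots
    return 1 if oversubscribed else default_yield


# Two processes on a 4-core node, but only 1 slot allocated to the job:
print(effective_yield_when_idle(0, allocated_slots=1, local_procs=2))
# Two processes with at least 2 slots allocated: the default is kept.
print(effective_yield_when_idle(0, allocated_slots=2, local_procs=2))
```

The physical core count is deliberately not an argument here, since only the allocation enters the decision.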

Ralph

On 3/22/07 7:14 AM, "Heywood, Todd" <heywood_at_[hidden]> wrote:

> Yes, I'm using SGE. I also just noticed that when 2 tasks/slots run on a
> 4-core node, the 2 tasks are still cycling between run and sleep, with
> higher system time than user time.
>
> ompi_info shows the MCA parameter mpi_yield_when_idle to be 0 (aggressive),
> so that suggests the tasks aren't swapping out on blocking calls.
>
> Still puzzled.
>
> Thanks,
> Todd
>
>
> On 3/22/07 7:36 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
>
>> Are you using a scheduler on your system?
>>
>> More specifically, does Open MPI know that you have four process slots
>> on each node? If you are using a hostfile and didn't specify
>> "slots=4" for each host, Open MPI will think that it's
>> oversubscribing and will therefore call sched_yield() in the depths
>> of its progress engine.
>>
>>
>> On Mar 21, 2007, at 5:08 PM, Heywood, Todd wrote:
>>
>>> P.S. I should have said that this is a pretty coarse-grained
>>> application,
>>> and netstat doesn't show much communication going on (except in
>>> stages).
>>>
>>>
>>> On 3/21/07 4:21 PM, "Heywood, Todd" <heywood_at_[hidden]> wrote:
>>>
>>>> I noticed that my OpenMPI processes are using larger amounts of
>>>> system time
>>>> than user time (via vmstat, top). I'm running on dual-core, dual-CPU
>>>> Opterons, with 4 slots per node, where the program has the nodes to
>>>> themselves. A closer look showed that they are constantly
>>>> switching between
>>>> run and sleep states with 4-8 page faults per second.
>>>>
>>>> Why would this be? It doesn't happen with 4 sequential jobs
>>>> running on a
>>>> node, where I get 99% user time, maybe 1% system time.
>>>>
>>>> The processes have plenty of memory. This behavior occurs whether
>>>> I use
>>>> processor/memory affinity or not (there is no oversubscription).
>>>>
>>>> Thanks,
>>>>
>>>> Todd
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>