
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Memory affinity
From: Tim Prince (n8tm_at_[hidden])
Date: 2010-09-27 20:11:01


  On 9/27/2010 2:50 PM, David Singleton wrote:
> On 09/28/2010 06:52 AM, Tim Prince wrote:
>> On 9/27/2010 12:21 PM, Gabriele Fatigati wrote:
>>> HI Tim,
>>>
>>> I have read that link, but I haven't understood whether enabling
>>> processor affinity also enables memory affinity, because it says:
>>>
>>> "Note that memory affinity support is enabled only when processor
>>> affinity is enabled"
>>>
>>> Can I set processor affinity without memory affinity? This is my
>>> question.
>>>
>>>
>>> 2010/9/27 Tim Prince<n8tm_at_[hidden]>
>>>> On 9/27/2010 9:01 AM, Gabriele Fatigati wrote:
>>>>> If Open MPI is compiled with NUMA support, is memory affinity
>>>>> enabled by default? I ask because I didn't find a standalone
>>>>> memory-affinity (or similar) parameter to set to 1.
>>>>>
>>>>>
>>>> The FAQ http://www.open-mpi.org/faq/?category=tuning#using-paffinity
>>>> has a useful introduction to affinity. It's available in a default
>>>> build, but not enabled by default.
>>>>
>> Memory affinity is implied by processor affinity. Your system libraries
>> are set up so that any memory a process allocates is placed local to
>> that processor, if possible. That's one of the primary benefits of
>> processor affinity. Not being an expert in Open MPI, and in the absence
>> of further easily accessible documentation, I assume there's no useful
>> explicit way to disable maffinity while using paffinity on platforms
>> other than the specified legacy platforms.
>>
>
> Memory allocation policy really needs to be independent of processor
> binding policy. The default memory policy (memory affinity) of "attempt
> to allocate to the NUMA node of the cpu that made the allocation request,
> but fall back as needed" is flawed in a number of situations. This is
> true even when MPI jobs are given dedicated access to processors. A
> common one is where the local NUMA node is full of pagecache pages (from
> the checkpoint of the last job to complete). For those sites that support
> suspend/resume based scheduling, NUMA nodes will generally contain pages
> from suspended jobs. Ideally, the new (suspending) job should suffer a
> little bit of paging overhead (pushing out the suspended job) to get
> ideal memory placement for the next 6 or however many hours of execution.
>
> An mbind (MPOL_BIND) policy of binding to the one local NUMA node will
> not work in the case of one process requiring more memory than that
> local NUMA node. One scenario is a master-slave where you might want:
>   master (rank 0) bound to processor 0 but not memory bound
>   slave (rank i) bound to processor i and memory bound to the local
>     memory of processor i.
>
> They really are independent requirements.
>
> Cheers,
> David
>
Interesting; I agree with those of your points on which I have enough
experience to have an opinion.
However, the original question was not whether it would be desirable to
have independent memory affinity, but whether it is currently possible
within Open MPI to keep memory placement from being influenced by
processor affinity.
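As far as I can tell (parameter names from memory for the 1.4 series, so
please check them with ompi_info), the knobs involved are the
mpi_paffinity_alone MCA parameter, e.g. "mpirun --mca mpi_paffinity_alone 1
...", and the maffinity framework, whose components (first_use and libnuma,
I believe) are listed by "ompi_info --param maffinity all". Whether those
components can simply be excluded, say with "--mca maffinity
^first_use,libnuma", is exactly the part I couldn't find documented.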
I have seen the case you mention, where the performance of a long job
suffers because the state of memory left by a previous job causes an
abnormal number of allocations to spill over to other NUMA nodes, but I
don't know the practical solution.
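
For what it's worth, outside of the MPI library the two policies can be
controlled independently at the application level; below is a rough,
untested sketch along the lines of your master/slave example, using
sched_setaffinity() and libnuma (the rank argument and the rank-to-CPU
mapping are purely illustrative, error checking is minimal, and none of
this is Open MPI code):

#define _GNU_SOURCE
#include <sched.h>      /* cpu_set_t, sched_setaffinity */
#include <numa.h>       /* libnuma; link with -lnuma */
#include <stdio.h>
#include <stdlib.h>

/* Bind the calling process to a single CPU (processor affinity only). */
static void bind_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        exit(1);
    }
}

int main(int argc, char **argv)
{
    int rank = (argc > 1) ? atoi(argv[1]) : 0; /* stand-in for the MPI rank */
    int cpu  = rank;                           /* illustrative CPU mapping  */

    if (numa_available() < 0) {
        fprintf(stderr, "libnuma reports no NUMA support\n");
        return 1;
    }

    bind_to_cpu(cpu);                /* every rank gets processor affinity */

    if (rank == 0) {
        /* master: CPU-bound, but memory may be placed on any node */
        numa_set_localalloc();       /* prefer local, fall back elsewhere */
    } else {
        /* slave: additionally bind memory to the CPU's local NUMA node
         * (strict, MPOL_BIND-like policy) */
        struct bitmask *nodes = numa_allocate_nodemask();
        numa_bitmask_setbit(nodes, numa_node_of_cpu(cpu));
        numa_set_membind(nodes);
        numa_bitmask_free(nodes);
    }

    /* ... allocate and compute here; placement now follows the policy
     * chosen above, independently of the CPU binding ... */
    return 0;
}

Getting the MPI library to apply that kind of per-rank policy for you is,
of course, the part that doesn't seem to be exposed.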

-- 
Tim Prince