Back when we were targeting glibc's calls rather than syscall(), the
set(get()) was the "safe" way to find the SuSE glibc that had a working
get() but a broken set(). That particular case is no longer an issue
when using the syscall() interface instead.
I am happy enough to see the set() call stay or go.
I like it as a sanity check on the len value we've selected. If the
potential races against an external entity or cpu going offline are a
big enough concern, we could still make a set() call to validate length,
but pass a NULL pointer for mask. Then we consider the validation to be
a success if we get errno==EFAULT rather than errno==EINVAL (which works
because len is checked first).
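
A minimal sketch of that NULL-mask validation, for illustration only. The
names classify_len_probe() and probe_len_with_null_mask() are hypothetical
and are not part of PLPA. It assumes the kernel behaves as described above
(len is checked before the mask pointer is touched), which may not hold on
every kernel version, so any other outcome is reported as "unknown" rather
than guessed at.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

typedef enum {
    LEN_PROBE_OK,       /* EFAULT: len was accepted, NULL mask rejected */
    LEN_PROBE_BAD_LEN,  /* EINVAL: len itself was rejected */
    LEN_PROBE_UNKNOWN   /* anything else, including unexpected success */
} len_probe_t;

/* Interpret the result of a set() call that was made with a NULL mask. */
len_probe_t classify_len_probe(long rc, int err)
{
    if (rc == -1 && err == EFAULT)
        return LEN_PROBE_OK;
    if (rc == -1 && err == EINVAL)
        return LEN_PROBE_BAD_LEN;
    return LEN_PROBE_UNKNOWN;
}

/* Validate a candidate len without changing the thread's affinity:
 * pid 0 means the calling thread, and the mask is deliberately NULL. */
len_probe_t probe_len_with_null_mask(size_t len)
{
    long rc;

    assert(len > 0);  /* a zero length is not a meaningful candidate */
    errno = 0;
    rc = syscall(SYS_sched_setaffinity, 0, len, NULL);
    return classify_len_probe(rc, errno);
}
```

Because no mask is ever handed to the kernel, this variant cannot undo an
affinity change made by someone else and cannot name an offline CPU.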
I want to say that I doubt that the races described are a real issue in
practice. This probe is done exactly once and is going to get/set the
affinity of the calling thread only. The chance that a second thread
in the same process (an "internal external entity") can run the entire
probe to completion AND issue a set() for the current thread, all
between the get() and set() is VERY small, and a mutex inside the probe
function could eliminate this potential race entirely. If run in a
batch environment that wants to set our affinity, that has almost
certainly been done between fork() and exec() in the same place rlimits
are "imposed" upon a job. Finally, I think that the possibility that a
CPU goes offline between the get/set is also so small that I am OK with
ignoring it.
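
A rough sketch of the mutex idea, again with hypothetical names
(probe_lock, probe_affinity_once()) and a 1024-byte mask buffer chosen
only for illustration. It is not the actual PLPA probe. The lock only
excludes other threads that run this same probe; a set() issued elsewhere
without taking the lock is not covered by it.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

static pthread_mutex_t probe_lock = PTHREAD_MUTEX_INITIALIZER;
static int probe_done = 0;  /* has the probe already run? */
static int probe_ok = 0;    /* cached result of the single probe run */

/* Run the get()/set() probe at most once per process, under a mutex.
 * Returns 1 if set(get()) succeeded, 0 otherwise. */
int probe_affinity_once(void)
{
    unsigned long mask[1024 / sizeof(unsigned long)];
    int ok;

    pthread_mutex_lock(&probe_lock);
    if (!probe_done) {
        /* The raw syscall returns the number of bytes written on success. */
        long n = syscall(SYS_sched_getaffinity, 0, sizeof(mask), mask);

        /* Hand the same mask straight back, using the length just reported. */
        if (n > 0 &&
            syscall(SYS_sched_setaffinity, 0, (size_t)n, mask) == 0)
            probe_ok = 1;
        probe_done = 1;
    }
    ok = probe_ok;
    pthread_mutex_unlock(&probe_lock);
    return ok;
}
```

Later callers get the cached answer, so the get/set pair is issued exactly
once no matter how many threads ask.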
-Paul
Jeff Squyres wrote:
>Paul --
>
>You had a reason for doing this (get followed by set), but I confess
>that I don't recall the details why. Do you remember?
>
>
>On Dec 17, 2005, at 6:35 AM, Bogdan Costescu wrote:
>
>
>>On Fri, 16 Dec 2005, Jeff Squyres wrote:
>>
>>
>>>Hmm. Is there anything we can do about this? Should we just
>>>document this behavior?
>>>
>>The only way that I can see is to get rid of the _set to validate the
>>length. This eliminates both the possibility of getting the wrong
>>value after a change by an external entity and the possibility of
>>asking for an inactive/inexistent CPU that would return -EINVAL.
>>
>>--
>>Bogdan Costescu
>>
>>IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
>>Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
>>Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
>>E-mail: Bogdan.Costescu_at_[hidden]
>>
>>_______________________________________________
>>plpa-users mailing list
>>plpa-users_at_[hidden]
>>http://www.open-mpi.org/mailman/listinfo.cgi/plpa-users
>>
>
>
>--
>{+} Jeff Squyres
>{+} The Open MPI Project
>{+} http://www.open-mpi.org/
>
--
Paul H. Hargrove PHHargrove_at_[hidden]
Future Technologies Group
HPC Research Department Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900