Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-12-12 16:40:34


To make this significantly easier, I called Paul and we discussed
this at length.

In short -- we ended up agreeing with you. :-)

As a personal sidenote -- it sucks that we all had to do this much
research to figure this out. In particular, we missed the fact that
all the kernel versions take 3 arguments (we thought that some took
2), and that's where some of the reasons for the initial approach
came from.

So we'll implement this with syscall() and use the getaffinity
syscall to probe for the correct length (some kernels require <=
sizeof(long), some require == sizeof(long), and some are ok with >=
sizeof(long)). Using syscall() cuts out the potentially buggy
middleman (glibc) and removes a layer of indirection. What glibc
does can *usually* be deduced, but there's little reason not to use
syscall() directly.
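
A minimal sketch of that probing idea, assuming a Linux system whose
<sys/syscall.h> defines SYS_sched_getaffinity. The function name
probe_affinity_len and the PROBE_MAX_LEN cap are hypothetical and are
not PLPA's actual interface; the doubling strategy is just one
plausible way to search for an accepted length:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical upper bound on the mask size we are willing to try, in bytes. */
#define PROBE_MAX_LEN 8192

/*
 * Probe for a mask length (in bytes) that the raw getaffinity syscall
 * accepts for the calling thread. Starts at sizeof(long) and doubles
 * on EINVAL. Returns the first accepted length, or 0 if none was found.
 */
size_t probe_affinity_len(void)
{
    unsigned long mask[PROBE_MAX_LEN / sizeof(unsigned long)];
    size_t len;

    for (len = sizeof(long); len <= PROBE_MAX_LEN; len *= 2) {
        memset(mask, 0, len);
        /* pid 0 means "the calling thread"; glibc's wrapper is bypassed. */
        long rc = syscall(SYS_sched_getaffinity, 0L, (long)len, mask);
        if (rc >= 0) {
            return len;   /* kernel accepted this length */
        }
        if (errno != EINVAL) {
            return 0;     /* failed for a reason other than the length */
        }
    }
    return 0;             /* nothing up to PROBE_MAX_LEN was accepted */
}
```

A caller would presumably run this once at startup and reuse the
returned length for later raw affinity calls, treating a result of 0
as "processor affinity not available".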

There are some older systems out there that do not have syscall(),
but I don't think we care about them (i.e., we can check for that in
configure). Plus, those systems won't have processor affinity, anyway.

Behind the scenes, Paul and I have been working on a standalone
library called Portable Linux Processor Affinity (PLPA) to handle
all this junk. The SVN repository is hosted on svn.open-mpi.org --
we'll open it up in a few days (i.e., after we adjust to the
syscall() interface). This library will be released under the BSD
license; it a) is really pretty small, and b) most importantly,
allows other developers using Linux processor affinity to not worry
about any of these horrid details. The PLPA will have its own web
page and mailing list, too.

Thanks for your diligence in pestering us about this! :-)

On Dec 12, 2005, at 10:32 AM, Bogdan Costescu wrote:

> On Fri, 9 Dec 2005, Paul H. Hargrove wrote:
>
> >> If one looks through enough kernel versions,
>
> In the meantime, I've gotten a copy of kernel/sched.c from a SGI Prism
> kernel - I assume that it is the same used on Altix; this one has in
> the Makefile EXTRAVERSION = -sgi306rp31. So again, all prototypes of
> the sys_sched_setaffinity function that I've seen so far have 3
> args... which means that no compiler tricks are needed to keep 3
> different copies of the function.
>
>> one finds that some of them differ in what they will accept for the
>> len.
>
> OK, so this is a different problem...
>
>> Some produce EINVAL if len!=sizeof(long),
>
> I beg to disagree. All the codes that I looked at test for
>
> len < sizeof(new_mask)
>
> and copy user data based on the size of new_mask, so if "len" is
> larger than sizeof(new_mask), no error occurs.
>
>> others (especially Altix) produce EINVAL if len is too short to
>> cover all the machine's CPUs.
>
> ...so IMHO this test should be used instead to separate a long from a
> (larger) cpumask_t.
>
> In the message that described your implementation you also wrote:
>
>> while on other kernels I find that a too-short mask is padded w/
>> zeros and no error results. So, we want a big value for len
>
> Indeed some (more recent) kernels pad with zeros if "len" is too
> short. But a "big value for len" is again wrong.
>
> I can see 4 cases, again by looking at the kernel code and not dealing
> with 2 vs. 3 args:
>
> 1. tests for len < sizeof(long) and copies only sizeof(long) if larger
> (backported 2.4 in RHEL3); this can be identified by passing "len"
> smaller than sizeof(long) which returns -EINVAL and then passing "len"
> of (or larger than) sizeof(long) which should not return error.
>
> 2. tests for len < sizeof(cpumask_t) and copies only
> sizeof(cpumask_t) if larger (backported 2.4 from SGI, 2.6.3 from
> Mandrake 10.0); this can be identified by passing "len" shorter than
> sizeof(cpumask_t) which returns -EINVAL and then passing "len" of (or
> larger than) sizeof(cpumask_t) which should not return error.
>
> 3. tests for len < sizeof(cpumask_t) and pads with zeros if true,
> otherwise copies only sizeof(cpumask_t) (2.6.9 in RHEL4 and 2.6.14).
> This can't really be identified as it doesn't return -EINVAL in any
> situation.
>
> As you can see your suggestion to set "big value for len" would
> successfully pass _all_ of the above conditions and would therefore
> not offer any separation between the cases.
>
> The stuff above applies to the _set function; the _get function is a
> bit different:
>
> 1. tests for len < sizeof(long) and returns -EINVAL if true.
> (backported 2.4 in RHEL3). This can be identified by passing "len"
> smaller than sizeof(long) which returns -EINVAL and then passing "len"
> of (or larger than) sizeof(long) which should not return error.
>
> 2. tests for len < sizeof(cpumask_t) and returns -EINVAL if true.
> (backported 2.4 from SGI, 2.6.3 from Mandrake 10.0, 2.6.9 from RHEL4,
> 2.6.14). This can be identified by passing "len" smaller than
> sizeof(cpumask_t) which returns -EINVAL and then passing "len" of (or
> larger than) sizeof(cpumask_t) which should not return error.
>
> Case 1. of _set is associated with case 1. of _get.
> Cases 2. and 3. of _set are both associated with case 2. of _get.
>
> So IMHO the test should be made with the _get function (as explained
> in a previous message), by setting len=sizeof(long) which would allow
> the case 1. to work fine, while case 2. would return -EINVAL, exactly
> opposite from the code that you proposed.

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/