No problem! Glad we could help, and many thanks for tracking down
some of our bugs.
On Apr 24, 2007, at 5:28 PM, Mostyn Lewis wrote:
> Well, I'm sorry to have caused even a smidgen of grief here.
> I moved aside the *paffinity_linux* module and, lo, it was still
> bound. I was using InfiniPath HCAs and beta software, and eventually
> (sigh) found a variable to stop the affinity: IPATH_NO_CPUAFFINITY.
> So, a
> export IPATH_NO_CPUAFFINITY=1
> $OPENMPI_GCC/bin/mpirun -x IPATH_NO_CPUAFFINITY -np 1 -host s0158 ./
> showed me what I wanted to see:
> 18236:cpi *->0 (f=noaffinity,0,1,2,3)
> This, in the jargon of my utility, says the mask for taskset is 0xf
> and so is not affined and the ->0 says it's on CPU 0.
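To make the mask reading above concrete, here is a minimal sketch in Python (not the utility used in this thread) that decodes a taskset-style hex mask into CPU indices and decides whether a process counts as bound. The function names and the rule "bound means the mask excludes at least one CPU" are assumptions made for illustration.

```python
def mask_to_cpus(mask):
    """Return the CPU indices whose bits are set in a taskset-style mask."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def is_bound(mask, ncpus):
    """Assumed rule: a process is bound if its mask excludes any of ncpus CPUs."""
    all_cpus = (1 << ncpus) - 1
    return (mask & all_cpus) != all_cpus
```

On a 4-core node, a mask of 0xf decodes to CPUs 0 through 3 and is_bound(0xf, 4) is False, which matches the "noaffinity,0,1,2,3" output quoted above.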
> The reason all this comes about is that I run endless benchmarks for my
> employer and get to use Scali, QuickSilver (SilverStorm), Qlogic,
> all the Ethernet MPICHes and LAMs (fading fast), even HP MPI, on
> our racks, which have x cores per socket. Sometimes we like to use
> our own methodologies to choose where to bind, and in that case we need to
> switch off any supplied binding. I really wish the default was no
> binding, like OpenMPI, with docs that point out the variables, but that's
> not always the case.
> Sorry again for any trub,
> On Tue, 24 Apr 2007, Jeff Squyres wrote:
>> On Apr 23, 2007, at 9:22 PM, Mostyn Lewis wrote:
>>> I tried this on a humble PC and it works there.
>>> I see in the --mca mpi_show_mca_params 1 print out that there is a
>>> [bb17:06646] paffinity=
>>> entry, so I expect that sets the value entry back to 0?
>> There should be an mpi_paffinity_alone parameter; that's what drives
>> the whole process.
>>> I'll get to the SLES10 cluster when I can (other people are running
>>> benchmarks) and see what I can find. I see there's no stdbool.h there,
>>> so maybe this is an artifact of defining the bool type on an
>>> Opteron. I'll get back to you when I can.
>> Lack of (bool) shouldn't be a factor. If it is, we have a bug.
>>> The test of boundness was a perl program invoked via system() in a
>>> C MPI program. The /proc/<pid>/stat result shows the CPU you are
>>> bound to (3rd number from the end) and a taskset call gets back the
>>> mask to show if you are bound or not.
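The check described above can be sketched in Python; the original was a perl program plus a taskset call, so this is an illustration, not the actual test. It assumes Linux, where proc(5) documents the "processor" value as field 39 of /proc/<pid>/stat. Newer kernels append more fields, so counting from the end of the line is fragile. The function names here are hypothetical.

```python
import os

PROCESSOR_FIELD = 39  # 1-based index of "processor" in /proc/<pid>/stat, per proc(5)


def last_cpu_from_stat(stat_line):
    """Return the CPU a task last ran on, parsed from a /proc/<pid>/stat line."""
    # Field 2 (comm) is parenthesised and may contain spaces, so split after
    # the last ')' and count the remaining fields starting from field 3.
    _, _, tail = stat_line.rpartition(")")
    rest = tail.split()
    index = PROCESSOR_FIELD - 3
    if len(rest) <= index:
        raise ValueError("stat line has too few fields")
    return int(rest[index])


def describe(pid="self"):
    """Hypothetical helper: report last CPU and allowed CPUs for a Linux pid."""
    with open(f"/proc/{pid}/stat") as fh:
        cpu = last_cpu_from_stat(fh.read())
    # os.sched_getaffinity is Linux-only; pid 0 means the calling process.
    allowed = sorted(os.sched_getaffinity(0 if pid == "self" else int(pid)))
    return cpu, allowed
```

Calling describe() inside an MPI rank would return the CPU it last ran on and the set of CPUs it is allowed to use; an allowed set smaller than the node's CPU count suggests that something has bound the process.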
>> Hmm. What kernel version do you have? I know there were some issues
>> with this information until recent versions (I confess to not knowing
>> in which version the information became stable/reliable, unfortunately).
>> Are you launching under a scheduler, perchance? N1GE may be setting
>> affinity before MPI processes are even launched, for example...?
>> (I'm not too familiar with N1GE -- I'm speculating).
>> There's a simple acid test to see if OMPI is setting the affinity or
>> not: remove the linux paffinity component (assuming you compiled the
>> components as plugins/dynamic shared objects). Go to the OMPI
>> installation directory:
>> There should be 2 files in there named mca_paffinity_linux.*. This
>> is the component that knows how to set processor affinity in Open
>> MPI; if it's not there, Open MPI won't know how to set affinity on
>> your system (and therefore won't). Rename or move these files so
>> that they are not findable, such as:
>> cd $prefix/lib/openmpi
>> mkdir junk
>> mv *paffinity_linux* junk
>> And run your test again. If you're still getting affinity set, then
>> it's not Open MPI that is setting it.
>> Jeff Squyres
>> Cisco Systems
>> users mailing list