On 4/25/2012 1:00 PM, Jeff Squyres wrote:
On Apr 25, 2012, at 12:51 PM, Ralph Castain wrote:

Sounds rather bizarre. Do you have lstopo on your machine? It might be useful to see its output so we can understand what it thinks the topology looks like, since that underpins the binding code.

The -nooversubscribe option is a red herring here - it has nothing to do with the problem, nor will it help.

FWIW: if you aren't adding --bind-to-core, then OMPI isn't launching your process on any specific core at all - we are simply launching it on the node. It sounds to me like your code is incorrectly identifying "sharing" when a process isn't bound to a specific core.

Put differently: if you're not binding your processes to processor cores, then it's quite likely/possible that multiple processes *are* running on the same processor cores, at least intermittently, because the OS is allowed to migrate processes to whatever processor cores it wants to.
However, Kyle mentioned previously that he was using the -bind-to-core option. I would suggest adding -report-bindings to the mpirun command line to see what mpirun really thinks it is binding to, if it is binding at all.

There is one piece of information that seems to be missing, and it is confusing me. Kyle, how is your code determining that it is the only process bound to a core or, conversely, that another process is bound to the same core?
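
For reference, here is a minimal sketch of one way a process could check this on Linux, assuming the check is based on the affinity mask the OS reports. This is not Kyle's code and not anything OMPI does internally; the helper names describe_binding and overlapping_cpus are made up for illustration. It is in Python only for brevity, using os.sched_getaffinity, which is Linux-specific. Note that the mask lists logical CPUs (hardware threads), so on a hyperthreaded machine a process bound to one core may still show more than one entry.

```python
import os


def describe_binding(pid=0):
    """Return (cpus, is_single_cpu) for a process (0 means the caller).

    cpus is the sorted list of logical CPU ids the OS allows the process
    to run on. A process that was never bound typically reports every
    CPU on the node, so is_single_cpu is False for it.
    """
    cpus = sorted(os.sched_getaffinity(pid))
    return cpus, len(cpus) == 1


def overlapping_cpus(masks):
    """Find logical CPUs that appear in more than one process's mask.

    masks is a hypothetical {rank: set_of_cpu_ids} mapping, e.g. the
    per-rank masks gathered onto one rank. Returns {cpu: [ranks]} for
    every CPU that two or more ranks are allowed to run on.
    """
    owners = {}
    for rank, cpus in masks.items():
        for cpu in cpus:
            owners.setdefault(cpu, []).append(rank)
    return {cpu: sorted(ranks) for cpu, ranks in owners.items() if len(ranks) > 1}


if __name__ == "__main__":
    cpus, single = describe_binding()
    print("allowed logical CPUs:", cpus)
    print("restricted to a single logical CPU:", single)
```

If a check like this reports an overlap for processes that were launched without binding, that says nothing about where they actually ran: every unbound process is allowed on every CPU, so all of them will appear to "share".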

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com