On Apr 25, 2012, at 12:51 PM, Ralph Castain wrote:
Sounds rather bizarre. Do you have lstopo on your machine? It would be useful to see its output so we can understand what it thinks the topology looks like, since the topology underpins the binding code.
The -nooversubscribe option is a red herring here - it has nothing to do with the problem, nor will it help.
FWIW: if you aren't adding --bind-to-core, then OMPI isn't launching your process on any specific core at all - we are simply launching it on the node. It sounds to me like your code is incorrectly identifying "sharing" when a process isn't bound to a specific core.
+1
Put differently: if you're not binding your processes to processor cores, then it's quite possible that multiple processes *are* running on the same processor cores, at least intermittently, because the OS is allowed to migrate processes to whatever processor cores it wants to.
However, Kyle mentioned previously that he was using the -bind-to-core
option. I would suggest adding -report-bindings to the mpirun
command line to see what mpirun really thinks it is binding to, if
it is binding at all.
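As a cross-check from inside the process itself, here is a rough sketch (not anything from OMPI) that asks the OS which logical CPUs the calling process is allowed to run on. The helper names describe_binding and current_affinity are made up for illustration, and os.sched_getaffinity is only available on some platforms such as Linux. Note that "CPU" here means a logical CPU (hardware thread), so on a hyperthreaded machine a process bound to one core may still report more than one CPU.

```python
import os


def describe_binding(affinity, total_cpus):
    """Classify an affinity mask (hypothetical helper, not part of Open MPI).

    affinity   -- set of logical CPU ids the process may run on
    total_cpus -- number of logical CPUs on the node
    """
    n = len(affinity)
    if n == 0:
        raise ValueError("affinity mask is empty")
    if n == 1:
        return "bound to one CPU"
    if n < total_cpus:
        return "restricted to a subset of CPUs"
    # The OS may migrate the process anywhere, so apparent
    # "sharing" of a core with another process is expected.
    return "unbound"


def current_affinity():
    """Return the set of CPUs this process may run on, or None if unknown."""
    # os.sched_getaffinity is not available on every platform.
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    return None


if __name__ == "__main__":
    mask = current_affinity()
    if mask is None:
        print("affinity query not supported on this platform")
    else:
        total = os.cpu_count() or len(mask)
        print(sorted(mask), "->", describe_binding(mask, total))
```

If every rank reports "unbound", then the processes were simply launched on the node and any core "sharing" the application detects is just the scheduler moving them around.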