I would like to be able to start a non-oversubscribed run of a program in OpenMPI as if it were oversubscribed, so that the processes run in Degraded Mode. That would leave me the option of starting an additional simultaneous run on the same nodes if necessary.
(Basically, I have a program that will ask for some data, run for a while, print some results, then stop and ask for more data. It takes some time to collect and input the additional data, so I would like to be able to start another instance of the program which can be running while I'm inputting data to the first instance, and vice versa.)
Since I have single-processor nodes, the obvious solution would be to set slots=0 for each of my nodes, so that using 1 slot for every run causes the nodes to be oversubscribed. However, it seems that slots=0 is treated like slots=infinity, so my processes run in Aggressive Mode, and I lose the ability to oversubscribe my node using two independent processes.
So, I tried setting '--mca mpi_yield_when_idle 1', since this sounded like it was meant to force Degraded Mode. But it didn't seem to do anything: my processes still ran in Aggressive Mode. I skimmed through the source code quickly, and it doesn't look like mpi_yield_when_idle is ever actually used.
So, could either slots=0 be changed to really mean slots=0, or could mpi_yield_when_idle be implemented so I can force my processes to run in Degraded Mode?
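To make the request concrete, here is a rough Python sketch of the decision I am asking for. It is only my own model of the desired behavior, not anything taken from the OpenMPI source; the function name and arguments are made up for illustration.

```python
# Hypothetical model of the requested behavior; not Open MPI code.
def runs_degraded(procs_on_node, slots, yield_when_idle=False):
    """Return True if the processes on a node should run in Degraded Mode.

    procs_on_node   -- number of processes this run places on the node
    slots           -- the node's slots value, where 0 really means zero
    yield_when_idle -- stands in for '--mca mpi_yield_when_idle 1'
    """
    # Either the explicit setting forces Degraded Mode, or the node is
    # oversubscribed because more processes are placed than it has slots.
    return yield_when_idle or procs_on_node > slots


print(runs_degraded(1, 0))        # slots=0 taken literally
print(runs_degraded(1, 1))        # exactly subscribed, no override
print(runs_degraded(1, 1, True))  # forced by the setting
```

Either half of the condition would be enough for my use case.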
I also noticed another bug in the scheduler. With the following hostfile entries:
A slots=2 max-slots=2
B slots=2 max-slots=2
'mpirun -np 5' quits with an over-subscription error
'mpirun -np 3 --host B' hangs and just chews up CPU cycles forever
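I would have expected both commands to be rejected the same way. Below is a rough Python sketch of the check I assumed would happen, using the two entries above. This is purely my own model (the names HOSTFILE and check_request are invented), not how the scheduler is actually implemented.

```python
# Hypothetical model of the expected check; not Open MPI code.
HOSTFILE = {
    "A": {"slots": 2, "max_slots": 2},
    "B": {"slots": 2, "max_slots": 2},
}


def check_request(np, hosts=None, hostfile=HOSTFILE):
    """Return 'ok' if np processes fit under max-slots of the chosen hosts,
    otherwise 'error'. With hosts=None, every host in the hostfile is used."""
    selected = list(hostfile) if hosts is None else hosts
    capacity = sum(hostfile[h]["max_slots"] for h in selected)
    return "ok" if np <= capacity else "error"


print(check_request(5))         # like 'mpirun -np 5': 5 > 4
print(check_request(3, ["B"]))  # like 'mpirun -np 3 --host B': 3 > 2
```

In this model both requests come back as an error, which is what I expected from the second command instead of a hang.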
And finally, a note on the FAQ at http://www.open-mpi.org/faq/?category=tuning, entry 11, "How do I tell Open MPI to use processor and/or memory affinity?"
It mentions that OpenMPI will automatically disable processor affinity on oversubscribed nodes. When I first read it, I assumed that processor affinity and Degraded Mode were incompatible. However, it seems that independent non-oversubscribed processes running in Degraded Mode work fine with processor affinity; only processes that are actually oversubscribed have problems. It would be nice to have a note saying that Degraded Mode and processor affinity work together, even though processor affinity and oversubscription do not.
Thanks a ton!