$NSLOTS is what is requested by -pe openmpi <ARG> in the script; my understanding is that by default it is threads.
No - it is the number of processing elements (typically cores) that are assigned to your job.
Running $NSLOTS processes, each spinning off -t <ARG> threads, is not what is wanted, as each process could spin off more threads than there are physical or logical cores per node, thus degrading performance or even crashing the node. Even when -t <ARG> is kept within permissible boundaries (2, 4, or 6 cores per processor, or 2, 4, 8, or 12 cores per node), it is still not clear how these cores are utilized in multithreaded runs.
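To put numbers on the oversubscription concern, here is a small sketch of the arithmetic. The helper name `threads_per_core` is hypothetical. The example assumes a 12-core node (one of the node sizes mentioned above) on which all 12 slots were granted and each process was started with -t 4.

```python
def threads_per_core(procs_on_node: int, threads_per_proc: int, cores_per_node: int) -> float:
    """Ratio of runnable threads to cores on one node; > 1.0 means oversubscribed."""
    if procs_on_node < 1 or threads_per_proc < 1 or cores_per_node < 1:
        raise ValueError("all counts must be positive")
    return procs_on_node * threads_per_proc / cores_per_node


if __name__ == "__main__":
    # Hypothetical 12-core node: $NSLOTS = 12 processes, each run with -t 4.
    print(threads_per_core(12, 4, 12))  # 4.0 -> four threads competing for each core
    # Same node, but only 3 processes with 4 threads each: one thread per core.
    print(threads_per_core(3, 4, 12))   # 1.0
```

A ratio above 1.0 means the threads are time-sharing cores, which is the performance degradation described above.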
My question, then, is how to correctly formulate resource scheduling for programs designed to run in multithreaded mode. For those involved in bioinformatics, examples are bwa with the -t <ARG> option or BLAST+ with the -num_threads <ARG> option.
What you want to do is:
1. request a number of slots = the number of application processes * the number of threads each process will run
2. execute mpirun with the --cpus-per-proc N option, where N = the number of threads each process will run.
This will ensure you have one core for each thread. Note, however, that we don't actually bind a thread to a core, so having more threads than there are cores on a socket can cause a thread to bounce across sockets and therefore, potentially, across NUMA regions.
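The two-step recipe can be sketched as a small helper that produces both the SGE slot request and the mpirun command line. This is an illustrative sketch under stated assumptions: the function `plan_hybrid_job` and the application `./hybrid_app` are hypothetical, the slot request uses SGE's `#$` embedded-directive syntax, and `--cpus-per-proc` is the option named above. Newer Open MPI releases may spell this differently (e.g. via `--map-by ...:PE=N`), so check the mpirun man page for your version.

```python
import shlex
from typing import List, Tuple


def plan_hybrid_job(nprocs: int, threads_per_proc: int,
                    app_cmd: List[str]) -> Tuple[str, List[str]]:
    """Build the SGE slot request and mpirun argv for a multithreaded MPI job.

    nprocs           -- number of application processes
    threads_per_proc -- threads each process will run (e.g. the value passed to -t)
    app_cmd          -- the application command line (hypothetical in the example)
    """
    if nprocs < 1 or threads_per_proc < 1:
        raise ValueError("nprocs and threads_per_proc must be positive")
    # Step 1: request one slot per thread = processes * threads per process.
    slots = nprocs * threads_per_proc
    pe_line = f"#$ -pe openmpi {slots}"
    # Step 2: give each process N cores so each of its N threads has one.
    mpirun_argv = ["mpirun", "-np", str(nprocs),
                   "--cpus-per-proc", str(threads_per_proc), *app_cmd]
    return pe_line, mpirun_argv


if __name__ == "__main__":
    # Hypothetical job: 4 processes, each running 6 threads.
    pe_line, argv = plan_hybrid_job(4, 6, ["./hybrid_app", "-t", "6"])
    print(pe_line)           # #$ -pe openmpi 24
    print(shlex.join(argv))  # mpirun -np 4 --cpus-per-proc 6 ./hybrid_app -t 6
```

Keeping the thread count in one variable and deriving both the slot request and the per-process core count from it avoids the mismatch described in the question, where $NSLOTS processes each spawn their own full set of threads.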
users mailing list