$NSLOTS is what is requested by -pe openmpi <ARG> in the script; my understanding is that by default it refers to threads. $NSLOTS processes, each spinning off -t <ARG> threads, is not what is wanted, since each process could spin off more threads than there are physical or logical cores per node, degrading performance or even crashing the node. Even when -t <ARG> is kept within permissible bounds (2, 4, or 6 cores per processor, or 2, 4, 8, or 12 cores per node), it is still not clear how these cores are utilized in multithreaded runs.
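To make the concern concrete, here is a minimal sketch of the arithmetic, assuming the worst case where all $NSLOTS processes land on the same node. The helper names `total_threads` and `oversubscribed` and the example numbers are hypothetical, chosen only for illustration; they are not part of Grid Engine or of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: total threads started when a number of processes
# (NSLOTS) each launch a fixed number of threads (the -t <ARG> value).
total_threads() {
    # $1 = number of processes, $2 = threads per process
    echo $(( $1 * $2 ))
}

# Hypothetical helper: succeeds (exit status 0) when the total thread count
# exceeds the cores on one node. Assumes all processes share that node.
oversubscribed() {
    # $1 = processes, $2 = threads per process, $3 = cores per node
    [ "$(total_threads "$1" "$2")" -gt "$3" ]
}

# Example values (assumptions): 4 slots, -t 4, a 12-core node.
if oversubscribed 4 4 12; then
    echo "oversubscribed: $(total_threads 4 4) threads on 12 cores"
fi
```

With these example values the check reports 16 threads competing for 12 cores, which is the situation I want to avoid.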
My question is then: how do I correctly formulate resource scheduling for programs designed to run in multithreaded mode? For those involved in bioinformatics, examples are bwa with the -t <ARG> option or blast+ with the -num_threads <ARG> option specified.