I have been looking, but I haven't found a good answer about
system-level threading. We are about to get a new cluster of
dual-processor, quad-core nodes, i.e. 8 cores per node. Traditionally
I would just tell MPI to launch two processes on each dual-processor,
single-core node, but with eight cores on a node, launching 8 separate
processes seems inefficient.

My question is this: when I request 8 processes for a node, does
OpenMPI detect that the node has multiple cores and automatically use
something like pthreads instead of creating new processes? Or should I
run a single process per node and use OpenMP or pthreads explicitly to
get better performance on a per-node basis?
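
To make the second option concrete, here is a minimal sketch of what I
mean by "a single process per node with explicit pthreads". It only
shows the within-node part: one process splits an array sum across 8
threads. The names threaded_sum, sum_slice, and slice_t are made up for
illustration, and the thread count of 8 is an assumption based on the
8-core nodes; in a real run each such process would be one MPI rank.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 8   /* assumption: one thread per core on an 8-core node */

/* Hypothetical work descriptor: one contiguous slice of a shared array. */
typedef struct {
    const double *data;
    size_t begin;       /* first index of the slice (inclusive) */
    size_t end;         /* last index of the slice (exclusive)  */
    double partial;     /* written only by the owning thread    */
} slice_t;

/* Thread entry point: sum one slice into its own 'partial' field. */
void *sum_slice(void *arg)
{
    slice_t *s = arg;
    double acc = 0.0;
    for (size_t i = s->begin; i < s->end; ++i)
        acc += s->data[i];
    s->partial = acc;
    return NULL;
}

/* Sum data[0..n) using NUM_THREADS threads inside a single process.
 * Returns 0 on success, 1 if a thread could not be created. */
int threaded_sum(const double *data, size_t n, double *result)
{
    pthread_t tid[NUM_THREADS];
    slice_t slice[NUM_THREADS];
    size_t chunk = n / NUM_THREADS;
    int started = 0;
    int rc = 0;

    for (int t = 0; t < NUM_THREADS; ++t) {
        slice[t].data = data;
        slice[t].begin = (size_t)t * chunk;
        /* The last thread also takes the remainder of the array. */
        slice[t].end = (t == NUM_THREADS - 1) ? n : (size_t)(t + 1) * chunk;
        slice[t].partial = 0.0;
        if (pthread_create(&tid[t], NULL, sum_slice, &slice[t]) != 0) {
            rc = 1;
            break;
        }
        ++started;
    }

    /* Join only the threads that were started, then combine partials. */
    double total = 0.0;
    for (int t = 0; t < started; ++t) {
        pthread_join(tid[t], NULL);
        total += slice[t].partial;
    }
    *result = total;
    return rc;
}
```

The process's main would call threaded_sum(data, n, &total) on its
share of the data, and the per-node totals would then be combined
across nodes with MPI. Whether this actually beats 8 plain MPI
processes per node is exactly what I am unsure about.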
--
Sam Adams