I have a problem that may or may not be related to Open MPI, but since
this list was helpful before with a race condition, I am posting it here.
I am trying to use pbsdsh as an ssh replacement, at the urging of our
sysadmins, because Torque does not know about tasks launched via ssh
from within a task. In a simple case, a script launches three MPI tasks
in parallel,
Task2: NodeB and NodeC
(some cores on each, all handled correctly). Reproducibly (but with
different nodes and numbers of cores), Task1 and Task3 work fine. For
Task2, the MPI task starts on NodeB but nothing starts on NodeC; it
appears that NodeC does not communicate. It does not have to be this
layout; it could be
Task1: NodeA NodeB
Task2: NodeC NodeD
Here NodeC starts, but it looks as if NodeD never starts anything.
I have also run it with four tasks (1, 3 and 4 work), and if Task2 uses
only one node (the number of cores does not matter) it is fine.
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Chair, Commission on Electron Crystallography of IUCR
Research is to see what everybody else has seen, and to think what
nobody else has thought