On Wed, Dec 2, 2009 at 14:23, Ralph Castain <rhc_at_[hidden]> wrote:
> Hmm....if you are willing to keep trying, could you perhaps let it run for
> a brief time, ctrl-z it, and then do an ls on a directory from a process
> that has already terminated? The pids will be in order, so just look for an
> early number (not mpirun or the parent, of course).
> It would help if you could give us the contents of a directory from a child
> process that has terminated - would tell us what subsystem is failing to
> properly cleanup.
Ok, so I Ctrl-Z'ed the master. In
/tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0 I now have only one
directory, 857.
I can't find that PID though. mpirun has PID 4230, orted does not exist,
master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
slave has a different PID, as expected. I Ctrl-Z'ed in iteration 68, and
there are 70 sequentially numbered directories starting at 0. Every
directory contains another directory called "0". There is nothing in any of
those directories. I see for instance:
/tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857 $ ls -lh 70
drwx------ 2 nbock users 4.0K Dec 2 14:41 0
nbock_at_mujo /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857 $ ls -lh
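
Checking all of the numbered directories by hand is tedious, so here is a rough sketch of how the same check could be scripted. The function name list_session_dirs is made up for illustration and is not part of Open MPI. It only assumes the layout described above: numbered directories under the job directory, each possibly containing further subdirectories. For every numbered directory it prints the name and the number of non-directory entries found anywhere below it, so a count of 0 means nothing was left behind except empty directories.

```shell
# Hypothetical helper (not an Open MPI tool): for each subdirectory of the
# given job directory, print "<name><TAB><count>", where <count> is the
# number of non-directory entries (files, sockets, links) found below it.
list_session_dirs() {
    session_dir=$1
    for d in "$session_dir"/*/; do
        # If the glob matched nothing it stays literal; skip it.
        [ -d "$d" ] || continue
        name=${d%/}
        name=${name##*/}
        # Arithmetic expansion strips any padding that wc -l may add.
        count=$(( $(find "$d" ! -type d | wc -l) ))
        printf '%s\t%d\n' "$name" "$count"
    done | sort -n
}
```

It would be called with the job directory as its only argument, for example list_session_dirs /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857.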
I hope this information helps. Did I understand your question correctly?