On Jan 29, 2010, at 8:23 AM, Laurence Marks wrote:
> I'll try, but sometimes these things are hard to reproduce and I have
> to wait for free nodes to do the test.
> If I do manage to reproduce the
> issue (I've added ERR= traps, so I would have to regress), is there anything else
> to look at?
You might want to write up a trivial Fortran example outside of your main app: a 10-20 line program that explicitly reads past the end of a trivial file in one MPI process while all the other processes are waiting in an MPI_Barrier, or something similar. That way you could test this easily, even on a single node, without having to regress your source.
I think counting the processes (i.e., checking which MPI processes are still running after the one that hit the read error goes away) should be sufficient. But with a small test like the one described above, you might also want to add a few extra print* statements, just to verify exactly where each process stopped, whether it actually exited, etc.
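A minimal sketch of such a test, assuming an MPI installation that provides the `mpi` Fortran module and a compiler supporting Fortran 2008 (for `newunit=`). The program name `eof_test` and the file name `trivial.txt` are hypothetical. Rank 0 deliberately reads past the end of a one-line file with no ERR=/END=/IOSTAT= trap, while every other rank waits in MPI_Barrier; the print* statements (flushed immediately so they survive a crash) show how far each process got.

```fortran
! Hypothetical EOF test: rank 0 reads past the end of a one-line file
! while all other ranks sit in MPI_Barrier.
program eof_test
  use mpi
  use iso_fortran_env, only: output_unit
  implicit none
  integer :: ierr, rank, nprocs, unit_no
  character(len=80) :: line

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  if (rank == 0) then
     ! Create a trivial file containing exactly one line, then rewind it
     open(newunit=unit_no, file='trivial.txt', status='replace', action='readwrite')
     write(unit_no, '(a)') 'only line'
     rewind(unit_no)

     read(unit_no, '(a)') line
     print *, 'rank 0: read first line: ', trim(line)
     print *, 'rank 0: about to read past EOF'
     flush(output_unit)

     ! Deliberately no ERR=, END=, or IOSTAT= here, so the Fortran
     ! runtime should terminate this process on end-of-file
     read(unit_no, '(a)') line
     print *, 'rank 0: still alive after EOF read (unexpected)'
     flush(output_unit)
  end if

  ! Every other rank reaches this point and waits for rank 0
  print *, 'rank', rank, ': entering MPI_Barrier'
  flush(output_unit)
  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  print *, 'rank', rank, ': passed MPI_Barrier'

  call MPI_Finalize(ierr)
end program eof_test
```

Launching it with something like `mpirun -np 4 ./eof_test` and then looking at the process list (e.g. with `ps`) on the node should show whether the ranks left in MPI_Barrier are cleaned up or linger after rank 0 dies.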