OK, but trivial codes don't always reproduce problems.
Is strace useful?
On Fri, Jan 29, 2010 at 7:32 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Jan 29, 2010, at 8:23 AM, Laurence Marks wrote:
>> I'll try, but sometimes these things are hard to reproduce and I have
>> to wait for free nodes to do the test.
>> If I do manage to reproduce the
>> issue (I've added ERR= traps, so would have to regress) anything else
>> to look at?
> You might want to write up a trivial Fortran example outside of your main app -- a 10-20 line app that explicitly reads past the end of a trivial file in one MPI process while all the other processes are waiting in an MPI_Barrier, or somesuch. That way you could test this easily even on 1 node, and not have to regress your source, etc.
> I think counting the processes should be sufficient. But with a small/trivial test like described above, you might even want to put in some extra print* statements, just to verify exactly where the process stopped, whether it actually exited, etc.
> Jeff Squyres
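For reference, the trivial test suggested in the quoted message might look roughly like the sketch below. This is only an illustration under assumptions: the program name, the scratch file name eof_test.dat, and the choice of rank 0 as the failing process are all made up here, and none of it comes from the real application. Rank 0 writes a one-line file and then reads two lines from it with no END=, ERR= or IOSTAT= trap, so the Fortran runtime should terminate that process while the other ranks sit in MPI_Barrier.

```fortran
program eof_test
  ! Hypothetical reproducer: one rank reads past end-of-file with no
  ! END=/ERR=/IOSTAT= trap while the other ranks wait in MPI_Barrier.
  use mpi
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none
  integer :: ierr, rank, nprocs, val

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  if (rank == 0) then
     ! Create a file holding exactly one record (file name is an assumption).
     open(unit=10, file='eof_test.dat', status='replace', action='write')
     write(10, *) 42
     close(10)

     open(unit=10, file='eof_test.dat', status='old', action='read')
     read(10, *) val
     print *, 'rank', rank, 'of', nprocs, 'read first value', val
     flush(output_unit)

     ! Deliberately untrapped: this second read hits end-of-file.
     read(10, *) val
     print *, 'rank', rank, 'survived the read past EOF (not expected)'
     close(10)
  else
     print *, 'rank', rank, 'entering barrier'
     flush(output_unit)
  end if

  call MPI_Barrier(MPI_COMM_WORLD, ierr)
  print *, 'rank', rank, 'passed barrier'
  call MPI_Finalize(ierr)
end program eof_test
```

Assuming an Open MPI install with the Fortran wrapper available, this could be built with "mpif90 eof_test.f90 -o eof_test" and run on a single node with "mpirun -np 4 ./eof_test". Counting the eof_test processes with ps afterwards should show whether the remaining ranks were cleaned up or left waiting in the barrier.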
Department of Materials Science and Engineering
MSE Rm 2036 Cook Hall
2220 N Campus Drive
Evanston, IL 60208, USA
Tel: (847) 491-3996 Fax: (847) 491-7820
email: L-marks at northwestern dot edu
Chair, Commission on Electron Crystallography of IUCR
Electron crystallography is the branch of science that uses electron
scattering and imaging to study the structure of matter.