At SC10 this year there was an interesting tool presented
as a student paper called "FlowChecker: Detecting Bugs in
MPI Libraries via Message Flow Checking".
Basically they instrument a program and derive "intentions"
from your MPI calls and the MPI standard and also trace the
data flow (including things like memcpy) and messages. Then,
offline, you run a correlator which compares what was meant
to happen with what actually happened and tries to root-cause the fault.
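The paper's actual data structures and correlator aren't described here, so the following is only a toy Python sketch of the general idea, under heavily simplified assumptions. Each "intention" (hypothetical `Intention`) just says that a message of N bytes should arrive byte-for-byte at the same offsets in the receive buffer, and the trace is a list of hypothetical `Copy` records (think memcpy-like events). The hypothetical `check_flows` correlator then reports byte ranges that never arrived and copies that put data in the wrong place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intention:
    # Hypothetical: "message msg_id of `length` bytes should land intact".
    msg_id: int
    length: int


@dataclass(frozen=True)
class Copy:
    # Hypothetical trace record of one memcpy-like data movement.
    msg_id: int
    src_off: int  # offset in the sender's view of the message
    dst_off: int  # offset where the bytes landed on the receiver
    length: int


def _gaps(flags):
    """Turn a per-byte delivered[] array into [start, end) ranges never delivered."""
    gaps, start = [], None
    for i, ok in enumerate(flags):
        if not ok and start is None:
            start = i
        elif ok and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(flags)))
    return gaps


def check_flows(intentions, copies):
    """Compare intended flows with traced copies (toy model, not FlowChecker's)."""
    report = {}
    for it in intentions:
        delivered = [False] * it.length
        misplaced = []
        for c in copies:
            if c.msg_id != it.msg_id:
                continue
            # Intended semantics here: byte i of the message lands at byte i,
            # and nothing is written outside the message bounds.
            if c.src_off != c.dst_off or c.dst_off < 0 or c.dst_off + c.length > it.length:
                misplaced.append(c)
                continue
            for i in range(c.dst_off, c.dst_off + c.length):
                delivered[i] = True
        missing = _gaps(delivered)
        if missing or misplaced:
            report[it.msg_id] = {"missing": missing, "misplaced": misplaced}
    return report
```

For example, `check_flows([Intention(1, 100)], [Copy(1, 0, 0, 60), Copy(1, 60, 60, 20)])` flags bytes 80 to 100 of message 1 as never delivered, which is the kind of "meant vs. did" discrepancy a real correlator would then try to trace back to a specific code path.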
They claim to have taken 5 randomly selected closed bugs from 3 different
MPI implementations (including 3 from Open MPI) and been able
to detect all 5 and root-cause 4 of them (the one they missed
was a data type issue).
The PDF of their paper is here:
I've emailed them to see if the code is going to be available,
as it could be quite a handy tool to have when trying to track
down issues like the one Sébastien posted about.
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computational Initiative
Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545