The symptom is that the process hangs forever. It's difficult to distinguish this bug from simply running out of registered memory.
The bug is triggered when the pml is using the mpi_leave_pinned protocol and the btl returns an error from its send function.
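To illustrate why a leak like this shows up as a hang rather than an error, here is a rough toy model in C. It is not the ob1 code: the pool, its size, and every function name below are hypothetical, and the always-failing send only stands in for a btl send that returns an error. The idea is that each failed send strands one fragment, and once the pool is empty nothing can make progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 4  /* hypothetical: tiny pool so exhaustion is easy to see */

/* Toy stand-in for a fragment free list; not the real ob1 data structure. */
typedef struct {
    int free_count;
} frag_pool_t;

static void pool_init(frag_pool_t *pool) { pool->free_count = POOL_SIZE; }

/* Returns false when the pool is empty. A real progress loop would keep
 * retrying at this point, which is what makes the job look hung. */
static bool frag_alloc(frag_pool_t *pool) {
    if (pool->free_count == 0) return false;
    pool->free_count--;
    return true;
}

static void frag_return(frag_pool_t *pool) {
    assert(pool->free_count < POOL_SIZE);
    pool->free_count++;
}

/* Hypothetical transport send that always fails, modelling a btl error. */
static int failing_send(void) { return -1; }

/* Leaky error path: the fragment is not returned when the send fails. */
static int send_leaky(frag_pool_t *pool) {
    if (!frag_alloc(pool)) return -2;  /* out of fragments */
    int rc = failing_send();
    if (rc != 0) return rc;            /* BUG: fragment is never returned */
    frag_return(pool);                 /* simplified: returned right away on success */
    return 0;
}

/* Fixed error path: the fragment goes back to the pool on every exit. */
static int send_fixed(frag_pool_t *pool) {
    if (!frag_alloc(pool)) return -2;
    int rc = failing_send();
    frag_return(pool);
    return rc;
}
```

In this model, POOL_SIZE failed calls to send_leaky() drain the pool, and every later call reports "out of fragments" no matter what the transport does, while send_fixed() leaves the pool full. From the outside, that exhaustion looks much the same as running out of registered memory.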
-Nathan
________________________________________
From: devel-bounces_at_[hidden] [devel-bounces_at_[hidden]] on behalf of Christopher Samuel [samuel_at_[hidden]]
Sent: Thursday, March 01, 2012 7:58 PM
To: devel_at_[hidden]
Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r26077 (fwd)
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 02/03/12 02:56, Nathan Hjelm wrote:
> Found a pretty nasty frag leak (and a minor one) in ob1 (see
> commit below). If this fix addresses some hangs we are seeing on
> infiniband LANL might want a 1.4.6 rolled (or a faster rollout for
> 1.6.0).
What symptoms would an affected job show? Does it fail with an OMPI
error or does it just hang using 0% CPU?
cheers,
Chris
- --
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel_at_[hidden] Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAk9QN10ACgkQO2KABBYQAh9aRgCePZXdzqlI8lpfqWtHf8rtFvup
2D8An3E9y411xTyRBpfwHLPpWTzqUiuv
=3EXP
-----END PGP SIGNATURE-----
_______________________________________________
devel mailing list
devel_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/devel