
MTT Devel Mailing List Archives


Subject: Re: [MTT devel] [MTT svn] GIT: MTT branch master updated. 016088f2a0831b32ab5fd6f60f4cabe67e92e594
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-06-23 07:59:19


On Jun 23, 2014, at 7:47 AM, Mike Dubman <miked_at_[hidden]> wrote:

> after patch, it killed child processes but kept mpirun ... itself.

What does that mean -- are you saying that mpirun is still running? Was mpirun sent a signal at all? What messages are being displayed? Etc.

The commits fix important bugs for me and others. Clearly, there's still something not right. And of course I'm willing to track it down. But I can't help you if you just say "it doesn't work."

> before that patch - all processes were killed (and you are right, "mpirun died right at the end of the timeout" was reported)

...which led to many months of misleading ORTE debugging, BTW. :-\ That's why this commit was introduced into MTT -- in an effort to finally fix both the mysterious ORTE hangs and the erroneous timeouts/"mpirun died right at the end" messages.

> but at least it left the cluster in the clean state w/o leftovers.
> now many "orphan" launchers are alive from previous invocations.

Does "launchers" = mpirun?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/