MTT Devel Mailing List Archives

Subject: Re: [MTT devel] [MTT svn] GIT: MTT branch master updated. 016088f2a0831b32ab5fd6f60f4cabe67e92e594
From: Mike Dubman (miked_at_[hidden])
Date: 2014-06-23 09:48:10


BTW, I think that now, when the parent process is killed before the child, the
OS marks the child as "<defunct>", and it sticks around for good.
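
For reference, here is a minimal sketch of one way a harness can avoid leftover launchers after a timeout. It is not MTT's code (MTT is written in Perl) and not a description of what the patch does. It assumes a POSIX system, and run_with_timeout is a hypothetical name used only for illustration. The idea is to start the launcher in its own process group, signal the whole group on timeout, and then wait on the launcher, since a process is listed as "<defunct>" from the time it exits until its parent waits on it.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd in its own process group; on timeout, kill the whole group.

    Hypothetical helper for illustration. Returns (returncode, timed_out).
    """
    # start_new_session=True makes the child a session and process-group
    # leader, so its pgid equals its pid and its descendants inherit the
    # group unless they deliberately leave it.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout), False
    except subprocess.TimeoutExpired:
        # Signal the launcher and everything still in its group together.
        os.killpg(proc.pid, signal.SIGKILL)
        # Reap the launcher so it does not remain in the process table
        # as <defunct>.
        return proc.wait(), True


if __name__ == "__main__":
    # Stand-in for a hung launcher: a Python process that sleeps.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(sleeper, timeout=1.0))
```

Processes that move themselves into a different session or group (for example, remote daemons started over ssh) would not be reached by this and need separate cleanup.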

On Mon, Jun 23, 2014 at 4:11 PM, Mike Dubman <miked_at_[hidden]>
wrote:

> It seems that mpirun got no signal (there is no evidence of one in the log).
> MTT was spinning, and mpirun was the only process left on the node.
> It is unclear why MTT did not kill mpirun.
> I will try to extract a Perl stack trace from MTT during tomorrow's nightly run.
>
>
> On Mon, Jun 23, 2014 at 2:59 PM, Jeff Squyres (jsquyres) <
> jsquyres_at_[hidden]> wrote:
>
>> On Jun 23, 2014, at 7:47 AM, Mike Dubman <miked_at_[hidden]>
>> wrote:
>>
>> > After the patch, it killed the child processes but kept mpirun ... itself.
>>
>> What does that mean -- are you saying that mpirun is still running? Was
>> mpirun sent a signal at all? What kind of messages are being displayed?
>> ...etc.
>>
>> The commits fix important bugs for me and others. Clearly, there's still
>> something not right. And of course I'm willing to track it down. But I
>> can't help you if you just say "it doesn't work."
>>
>> > Before that patch, all processes were killed (and you are right,
>> > "mpirun died right at the end of the timeout" was reported).
>>
>> ...which led to many months of misleading ORTE debugging, BTW. :-\
>> That's why this commit was introduced into MTT -- in the quest of finally
>> fixing both the mysterious ORTE hangs and the erroneous timeouts/"mpirun
>> died right at the end" messages.
>>
>> > But at least it left the cluster in a clean state, without leftovers.
>> > Now many "orphan" launchers are still alive from previous invocations.
>>
>> Does "launchers" = mpirun?
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> http://www.cisco.com/web/about/doing_business/legal/cri/