
Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Orte cleanup
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-03-07 11:59:05


Looks like it works.

Aurelien

On 6 March 2008, at 10:36, Ralph Castain wrote:

> I believe I have at least helped reduce this with r17761. I added the
> ability for procs to detect that their "lifeline" connection (either to
> the HNP under unity routing, or to their local daemon under tree
> routing) has been lost and to abort gracefully.
>
> Let me know if that helps.
> Ralph
>
> On 3/4/08 9:37 PM, "Aurélien Bouteiller" <bouteill_at_[hidden]> wrote:
>
>> I noticed that the new release of ORTE is not as good as it used to be
>> at cleaning up the mess left by crashed/aborted MPI processes. Recently
>> we have been experiencing a lot of zombie or live-locked processes
>> running on the cluster nodes and disturbing subsequent experiments. I
>> haven't really had time to investigate the issue; maybe Ralph can file
>> a ticket if he is able to reproduce this.
>>
>> Aurelien
>> --
>> * Dr. Aurélien Bouteiller
>> * Sr. Research Associate at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 350
>> * Knoxville, TN 37996
>> * 865 974 6321
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel