Looks like it works.
On 6 March 08 at 10:36, Ralph Castain wrote:
> I believe I have at least helped reduce this with r17761. I added the
> ability for procs to detect that their "lifeline" connection (either
> the HNP for unity routed, or their local daemon for tree) has been
> lost and gracefully abort.
> Let me know if that helps.
> On 3/4/08 9:37 PM, "Aurélien Bouteiller" <bouteill_at_[hidden]> wrote:
>> I noticed that the new release of orte is not as good as it used to be
>> at cleaning up the mess left by crashed/aborted MPI processes. Recently
>> I have been experiencing a lot of zombie or livelocked processes
>> running on the cluster nodes and disturbing subsequent experiments. I
>> haven't really had time to investigate the issue; maybe Ralph can
>> open a ticket if he is able to reproduce this.
>> * Dr. Aurélien Bouteiller
>> * Sr. Research Associate at Innovative Computing Laboratory
>> * University of Tennessee
>> * 1122 Volunteer Boulevard, suite 350
>> * Knoxville, TN 37996
>> * 865 974 6321
>> devel mailing list