If you don't mind, I would like to understand this issue a little better. What exactly is broken in the termination detection?
From a network point of view, there is a slight issue with commit r25245. A direct call to exit will close all pending sockets, with a linger of 60 seconds (quite bad if you use static ports, for example). There are proper protocols for shutting down sockets reliably; maybe it is time to implement one of them.
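To make the idea concrete, here is a minimal sketch of one common orderly-shutdown pattern: half-close the sending side, drain until the peer signals end-of-stream, then close. It is written in Python for brevity and is not Open MPI code; the function name graceful_close and the 5-second timeout are illustrative assumptions.

```python
import socket


def graceful_close(sock, timeout=5.0):
    """Half-close, drain until the peer's EOF, then close.

    Returns the number of bytes drained while waiting for the peer.
    Hypothetical helper for illustration only.
    """
    drained = 0
    try:
        # Tell the peer we are done sending; the receive side stays open.
        sock.shutdown(socket.SHUT_WR)
        # Bound the wait so a silent peer cannot block us forever.
        sock.settimeout(timeout)
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                # Empty read means the peer has finished sending too.
                break
            drained += len(chunk)
    except OSError:
        # Timeout or connection error: give up waiting and just close.
        pass
    finally:
        sock.close()
    return drained
```

In TCP, the side that closes first is normally the one left holding the connection in TIME_WAIT, so which end initiates the shutdown matters when ports are fixed.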
On Oct 10, 2011, at 12:40, Ralph Castain wrote:
> It wasn't the launcher that was broken, but termination detection, and not for all environments (e.g., worked fine for slurm). It is a progress-related issue.
> Should be fixed in r25245.
> On Oct 10, 2011, at 8:33 AM, Shamis, Pavel wrote:
>> + 1 , I see the same issue.
>>> -----Original Message-----
>>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
>>> On Behalf Of Yevgeny Kliteynik
>>> Sent: Monday, October 10, 2011 10:24 AM
>>> To: OpenMPI Devel
>>> Subject: [OMPI devel] Launcher in trunk is broken?
>>> It looks like the process launcher is broken in the OMPI trunk:
>>> If you run any simple test (not necessarily including MPI calls) on 4 or
>>> more nodes, the MPI processes won't be killed after the test finishes.
>>> $ mpirun -host host_1,host_2,host_3,host_4 -np 4 --mca btl sm,tcp,self
>>> And test is hanging......
>>> I have an older trunk (r25228), and everything is OK there.
>>> Not sure if it means that something was broken after that, or the problem
>>> existed before, but kicked in only now due to some other change.
>>> -- YK
>>> devel mailing list