Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Launcher in trunk is broken?
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-10-10 13:14:33


Ralph,

If you don't mind, I would like to understand this issue a little better. What exactly is broken in the termination detection?

From a network point of view, there is a slight issue with commit 25245. A direct call to exit will close all pending sockets, with a linger of 60 seconds (quite bad if you use static ports, for example). There are proper protocols to shut down sockets in a reliable way; maybe it is time to implement one of them.
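
As an illustration of what a reliable shutdown can look like, here is a minimal sketch of the common half-close-and-drain pattern: stop writing, read until the peer signals end-of-stream, and only then close. It is not what Open MPI does or did in any revision. The names `graceful_close` and `demo`, the timeout value, and the use of `socket.socketpair()` as a stand-in for a real TCP connection are all assumptions made for the example.

```python
import socket


def graceful_close(sock, timeout=5.0, bufsize=4096):
    """Hypothetical helper: half-close, drain until the peer's EOF, then close.

    Returns whatever bytes were still in flight from the peer.
    """
    drained = b""
    sock.settimeout(timeout)
    try:
        # Tell the peer we are done writing; reading is still allowed.
        sock.shutdown(socket.SHUT_WR)
        while True:
            chunk = sock.recv(bufsize)
            if not chunk:  # b"" means the peer has finished writing too
                break
            drained += chunk
    except OSError:
        # Covers timeouts and resets: the peer stalled or vanished,
        # so give up waiting and fall through to close().
        pass
    finally:
        sock.close()
    return drained


def demo():
    # socketpair() stands in for an established connection (assumption).
    a, b = socket.socketpair()
    b.sendall(b"last message")
    b.shutdown(socket.SHUT_WR)      # peer finishes its side first
    leftover = graceful_close(a)    # nothing in flight is thrown away
    eof = b.recv(16)                # peer observes our end-of-stream
    b.close()
    return leftover, eof
```

The point of the ordering is that neither side discards data the other has already sent, which a bare exit cannot guarantee.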

Thanks,
  george.

On Oct 10, 2011, at 12:40 , Ralph Castain wrote:

> It wasn't the launcher that was broken, but termination detection, and not for all environments (e.g., worked fine for slurm). It is a progress-related issue.
>
> Should be fixed in r25245.
>
>
> On Oct 10, 2011, at 8:33 AM, Shamis, Pavel wrote:
>
>> +1, I see the same issue.
>>
>>> -----Original Message-----
>>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
>>> On Behalf Of Yevgeny Kliteynik
>>> Sent: Monday, October 10, 2011 10:24 AM
>>> To: OpenMPI Devel
>>> Subject: [OMPI devel] Launcher in trunk is broken?
>>>
>>> It looks like the process launcher is broken in the OMPI trunk:
>>> If you run any simple test (not necessarily including MPI calls) on 4 or
>>> more nodes, the MPI processes won't be killed after the test finishes.
>>>
>>> $ mpirun -host host_1,host_2,host_3,host_4 -np 4 --mca btl sm,tcp,self /bin/hostname
>>>
>>> Output:
>>> host_1
>>> host_2
>>> host_3
>>> host_4
>>>
>>> And then the test hangs...
>>>
>>> I have an older trunk (r25228), and everything is OK there.
>>> Not sure if it means that something was broken after that, or the problem
>>> existed before, but kicked in only now due to some other change.
>>>
>>> -- YK
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel