
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106
From: George Bosilca (bosilca_at_[hidden])
Date: 2012-03-09 13:14:40


On Mar 9, 2012, at 12:59 , Nathan Hjelm wrote:

> Not exactly: the PML invokes the mpool, which invokes the registration function. If registration fails, the mpool deregisters from its LRU (if possible) and tries again. So it is not an error if ibv_reg_mr fails, unless it fails because the process is starved of registered memory (or has truly run out).
>
> The hang occurs because there is nothing on the LRU to deregister, so ibv_reg_mr (or GNI_MemRegister in the uGNI case) fails. The PML then puts the request on its RDMA pending list and continues. The pending list is only progressed when some message comes in; if none does, the process hangs indefinitely!
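The retry-then-hang behavior described above can be illustrated with a small, self-contained C simulation. This is a sketch under stated assumptions, not Open MPI source: `fake_reg_mr()` stands in for ibv_reg_mr() (or GNI_MemRegister()), `PINNED_LIMIT` is a hypothetical cap on pinned regions, and all `sim_*` names are invented for illustration. It shows the two outcomes in the description: a failed registration succeeds after evicting an idle entry from the LRU, and it returns an out-of-resource code when the LRU is empty.

```c
#include <assert.h>

#define PINNED_LIMIT 4              /* hypothetical cap on pinned regions */
#define SIM_OK 0
#define SIM_OUT_OF_RESOURCE (-2)    /* stands in for OMPI_ERR_OUT_OF_RESOURCE */

typedef struct {
    int pinned;    /* regions currently registered (in use + idle) */
    int lru_idle;  /* idle, still-pinned regions the mpool may evict */
} sim_mpool_t;

/* Stand-in for ibv_reg_mr()/GNI_MemRegister(): fails once the cap is hit. */
static int fake_reg_mr(sim_mpool_t *mp)
{
    if (mp->pinned >= PINNED_LIMIT) {
        return -1;
    }
    mp->pinned++;
    return 0;
}

/* Deregister one idle region from the LRU; returns 0 if the LRU is empty. */
static int lru_evict_one(sim_mpool_t *mp)
{
    if (0 == mp->lru_idle) {
        return 0;
    }
    mp->lru_idle--;
    mp->pinned--;
    return 1;
}

/* mpool-style register: on failure, evict from the LRU and try again. */
static int sim_mpool_register(sim_mpool_t *mp)
{
    while (0 != fake_reg_mr(mp)) {
        if (!lru_evict_one(mp)) {
            /* Nothing left to evict: report out-of-resource to the caller. */
            return SIM_OUT_OF_RESOURCE;
        }
    }
    return SIM_OK;
}

/* The user is done with a region: it stays pinned but becomes evictable. */
static void sim_mpool_release(sim_mpool_t *mp)
{
    assert(mp->lru_idle < mp->pinned);
    mp->lru_idle++;
}
```

With four registrations held and none released, a fifth `sim_mpool_register()` returns `SIM_OUT_OF_RESOURCE`, which corresponds to the starved case that leads to the hang. After one region is released to the LRU, the same call succeeds by evicting it.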

Unlike Jeff, I'm not in favor of adding bandages. If the cause is understood, then there _is_ a fix, and that fix should be the target of any efforts.

> In general, I have found that the underlying cause of the hang is an imbalance of registrations between processes on a node, i.e., the hung process has an empty LRU but other processes could deregister. I am working on a new mpool (grdma) to handle the imbalance. The new mpool will allow a process to request that one of its peers deregister from its LRU if possible. I have a working proof-of-concept implementation that uses a POSIX shmem segment and a progress function to handle signaling and dereferencing. With it, I no longer see hangs with IMB Alltoall/Alltoallv on uGNI (without putting an artificial limit on the number of registrations). I will test the mpool on InfiniBand later today.
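The peer-assisted deregistration idea can be sketched as follows. This is a hypothetical illustration, not the grdma code: the node table, `request_peer_dereg()` and `progress_dereg_requests()` are invented names, and the table is an ordinary global so the example runs in a single process. In the scheme described above, the table would instead live in a POSIX shared-memory segment (shm_open() plus mmap()) and each process would poll it from a progress function; that assumes the C11 atomics used here are lock-free and therefore usable across processes.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_LOCAL_PROCS 8

/* Per-node table. Assumed to be in shared memory in a real setup; an
   ordinary global here so the sketch runs in one process. */
typedef struct {
    atomic_int dereg_requests[MAX_LOCAL_PROCS]; /* pending "please deregister" asks */
    atomic_int idle_regs[MAX_LOCAL_PROCS];      /* evictable registrations per process */
} node_table_t;

static node_table_t node_table;

/* Called by a starved process whose own LRU is empty: ask the peer with the
   most idle registrations to give one up. Returns the chosen peer, or -1. */
static int request_peer_dereg(int self, int nprocs)
{
    int best = -1, best_idle = 0;
    for (int p = 0; p < nprocs; p++) {
        int idle = atomic_load(&node_table.idle_regs[p]);
        if (p != self && idle > best_idle) {
            best = p;
            best_idle = idle;
        }
    }
    if (best >= 0) {
        atomic_fetch_add(&node_table.dereg_requests[best], 1);
    }
    return best;
}

/* Progress hook each process runs periodically: honor pending requests by
   releasing idle registrations. Returns how many were released. */
static int progress_dereg_requests(int self)
{
    int released = 0;
    while (atomic_load(&node_table.dereg_requests[self]) > 0 &&
           atomic_load(&node_table.idle_regs[self]) > 0) {
        atomic_fetch_sub(&node_table.dereg_requests[self], 1);
        atomic_fetch_sub(&node_table.idle_regs[self], 1); /* stands in for ibv_dereg_mr() */
        released++;
    }
    return released;
}
```

Only the owning process decrements its own slots, so the check-then-decrement in the progress hook does not race with peers, which only ever increment `dereg_requests`.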

If a solution already exists, I don't see why we need the message code. Given its urgency, I'm confident your patch will make its way into the 1.5 series quite easily.

  george.

>
> -Nathan
>
> On Fri, 9 Mar 2012, Jeffrey Squyres wrote:
>
>> George --
>>
>> I believe that this is the subject of a few long-standing tickets (i.e., what to do when running out of registered memory -- right now, we hang, for a few reasons). I think that this is Mellanox's attempt to at least warn the user that we have run out of registered memory, and will therefore hang.
>>
>> Once the hangs have been fixed, I'm assuming this message can be removed.
>>
>> Note, too, that this is in the BTL registration code (openib_reg_mr), not in the directly-invoked-by-the-PML code. So it's the mpool's fault -- not the PML's fault.
>>
>>
>>
>> On Mar 6, 2012, at 10:05 AM, George Bosilca wrote:
>>
>>> I didn't check the code thoroughly, but OMPI_ERR_OUT_OF_RESOURCE is not an error. If the registration returns out of resources, the BTL will return OUT_OF_RESOURCE (for example, via mca_btl_openib_prepare_src). At the upper level, the PML (in the mca_pml_ob1_send_request_start function) intercepts it and inserts the request into a pending list. Later on, this pending list will be examined and the request for resources re-issued.
>>>
>>> Why do we need to trigger a BTL_ERROR for OUT_OF_RESOURCE?
>>>
>>> george.
>>>
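The return-code convention described above, where out-of-resource means "try again later" rather than "fail", can be sketched like this. The names are hypothetical stand-ins (`sim_btl_prepare_src()` for mca_btl_openib_prepare_src(), `sim_send_request_start()` for mca_pml_ob1_send_request_start()), and the fixed-size array is a simplification of the PML's pending list. The comment in `sim_progress_pending()` also marks where a hang can come from: parked requests are only retried when something calls the progress routine.

```c
#include <assert.h>

#define SIM_SUCCESS 0
#define SIM_ERR_OUT_OF_RESOURCE (-2)  /* stand-in for OMPI_ERR_OUT_OF_RESOURCE */
#define MAX_PENDING 16

typedef struct { int id; } sim_request_t;

/* Hypothetical BTL hook: reports out-of-resource while resources are exhausted. */
static int resources_available = 0;
static int sim_btl_prepare_src(sim_request_t *req)
{
    (void)req;
    return resources_available ? SIM_SUCCESS : SIM_ERR_OUT_OF_RESOURCE;
}

static sim_request_t *pending[MAX_PENDING];
static int npending = 0;

/* PML-style start: out-of-resource is a retryable condition, not an error. */
static int sim_send_request_start(sim_request_t *req)
{
    int rc = sim_btl_prepare_src(req);
    if (SIM_ERR_OUT_OF_RESOURCE == rc) {
        assert(npending < MAX_PENDING);
        pending[npending++] = req;   /* park it; it will be re-issued later */
        return SIM_SUCCESS;
    }
    return rc;                       /* success, or a genuine error */
}

/* Re-issue parked requests. This only runs when something drives progress
   (e.g. an incoming fragment); if nothing does, requests stay parked. */
static int sim_progress_pending(void)
{
    int started = 0, remaining = 0;
    for (int i = 0; i < npending; i++) {
        if (SIM_SUCCESS == sim_btl_prepare_src(pending[i])) {
            started++;
        } else {
            pending[remaining++] = pending[i];
        }
    }
    npending = remaining;
    return started;
}
```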
>>> On Mar 6, 2012, at 09:48 , Jeffrey Squyres wrote:
>>>
>>>> Mike --
>>>>
>>>> I would make this a better error message, i.e., use orte_show_help() so you can explain the issue in more detail, and also suppress duplicates (i.e., if registration fails multiple times).
>>>>
>>>>
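A minimal sketch of the "report once, suppress duplicates" behavior requested above follows. `report_reg_failure_once()` is a hypothetical helper, and plain fprintf() is used only to keep the example self-contained; the real code would presumably go through orte_show_help() with a help-file entry, whose exact arguments are not reproduced here. This sketch only deduplicates within a single process.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Number of registration failures seen; only the first one is printed. */
static unsigned long reg_failures = 0;

/* Hypothetical helper: report a pinning failure once per process and
   silently count the rest. Returns 1 if a message was printed. */
static int report_reg_failure_once(const char *where, int err)
{
    reg_failures++;
    if (reg_failures > 1) {
        return 0;
    }
    fprintf(stderr,
            "%s: failed to register (pin) memory: %s\n"
            "The request will be retried, but the job may stall if\n"
            "registered memory remains exhausted.\n",
            where, strerror(err));
    return 1;
}
```

At the failure site, the caller would pass `errno` right after ibv_reg_mr() returns NULL, before any other library call can overwrite it.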
>>>> On Mar 6, 2012, at 8:25 AM, miked_at_[hidden] wrote:
>>>>
>>>>> Author: miked
>>>>> Date: 2012-03-06 09:25:56 EST (Tue, 06 Mar 2012)
>>>>> New Revision: 26106
>>>>> URL: https://svn.open-mpi.org/trac/ompi/changeset/26106
>>>>>
>>>>> Log:
>>>>> print error which is ignored on upper layer
>>>>> Text files modified:
>>>>> trunk/ompi/mca/btl/openib/btl_openib_component.c | 2 ++
>>>>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>>>>
>>>>> Modified: trunk/ompi/mca/btl/openib/btl_openib_component.c
>>>>> ==============================================================================
>>>>> --- trunk/ompi/mca/btl/openib/btl_openib_component.c (original)
>>>>> +++ trunk/ompi/mca/btl/openib/btl_openib_component.c 2012-03-06 09:25:56 EST (Tue, 06 Mar 2012)
>>>>> @@ -569,6 +569,8 @@
>>>>> openib_reg->mr = ibv_reg_mr(device->ib_pd, base, size, access_flag);
>>>>>
>>>>> if (NULL == openib_reg->mr) {
>>>>> + BTL_ERROR(("%s: error pinning openib memory errno says %s",
>>>>> + __func__, strerror(errno)));
>>>>> return OMPI_ERR_OUT_OF_RESOURCE;
>>>>> }
>>>>>
>>>>> _______________________________________________
>>>>> svn-full mailing list
>>>>> svn-full_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> jsquyres_at_[hidden]
>>>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>>
>>
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>>
>>
>>