
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r26106
From: Jeffrey Squyres (jsquyres_at_[hidden])
Date: 2012-03-09 11:56:23


George --

I believe this is the subject of a few long-standing tickets (i.e., what to do when we run out of registered memory; right now we hang, for several reasons). I think this is Mellanox's attempt to at least warn the user that we have run out of registered memory and will therefore hang.

Once the hangs have been fixed, I'm assuming this message can be removed.

Note, too, that this is in the BTL registration code (openib_reg_mr), not in the directly-invoked-by-the-PML code. So it's the mpool's fault -- not the PML's fault.

On Mar 6, 2012, at 10:05 AM, George Bosilca wrote:

> I didn't check the code thoroughly, but OMPI_ERR_OUT_OF_RESOURCE is not an error. If the registration runs out of resources, the BTL will return OMPI_ERR_OUT_OF_RESOURCE (for example, via mca_btl_openib_prepare_src). At the upper level, the PML (in the mca_pml_ob1_send_request_start function) intercepts it and inserts the request into a pending list. Later on, this pending list is examined and the request for resources is re-issued.
>
> Why do we need to trigger a BTL_ERROR for OMPI_ERR_OUT_OF_RESOURCE?
>
> george.
>
> On Mar 6, 2012, at 09:48 , Jeffrey Squyres wrote:
>
> > Mike --
> >
> > I would make this a better error message: use orte_show_help() so you can explain the issue in more detail, and also remove duplicates (i.e., when registration fails multiple times).
> >
> >
> > On Mar 6, 2012, at 8:25 AM, miked_at_[hidden] wrote:
> >
> >> Author: miked
> >> Date: 2012-03-06 09:25:56 EST (Tue, 06 Mar 2012)
> >> New Revision: 26106
> >> URL: https://svn.open-mpi.org/trac/ompi/changeset/26106
> >>
> >> Log:
> >> print error which is ignored on upper layer
> >> Text files modified:
> >> trunk/ompi/mca/btl/openib/btl_openib_component.c | 2 ++
> >> 1 files changed, 2 insertions(+), 0 deletions(-)
> >>
> >> Modified: trunk/ompi/mca/btl/openib/btl_openib_component.c
> >> ==============================================================================
> >> --- trunk/ompi/mca/btl/openib/btl_openib_component.c (original)
> >> +++ trunk/ompi/mca/btl/openib/btl_openib_component.c 2012-03-06 09:25:56 EST (Tue, 06 Mar 2012)
> >> @@ -569,6 +569,8 @@
> >> openib_reg->mr = ibv_reg_mr(device->ib_pd, base, size, access_flag);
> >>
> >> if (NULL == openib_reg->mr) {
> >> + BTL_ERROR(("%s: error pinning openib memory errno says %s",
> >> + __func__, strerror(errno)));
> >> return OMPI_ERR_OUT_OF_RESOURCE;
> >> }
> >>
> >> _______________________________________________
> >> svn-full mailing list
> >> svn-full_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/svn-full
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> >
> >
> > _______________________________________________
> > devel mailing list
> > devel_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/