Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Retrying a MPI_SEND
From: Hugo Daniel Meyer (meyer.hugo_at_[hidden])
Date: 2011-12-20 12:37:10


Sorry for the delay.
I will try with the MPI_ERRORS_RETURN handler; maybe that is my problem.
Thanks a lot for your help.

I'll let you know how it goes.

Best regards.

Hugo
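
For readers following the thread: the advice quoted below (react to the error found in the status attached to the request at the interception layer, then reset it to MPI_SUCCESS before control returns to the MPI layer) could be sketched roughly as follows. This is a minimal, self-contained C model and not Open MPI code. The names sketch_request_t, sketch_handle_completion, sketch_repost_fn and the SKETCH_* constants are hypothetical stand-ins for the real request structure, req->req_status.MPI_ERROR and MPI_SUCCESS, and the retry policy is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI_SUCCESS and an MPI error code (not Open MPI values). */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_PEER_FAILED = 1 };

/* Simplified model of a request: only the status error field and a posting counter. */
typedef struct {
    int status_error;   /* plays the role of req->req_status.MPI_ERROR */
    int attempts;       /* how many times the send has been posted so far */
} sketch_request_t;

/* Hypothetical repost hook: returns SKETCH_SUCCESS when the resend worked. */
typedef int (*sketch_repost_fn)(sketch_request_t *req);

/*
 * Meant to be called at the interception layer after a wait completes.
 * While the request carries an error, repost it, up to max_retries times.
 * On recovery the error is cleared, so the layer above sees success and
 * its error handler is not triggered. Returns the error left in the request.
 */
int sketch_handle_completion(sketch_request_t *req,
                             sketch_repost_fn repost,
                             int max_retries)
{
    assert(req != NULL && repost != NULL);

    int retries = 0;
    while (req->status_error != SKETCH_SUCCESS && retries < max_retries) {
        retries++;
        req->attempts++;
        if (repost(req) == SKETCH_SUCCESS) {
            /* Reset the error before returning toward the MPI layer. */
            req->status_error = SKETCH_SUCCESS;
        }
    }
    return req->status_error;
}

/* Example hook for experiments: fails until the third posting overall. */
int sketch_repost_succeeds_on_third(sketch_request_t *req)
{
    return (req->attempts >= 3) ? SKETCH_SUCCESS : SKETCH_ERR_PEER_FAILED;
}
```

A caller would initialize a sketch_request_t with the error observed after the wait and pass it to sketch_handle_completion together with a repost hook. If the retries run out, the error stays in the request and propagates upward unchanged.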

2011/12/16 George Bosilca <bosilca_at_[hidden]>

> Setting the error handler to MPI_ERRORS_RETURN is the right solution for
> mechanisms using the PMPI interface. Hugo is one software layer below the
> MPI interface, so the error handler does not affect his code. However,
> once he reacts to an error, he should reset the error (in the status
> attached to the request) to MPI_SUCCESS, in order to avoid triggering the
> error handler on the way back to the MPI layer.
>
> george.
>
> On Dec 16, 2011, at 09:09 , Jeff Squyres wrote:
>
> > I'm jumping into the middle of this conversation and probably don't have
> all the right context, so forgive me if this is a stupid question: did you
> set MPI_ERRORS_RETURN on the communicator in question?
> >
> >
> > On Dec 14, 2011, at 10:43 AM, Hugo Daniel Meyer wrote:
> >
> >> Hello George and all.
> >>
> >> Sorry for the late answer, but I was tracing the code to see where
> MPI_ERROR is set. I took a look at ompi_request_default_wait to try to
> see what happens with the request.
> >>
> >> Well, I've noticed that all requests that are not immediately completed go
> to ompi_request_wait_completion. But I don't know exactly where
> execution jumps when I inject a failure into the receiver of the message.
> After the failure, the sender does not return from
> ompi_request_wait_completion to ompi_request_default_wait, and I don't know
> where to catch the point at which req->req_status.MPI_ERROR is set. Do you
> know where execution jumps to, or at least into which error handler?
> >>
> >> Thanks in advance.
> >>
> >> Hugo
> >>
> >> 2011/12/9 George Bosilca <bosilca_at_[hidden]>
> >>
> >> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote:
> >>
> >>> Hello George and all.
> >>>
> >>> I've been adapting some of the code to copy the request, and now I
> think that it is working OK. I'm storing the request as you do in the
> pessimist, but I'm only logging received messages, as my approach is a
> pessimistic log based on the receiver.
> >>>
> >>> I do have a question: how do you detect when you have to resend a
> message, or at least repost it?
> >>
> >> The error in the status attached to the request will be set in case of
> failure. As the MPI error handler is triggered right before returning above
> the MPI layer, at the level where you placed your interception you have all
> the freedom you need to handle the faults.
> >>
> >> george.
> >>
> >>>
> >>> Thanks for the help.
> >>>
> >>> Hugo
> >>>
> >>> 2011/11/19 Hugo Daniel Meyer <meyer.hugo_at_[hidden]>
> >>>
> >>>
> >>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
> >>>
> >>> On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote:
> >>>
> >>>>
> >>>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
> >>>>
> >>>> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
> >>>>
> >>>>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
> >>>>>
> >>>>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
> >>>>>
> >>>>>> Hello again.
> >>>>>>
> >>>>>> I was tracing through the PML_OB1 files. I started following an
> MPI_Ssend(), trying to find where a message is stored (on the sender) if it
> is not sent until the receiver posts the recv, but I didn't find that place.
> >>>>>
> >>>>> Right, you can't find this as the message is not stored on the
> sender. The pointer to the send request is sent encapsulated in the
> matching header, and the receiver will provide it back once the message has
> been matched (this means the data is now ready to flow).
> >>>>>
> >>>>> So, what you're saying is that the sender only sends the header, and
> when the receiver posts the recv it sends the header back so that the sender
> starts sending the data? Am I getting it right? If so, the data
> stays on the sender, but where is it stored?
> >>>>
> >>>> If we consider rendez-vous messages, the data remains in the sender's
> buffer (aka the buffer provided by the upper level to the MPI_Send
> function).
> >>>>
> >>>> Yes, so I will only need to save the headers of the messages (where
> the status is incomplete), and then maybe just call the upper-level
> MPI_Send again. A question here: the headers are not marked as pending (at
> least I think so), so my only approach might be to create a list of pending
> headers and store the pointer to the send there, then try to identify its
> corresponding upper-level MPI_Send and retry it in case of failure. Is
> this a correct approach?
> >>>
> >>> Look in mca/vprotocol/base to see how we deal with the send
> requests in our message logging protocol. We hijack the send request list,
> and replace them with our own, allowing us to chain all active requests.
> This makes the tracking of active requests very simple, and minimizes the
> impact on the overall code.
> >>>
> >>> george.
> >>>
> >>>
> >>> Ok George.
> >>> I will take a look there and then let you know how it goes.
> >>>
> >>> Thanks.
> >>>
> >>> Hugo
> >>>
> >>> _______________________________________________
> >>> devel mailing list
> >>> devel_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> >
> > --
> > Jeff Squyres
> > jsquyres_at_[hidden]
> > For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
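
The idea George describes in the quoted thread, chaining all active send requests so they can be found again after a failure, might be modeled along these lines. This is only a sketch under stated assumptions and not the code in mca/vprotocol/base: the types sketch_send_req_t and sketch_active_list_t and the functions sketch_track, sketch_untrack and sketch_collect_for_peer are hypothetical names, and a plain doubly linked list stands in for whatever list type the real component uses. The request holds only a pointer to the user buffer, in line with the remark that rendez-vous data stays in the buffer passed to MPI_Send.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified send request that can be chained in a list. */
typedef struct sketch_send_req {
    int dest;                 /* destination rank */
    int tag;                  /* message tag */
    const void *user_buf;     /* data stays in the caller's buffer */
    struct sketch_send_req *prev;
    struct sketch_send_req *next;
} sketch_send_req_t;

/* Hypothetical list of all send requests that are still active. */
typedef struct {
    sketch_send_req_t *head;
    size_t count;
} sketch_active_list_t;

/* Chain a request when the send is started. */
void sketch_track(sketch_active_list_t *list, sketch_send_req_t *req)
{
    assert(list != NULL && req != NULL);
    req->prev = NULL;
    req->next = list->head;
    if (list->head != NULL) {
        list->head->prev = req;
    }
    list->head = req;
    list->count++;
}

/* Unchain a request once it has completed successfully. */
void sketch_untrack(sketch_active_list_t *list, sketch_send_req_t *req)
{
    assert(list != NULL && req != NULL);
    if (req->prev != NULL) {
        req->prev->next = req->next;
    } else {
        list->head = req->next;
    }
    if (req->next != NULL) {
        req->next->prev = req->prev;
    }
    req->prev = NULL;
    req->next = NULL;
    list->count--;
}

/*
 * After a peer failure, gather the active requests addressed to that peer
 * so the caller can repost them. Writes at most max_out pointers into out
 * and returns how many were written.
 */
size_t sketch_collect_for_peer(const sketch_active_list_t *list, int dest,
                               sketch_send_req_t **out, size_t max_out)
{
    assert(list != NULL && out != NULL);
    size_t n = 0;
    for (sketch_send_req_t *r = list->head; r != NULL && n < max_out; r = r->next) {
        if (r->dest == dest) {
            out[n++] = r;
        }
    }
    return n;
}
```

With a list like this, tracking costs a constant-time insert and removal per request, and recovery after a failure is a single walk over the requests that were still in flight.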