Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Retrying a MPI_SEND
From: Hugo Daniel Meyer (meyer.hugo_at_[hidden])
Date: 2011-12-14 10:43:16


Hello George and all.

Sorry for the late answer, but I was tracing the code to see where
MPI_ERROR is set. I took a look at ompi_request_default_wait to see
what happens with the request.

I've noticed that all requests that are not immediately completed go
to ompi_request_wait_completion. But I don't know exactly where the
execution jumps when I inject a failure into the receiver of the message.
After the failure, the sender does not return
from ompi_request_wait_completion to ompi_request_default_wait, and I don't
know where to catch the point at which req->req_status.MPI_ERROR is set. Do
you know where the execution jumps, or at least which error handler is invoked?
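
To make the question concrete, here is a minimal sketch of the detect-and-repost pattern under discussion. It uses mock types only: mock_request_t, mock_progress, mock_post_send, mock_wait and send_with_retry are hypothetical names, not Open MPI functions, and the "error" field merely plays the role of req->req_status.MPI_ERROR. It also assumes the wait actually returns after a failure, which is exactly what does not happen in the case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT Open MPI's real types or codes. */
enum { MOCK_SUCCESS = 0, MOCK_ERR_PEER_FAILED = 1 };

typedef struct {
    bool complete;  /* set once the request finished, successfully or not */
    int  error;     /* plays the role of req->req_status.MPI_ERROR */
    int  attempts;  /* how many times the send has been posted */
} mock_request_t;

/* Simulated progress engine: the first attempt fails, later ones succeed. */
static void mock_progress(mock_request_t *req)
{
    req->error = (req->attempts == 1) ? MOCK_ERR_PEER_FAILED : MOCK_SUCCESS;
    req->complete = true;
}

/* Post (or repost) the send: reset the completion state. */
static void mock_post_send(mock_request_t *req)
{
    req->complete = false;
    req->error = MOCK_SUCCESS;
    req->attempts++;
}

/* Block until the request completes, then report the error stored in it. */
static int mock_wait(mock_request_t *req)
{
    while (!req->complete) {
        mock_progress(req);
    }
    return req->error;
}

/* Interception layer: repost the send until it succeeds or attempts run out. */
static int send_with_retry(mock_request_t *req, int max_attempts)
{
    assert(req != NULL && max_attempts > 0);
    int rc = MOCK_ERR_PEER_FAILED;
    while (req->attempts < max_attempts) {
        mock_post_send(req);
        rc = mock_wait(req);
        if (rc == MOCK_SUCCESS) {
            break;
        }
        fprintf(stderr, "attempt %d failed (error %d), reposting\n",
                req->attempts, rc);
    }
    return rc;
}
```

The point of the sketch is only that the retry decision is taken after the wait returns and the error field is inspected; where the real code sets that field is still the open question.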

Thanks in advance.

Hugo

2011/12/9 George Bosilca <bosilca_at_[hidden]>

>
> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote:
>
> Hello George and all.
>
> I've been adapting some of the code to copy the request, and now I think
> it is working OK. I'm storing the request as you do in the pessimist
> protocol, but I'm only logging received messages, as my approach is a
> receiver-based pessimistic log.
>
> I do have a question: how do you detect when you have to resend a
> message, or at least repost it?
>
>
> The error in the status attached to the request will be set in case of
> failure. As the MPI error handler is triggered right before returning above
> the MPI layer, at the level where you placed your interception you have all
> the freedom you need to handle the faults.
>
> george.
>
>
> Thanks for the help.
>
> Hugo
>
> 2011/11/19 Hugo Daniel Meyer <meyer.hugo_at_[hidden]>
>
>>
>>
>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
>>
>>>
>>> On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote:
>>>
>>>
>>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
>>>
>>>>
>>>> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
>>>>
>>>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
>>>>
>>>>>
>>>>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
>>>>>
>>>>> Hello again.
>>>>>
>>>>> I was tracing through the PML_OB1 files. I started following an
>>>>> MPI_Ssend() to find where a message is stored (on the sender) if it
>>>>> is not sent until the receiver posts the recv, but I didn't find that place.
>>>>>
>>>>>
>>>>> Right, you can't find this as the message is not stored on the sender.
>>>>> The pointer to the send request is sent encapsulated in the matching
>>>>> header, and the receiver will provide it back once the message has been
>>>>> matched (this means the data is now ready to flow).
>>>>>
>>>>
>>>> So, what you're saying is that the sender only sends the header, and
>>>> when the receiver posts the recv it sends the header back so that the
>>>> sender starts sending the data? Am I getting it right? If so, the data
>>>> stays on the sender, but where is it stored?
>>>>
>>>>
>>>> If we consider rendezvous messages, the data remains in the sender's
>>>> buffer (i.e., the buffer provided by the upper level to the MPI_Send
>>>> function).
>>>>
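
The exchange described above (header first, request pointer returned on match, data read straight from the user buffer) can be sketched roughly as follows. This is a single-process mock under stated assumptions: mock_send_req_t, mock_rndv_hdr_t and the three functions are hypothetical and do not reproduce the real PML OB1 structures or wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; not the real PML OB1 structures. */
typedef struct {
    const void *user_buf;  /* buffer given to the send call; not copied */
    size_t      len;
} mock_send_req_t;

typedef struct {
    int              tag;       /* matching information */
    size_t           len;       /* announced message length */
    mock_send_req_t *send_req;  /* handle that is only meaningful to the sender */
} mock_rndv_hdr_t;

/* Step 1: the sender builds a header only; no payload is stored or sent. */
static mock_rndv_hdr_t mock_start_rendezvous(mock_send_req_t *req, int tag)
{
    assert(req != NULL);
    mock_rndv_hdr_t hdr = { tag, req->len, req };
    return hdr;
}

/* Step 2: the receiver matches the header and gives back the handle. */
static mock_send_req_t *mock_match(const mock_rndv_hdr_t *hdr, int posted_tag)
{
    assert(hdr != NULL);
    return (hdr->tag == posted_tag) ? hdr->send_req : NULL;
}

/* Step 3: once acknowledged, data flows directly from the user buffer. */
static size_t mock_send_data(const mock_send_req_t *req, void *recv_buf,
                             size_t capacity)
{
    assert(req != NULL && recv_buf != NULL);
    size_t n = (req->len < capacity) ? req->len : capacity;
    memcpy(recv_buf, req->user_buf, n);
    return n;
}
```

Because the payload is never copied on the sender side, anything that wants to replay the send after a failure has to keep the request (and the user buffer) alive itself.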
>>>
>>> Yes, so I will only need to save the headers of the messages (where the
>>> status is incomplete), and then maybe just call the upper-level
>>> MPI_Send again. A question here: the headers are not marked as pending (at
>>> least I think so), so my only approach might be to create a list of pending
>>> headers, store the pointer to the send there, then try to identify its
>>> corresponding upper-level MPI_Send and retry it in case of failure. Is
>>> this a correct approach?
>>>
>>>
>>> Look in mca/vprotocol/base to see how we deal with the send requests
>>> in our message logging protocol. We hijack the send request list and
>>> replace the requests with our own, allowing us to chain all active
>>> requests. This makes tracking the active requests very simple, and
>>> minimizes the impact on the overall code.
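
The idea of chaining all active requests can be illustrated with a small intrusive list. This is only a sketch of the general technique, not the code in mca/vprotocol/base: tracked_req_t, active_list_t, track_start, track_complete, for_each_active and sum_ids are hypothetical names, and the "id" field stands in for whatever the real send request carries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: a request extended with links for chaining. */
typedef struct tracked_req {
    int                 id;    /* stands in for the original send request */
    struct tracked_req *prev;
    struct tracked_req *next;
} tracked_req_t;

typedef struct {
    tracked_req_t *head;   /* most recently started request */
    size_t         count;  /* number of requests still active */
} active_list_t;

/* Called when a send is started: link the request at the head. */
static void track_start(active_list_t *l, tracked_req_t *r)
{
    assert(l != NULL && r != NULL);
    r->prev = NULL;
    r->next = l->head;
    if (l->head != NULL) {
        l->head->prev = r;
    }
    l->head = r;
    l->count++;
}

/* Called when a send completes: unlink it in constant time. */
static void track_complete(active_list_t *l, tracked_req_t *r)
{
    assert(l != NULL && r != NULL && l->count > 0);
    if (r->prev != NULL) {
        r->prev->next = r->next;
    } else {
        l->head = r->next;
    }
    if (r->next != NULL) {
        r->next->prev = r->prev;
    }
    r->prev = r->next = NULL;
    l->count--;
}

/* After a failure: visit every still-active request, e.g. to repost it. */
static size_t for_each_active(const active_list_t *l,
                              void (*fn)(tracked_req_t *, void *), void *ctx)
{
    assert(l != NULL && fn != NULL);
    size_t visited = 0;
    for (tracked_req_t *r = l->head; r != NULL; r = r->next) {
        fn(r, ctx);
        visited++;
    }
    return visited;
}

/* Example visitor: accumulate the ids of the pending requests. */
static void sum_ids(tracked_req_t *r, void *ctx)
{
    *(int *)ctx += r->id;
}
```

With the links embedded in the request itself, no separate list of pending headers has to be maintained; a recovery path would replace sum_ids with whatever reposts the send.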
>>>
>>> george.
>>>
>>>
>> Ok George.
>> I will take a look there and then let you know how it goes.
>>
>> Thanks.
>>
>> Hugo
>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>