Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Retrying a MPI_SEND
From: George Bosilca (bosilca_at_[hidden])
Date: 2011-12-16 10:15:15


Setting the error handler to MPI_ERRORS_RETURN is the right solution for mechanisms using the PMPI interface. Hugo's code sits one software layer below the MPI interface, so the error handler does not affect it. However, once he reacts to an error, he should reset the error (in the status attached to the request) to MPI_SUCCESS, in order to avoid triggering the error handler on the way back up to the MPI layer.

  george.
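[Editor's note] The reset George describes can be sketched in plain C. This is a mock, not the real Open MPI code: the struct and the non-zero error value are illustrative stand-ins for `req_status`, and `handle_completed_request` is a hypothetical name for the interception point below the MPI layer.

```c
/* Plain-C sketch (no real MPI linked) of the advice above: a layer
 * below the MPI interface reacts to a failed request, then resets the
 * error in the attached status to MPI_SUCCESS so the MPI error handler
 * is not triggered on the way back up. The struct and the error value
 * are stand-ins, not the real ompi definitions. */
#define MPI_SUCCESS 0

typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;          /* mirrors req_status.MPI_ERROR */
} mock_status_t;

/* Hypothetical interception point sitting below the MPI layer. */
static int handle_completed_request(mock_status_t *status)
{
    if (status->MPI_ERROR != MPI_SUCCESS) {
        /* ... react to the failure here (e.g. log / re-post the send) ... */
        status->MPI_ERROR = MPI_SUCCESS;  /* reset before returning up */
    }
    return status->MPI_ERROR;
}
```

The key point is that the reset happens before control returns above the MPI layer, so the communicator's error handler never fires for a fault that was already handled below.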

On Dec 16, 2011, at 09:09 , Jeff Squyres wrote:

> I'm jumping into the middle of this conversation and probably don't have all the right context, so forgive me if this is a stupid question: did you set MPI_ERRORS_RETURN on the communicator in question?
>
>
> On Dec 14, 2011, at 10:43 AM, Hugo Daniel Meyer wrote:
>
>> Hello George and @ll.
>>
>> Sorry for the late answer, but I was doing some tracing to see where MPI_ERROR is set. I took a look at ompi_request_default_wait and tried to see what happens with the request.
>>
>> Well, I've noticed that all requests that are not immediately completed go to ompi_request_wait_completion. But I don't know exactly where the execution jumps when I inject a failure into the receiver of the message. After the failure, the sender does not return from ompi_request_wait_completion to ompi_request_default_wait, and I don't know where to catch the point at which req->req_status.MPI_ERROR is set. Do you know where the execution jumps, or at least in which error handler it ends up?
>>
>> Thanks in advance.
>>
>> Hugo
>>
>> 2011/12/9 George Bosilca <bosilca_at_[hidden]>
>>
>> On Dec 9, 2011, at 06:59 , Hugo Daniel Meyer wrote:
>>
>>> Hello George and all.
>>>
>>> I've been adapting some of the code to copy the request, and now I think it is working OK. I'm storing the request as you do in the pessimist protocol, but I'm only logging received messages, since my approach is a pessimistic receiver-based log.
>>>
>>> I do have a question, though: how do you detect when you have to resend a message, or at least repost it?
>>
>> The error in the status attached to the request will be set in case of failure. As the MPI error handler is triggered right before returning above the MPI layer, at the level where you placed your interception you have all the freedom you need to handle the faults.
>>
>> george.
>>
>>>
>>> Thanks for the help.
>>>
>>> Hugo
>>>
>>> 2011/11/19 Hugo Daniel Meyer <meyer.hugo_at_[hidden]>
>>>
>>>
>>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
>>>
>>> On Nov 18, 2011, at 11:50 , Hugo Daniel Meyer wrote:
>>>
>>>>
>>>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
>>>>
>>>> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
>>>>
>>>>> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
>>>>>
>>>>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
>>>>>
>>>>>> Hello again.
>>>>>>
>>>>>> I was doing some tracing in the PML_OB1 files. I started to follow an MPI_Ssend(), trying to find where a message is stored (on the sender) if it is not sent until the receiver posts the recv, but I didn't find that place.
>>>>>
>>>>> Right, you can't find this as the message is not stored on the sender. The pointer to the send request is sent encapsulated in the matching header, and the receiver will provide it back once the message has been matched (this means the data is now ready to flow).
>>>>>
>>>>> So, what you're saying is that the sender only sends the header, and when the receiver posts the recv it sends the header back so the sender starts sending the data? Am I getting it right? If so, the data stays on the sender, but where is it stored?
>>>>
>>>> If we consider rendez-vous messages, the data remains in the sender's buffer (aka the buffer provided by the upper level to the MPI_Send function).
>>>>
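[Editor's note] The rendez-vous exchange described in this sub-thread can be modeled in a few lines of plain C. This is a simplified illustration, not the real ob1 structures: the header carries only a pointer to the send request, the payload stays in the caller's buffer, and the echoed pointer is the signal that data may flow.

```c
/* Plain-C model of the rendez-vous exchange described above. All
 * names are illustrative stand-ins for the real ob1 structures. */
#include <stddef.h>
#include <string.h>

typedef struct send_req {
    const char *user_buf;   /* data remains here, in the caller's buffer */
    size_t      len;
    int         completed;
} send_req_t;

typedef struct {            /* the matching ("rendez-vous") header */
    int         tag;
    send_req_t *sender_req; /* pointer travels inside the header */
} match_hdr_t;

/* Receiver side: once a posted recv matches the header, the request
 * pointer is echoed back, signalling the sender that data may flow. */
static send_req_t *receiver_match(const match_hdr_t *hdr, int posted_tag)
{
    return (hdr->tag == posted_tag) ? hdr->sender_req : NULL;
}

/* Sender side: only now is the payload read out of the user buffer. */
static size_t sender_deliver(send_req_t *req, char *dst, size_t dst_len)
{
    size_t n = req->len < dst_len ? req->len : dst_len;
    memcpy(dst, req->user_buf, n);
    req->completed = 1;
    return n;
}
```

Until `receiver_match` succeeds, nothing but the header has left the sender, which is why Hugo could not find a stored copy of the message body.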
>>>> Yes, so I will only need to save the headers of the messages (where the status is incomplete), and then maybe just call the upper-level MPI_Send again. A question here: the headers are not marked as pending (at least I think so), so my only approach might be to create a list of pending headers, store there the pointer to the send, then identify the corresponding upper-level MPI_Send and retry it in case of failure. Is this a correct approach?
>>>
>>> Look in mca/vprotocol/base to see how we deal with the send requests in our message-logging protocol. We hijack the send request list and replace it with our own, allowing us to chain all active requests. This makes the tracking of active requests very simple, and minimizes the impact on the overall code.
>>>
>>> george.
>>>
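[Editor's note] The chaining idea George points at can be illustrated with a small intrusive list in plain C. The structures below are simplified stand-ins, not the actual vprotocol request classes: each request carries its own `next` pointer, so starting a request links it in and completing it unlinks it, letting a logging layer walk every active send without a separate allocation.

```c
/* Sketch of the request-chaining idea from mca/vprotocol/base: active
 * send requests are linked into one list so a logging layer can walk
 * every pending send cheaply. Structures are simplified stand-ins. */
#include <stddef.h>

typedef struct send_req {
    int              tag;
    struct send_req *next;   /* chains all active requests */
} send_req_t;

static send_req_t *active_head = NULL;

/* Hijacked start hook: link the request into the active chain. */
static void request_start(send_req_t *r)
{
    r->next = active_head;
    active_head = r;
}

/* Completion hook: unlink the request from the chain. */
static void request_complete(send_req_t *r)
{
    send_req_t **p = &active_head;
    while (*p && *p != r) p = &(*p)->next;
    if (*p) *p = r->next;
}

/* Walk the chain, e.g. to find sends that must be replayed. */
static int active_count(void)
{
    int n = 0;
    for (send_req_t *r = active_head; r; r = r->next) n++;
    return n;
}
```

Because the list is intrusive (the link lives inside the request), tracking adds no allocations on the critical path, which is why the impact on the surrounding code stays small.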
>>>
>>> Ok George.
>>> I will take a look there and then let you know how it goes.
>>>
>>> Thanks.
>>>
>>> Hugo
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>
>>
>>
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>