Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: Re: [OMPI devel] Retrying a MPI_SEND
From: Hugo Daniel Meyer (meyer.hugo_at_[hidden])
Date: 2011-11-18 14:50:07


2011/11/18 George Bosilca <bosilca_at_[hidden]>

>
> On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:
>
> 2011/11/18 George Bosilca <bosilca_at_[hidden]>
>
>>
>> On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:
>>
>> Hello again.
>>
>> I was doing some trace into de PML_OB1 files. I start to follow a
>> MPI_Ssend() trying to find where a message is stored (in the sender) if it
>> is not send until the receiver post the recv, but i didn't find that place.
>>
>>
>> Right, you can't find this as the message is not stored on the sender.
>> The pointer to the send request is sent encapsulated in the matching
>> header, and the receiver will provide it back once the message has been
>> matched (this means the data is now ready to flow).
>>
>
> So, what you're saying is that the sender only sends the header, so when
> the receiver post the recv will send again the header so the sender starts
> with the data sent? am i getting it right? If this is ok, the data stays
> in the sender, but where it is stored?
>
>
> If we consider rendez-vous messages the data is remains in the sender
> buffer (aka the buffer provided by the upper level to the MPI_Send
> function).
>

Yes, so i will only need to save the headears of the messages (where the
status is incomplete), and then maybe just call again the upper level
MP_Send. A question here, the headers are not marked as pending (at least i
think so), so, my only approach might be to create a list of pending
headers and store there the pointer to the send, then try to identify its
corresponding upper level MPI_Send and retries it in case of failure, is
this a correct approach?

>
>
>> I've noticed that the message to be sent enters in *
>> mca_pml_ob1_rndv_completion_request(*pml_ob1_sendreq.c*) *and the *rc =
>> send_request_pml_complete_check(sendreq) *returns false when the request
>> hasn't been completed, but the execution never passes through *
>> MCA_PML_OB1_PROGRESS_PENDING,* at least, none of the possible options is
>> executed.
>>
>> So, re-orienting my question: where is stored this message until
>> delivery? and if there any way to know that the receiver goes down? With
>> this information i will be able to detect the failure of the receiver and
>> will try to resend the message to another place.
>>
>>
>> If you want to track the send requests, you will have to implement your
>> own way of tracking them, as we do not expose this in our PML. Eventually,
>> writing your own PML, might be necessary.
>>
>> However, as a user I would find very disturbing that the MPI runtime
>> decide to send the message to another peer on my behalf. I would rather
>> prefer that the MPI_Send returns some kind of error, that allows the upper
>> level algorithm to repost the send to another peer. Look at the proposals
>> in the MPI Forum to get more information about what it is discussed
>> regarding the MPI resilience.
>>
>> Do you mean a fault tolerant algorithm made by the user?
> What i'm trying to do is a transparent fault tolerant system, where if a
> failure occurs the system avoid sending informartion to the user, and take
> actions by itself. For example, if the app tries to contact rank 1, but
> that rank has failed, so my system will restore the process with rank 1 in
> another place and make the send to the new location. That's why i need to
> detect this send failure, update my endpoint with the new location, and
> retry the send. My big problem right now is to detect this send failure,
> because i don't know how to obtain the status of a send, or the break of an
> endpoint (i really don't know what gets broken when a process dies,
> considering the send ).
>
>
> What is the difference between this and a message logging approach?
>

Actually, i use an uncoordinated checkpoint approach + receiver based
message logging. As that message has not been send, and of course, not yet
received, i cannot store it on my log, so what i've to do is retry the send
when i detect the failure because the receive has failed.

>
> george.
>
>
> Right now, i've an implementation that make independant checkpoints of the
> processes and if i kill one process it gets restarted in another node and
> continue with its execution. If a send to the restarted process is posted
> after the restart, there is no problem, because i've already updated the
> endpoint with that process, but, if a send is posted before the restart,
> and the recv is posted in the receiver after the restart, i've a problem.
> Any hellp with this?
>
> Thanks in advance.
>
> Hugo
>
>>
>> Thanks again.
>>
>> Hugo Meyer
>>
>> 2011/11/17 Hugo Daniel Meyer <meyer.hugo_at_[hidden]>
>>
>>> Hello @ll.
>>>
>>> I'm doing some changes in the communication framework. Right now i'm
>>> working on a "secure" MPI_Send, this send needs to know when an endpoint
>>> goes down, and then retry the communication constructing a new endpoint, or
>>> at least, overwriting the data of the old endpoint with the new address of
>>> the receiver process. Overwriting the data of the endpoint is not a problem
>>> anymore, because i've done that before.
>>>
>>> For example, if we consider a Master/Worker application, where the
>>> master sends data to the workers, and workers start the computation, then,
>>> the master posts a send to the worker1 that fails and get restarted in
>>> another node and in his new location the worker1 posts the recv to the
>>> master's send. The problem here is that the master post the send when the
>>> process was residing in one node, but the process expects the message in
>>> another node. I need the sender to realize that the process is now in
>>> another node, and retries the communication with a modificated endpoint.
>>> Anyone could please tell me where in the send code i can obtain the status
>>> of a message that hasn't been send and resend it to a new location. Also i
>>> want to know, where can i obtain information about an endpoint fail?.
>>>
>>> Thanks in advance.
>>>
>>> Hugo
>>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>