On Nov 18, 2011, at 11:14 , Hugo Daniel Meyer wrote:

2011/11/18 George Bosilca <bosilca@eecs.utk.edu>

On Nov 18, 2011, at 07:29 , Hugo Daniel Meyer wrote:

Hello again.

I was tracing through the PML_OB1 files. I started following an MPI_Ssend(), trying to find where a message is stored (in the sender) if it is not sent until the receiver posts the recv, but I didn't find that place.

Right, you can't find this as the message is not stored on the sender. The pointer to the send request is sent encapsulated in the matching header, and the receiver will provide it back once the message has been matched (this means the data is now ready to flow).

So, what you're saying is that the sender only sends the header, and when the receiver posts the recv it sends the header back so that the sender starts sending the data? Am I getting it right? If so, the data stays in the sender, but where is it stored?

If we consider rendezvous messages, the data remains in the sender buffer (i.e., the buffer provided by the upper level to the MPI_Send function).
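The handshake described above can be sketched in plain C. This is a minimal single-process illustration, not Open MPI's ob1 code: the types send_request_t, match_header_t and rndv_ack_t and the functions rndv_start, rndv_match and rndv_on_ack are hypothetical names invented for this sketch, and memcpy stands in for the actual network transfer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical types for illustration; NOT Open MPI's ob1 structures. */
typedef struct {
    const void *user_buf;   /* buffer the application passed to the send call */
    size_t      len;        /* number of bytes to transfer */
    int         completed;  /* set once the payload has been delivered */
} send_request_t;

/* Matching header: the envelope plus a handle to the send request. */
typedef struct {
    int             src_rank;
    int             tag;
    size_t          len;
    send_request_t *sender_req;  /* echoed back by the receiver after matching */
} match_header_t;

/* Acknowledgement the receiver returns once a matching recv is posted. */
typedef struct {
    send_request_t *sender_req;  /* same handle the sender put in the header */
    void           *recv_buf;    /* where the payload should land */
} rndv_ack_t;

/* Step 1 (sender): build the header. No payload is copied anywhere. */
match_header_t rndv_start(send_request_t *req, int src_rank, int tag)
{
    match_header_t hdr = { src_rank, tag, req->len, req };
    req->completed = 0;
    return hdr;
}

/* Step 2 (receiver): try to match the header against a posted recv.
 * Returns 1 and fills *ack on a match, 0 otherwise. */
int rndv_match(const match_header_t *hdr, int want_tag,
               void *recv_buf, size_t capacity, rndv_ack_t *ack)
{
    if (hdr->tag != want_tag || hdr->len > capacity)
        return 0;
    ack->sender_req = hdr->sender_req;
    ack->recv_buf   = recv_buf;
    return 1;
}

/* Step 3 (sender): the ack identifies the request, so the data can now
 * flow directly out of the user buffer. */
void rndv_on_ack(const rndv_ack_t *ack)
{
    send_request_t *req = ack->sender_req;
    memcpy(ack->recv_buf, req->user_buf, req->len);
    req->completed = 1;
}
```

Until rndv_on_ack runs, the only copy of the payload is the one in user_buf, which is why there is no separate storage location to look for on the sender side.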


I've noticed that the message to be sent enters mca_pml_ob1_rndv_completion_request (pml_ob1_sendreq.c), and that rc = send_request_pml_complete_check(sendreq) returns false when the request hasn't been completed, but the execution never passes through MCA_PML_OB1_PROGRESS_PENDING; at least, none of the possible options is executed.

So, re-orienting my question: where is this message stored until delivery? And is there any way to know that the receiver has gone down? With this information I will be able to detect the failure of the receiver and try to resend the message to another place.

If you want to track the send requests, you will have to implement your own way of tracking them, as we do not expose this in our PML. Eventually, writing your own PML might be necessary.

However, as a user I would find it very disturbing that the MPI runtime decides to send the message to another peer on my behalf. I would much prefer that MPI_Send return some kind of error that allows the upper-level algorithm to repost the send to another peer. Look at the proposals in the MPI Forum for more information about what is being discussed regarding MPI resilience.
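One way to track send requests yourself, as suggested above, is a small table of posted-but-unmatched sends keyed by destination rank. The following is only a sketch under stated assumptions: pending_send_t, pending_table_t, pending_add, pending_complete and pending_collect are hypothetical names, nothing here is part of Open MPI, and hooking such a table into a PML (or a layer above it) is left open.

```c
#include <stddef.h>

#define MAX_PENDING 64

/* Hypothetical record of one posted-but-unmatched send; not an Open MPI type. */
typedef struct {
    int         in_use;
    int         dest_rank;
    int         tag;
    const void *user_buf;  /* still owned by the application, never copied */
    size_t      len;
} pending_send_t;

/* The table must be zero-initialized before first use. */
typedef struct {
    pending_send_t slots[MAX_PENDING];
} pending_table_t;

/* Record a send when it is posted. Returns the slot index, or -1 if full. */
int pending_add(pending_table_t *t, int dest_rank, int tag,
                const void *buf, size_t len)
{
    for (int i = 0; i < MAX_PENDING; i++) {
        if (!t->slots[i].in_use) {
            t->slots[i] = (pending_send_t){ 1, dest_rank, tag, buf, len };
            return i;
        }
    }
    return -1;
}

/* Forget a send once the receiver has matched it. */
void pending_complete(pending_table_t *t, int slot)
{
    if (slot >= 0 && slot < MAX_PENDING)
        t->slots[slot].in_use = 0;
}

/* Collect the sends still pending for dest_rank, e.g. after that rank has
 * been restarted elsewhere and its endpoint updated. Writes at most cap
 * pointers into out and returns how many were written. */
size_t pending_collect(const pending_table_t *t, int dest_rank,
                       const pending_send_t **out, size_t cap)
{
    size_t n = 0;
    for (int i = 0; i < MAX_PENDING && n < cap; i++) {
        if (t->slots[i].in_use && t->slots[i].dest_rank == dest_rank)
            out[n++] = &t->slots[i];
    }
    return n;
}
```

Whoever consumes the result of pending_collect (the upper-level algorithm, or a transparent layer) can then repost each entry. The sketch relies on the application buffer staying valid until the send completes, which matches the rendezvous case where the data never leaves the sender buffer before matching.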

Do you mean a fault-tolerant algorithm written by the user?
What I'm trying to build is a transparent fault-tolerant system, where if a failure occurs the system avoids reporting it to the user and takes action by itself. For example, if the app tries to contact rank 1 but that rank has failed, my system will restore the process with rank 1 in another place and make the send to the new location. That's why I need to detect this send failure, update my endpoint with the new location, and retry the send. My big problem right now is detecting this send failure, because I don't know how to obtain the status of a send or detect a broken endpoint (I really don't know what gets broken when a process dies, as far as the send is concerned).

What is the difference between this and a message logging approach?

  george.


Right now, I have an implementation that makes independent checkpoints of the processes, and if I kill one process it gets restarted on another node and continues its execution. If a send to the restarted process is posted after the restart, there is no problem, because I've already updated the endpoint for that process. But if a send is posted before the restart and the recv is posted in the receiver after the restart, I have a problem. Any help with this?

Thanks in advance.

Hugo

Thanks again.

Hugo Meyer

2011/11/17 Hugo Daniel Meyer <meyer.hugo@gmail.com>
Hello @ll.

I'm making some changes in the communication framework. Right now I'm working on a "secure" MPI_Send. This send needs to know when an endpoint goes down and then retry the communication by constructing a new endpoint or, at least, by overwriting the data of the old endpoint with the new address of the receiver process. Overwriting the endpoint data is not a problem anymore, because I've done that before.

For example, consider a Master/Worker application where the master sends data to the workers and the workers start the computation. The master posts a send to worker1, which fails and gets restarted on another node, and in its new location worker1 posts the recv for the master's send. The problem here is that the master posted the send when the process was residing on one node, but the process now expects the message on another node. I need the sender to realize that the process is now on another node and retry the communication with a modified endpoint. Could anyone please tell me where in the send code I can obtain the status of a message that hasn't been sent, so that I can resend it to a new location? I would also like to know where I can obtain information about an endpoint failure.

Thanks in advance.

Hugo

_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

