
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Fake Modex
From: Hugo Meyer (meyer.hugo_at_[hidden])
Date: 2011-06-02 09:52:52


Hello again.

My current problem is that I don't know where to find the struct that holds the
information used to send messages to the procs.

Something like:

Rank URI
0 21222:tcp:192.168.1.1:1250
1 21223:tcp:192.168.1.2:1250
..... .....

What I need is to update it when I move a process away from its original
node. Is there something like this?
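To make the question concrete, here is a hypothetical sketch of the kind of rank-to-URI table being asked about. It is not Open MPI's actual structure: the types `contact_entry_t` and `contact_table_t`, the functions `contact_lookup` and `contact_update`, and the fixed `MAX_PROCS`/`URI_LEN` limits are all illustrative assumptions. Under this model, relocating a process amounts to overwriting the URI stored for the same rank.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch only: NOT Open MPI's real contact-info structure. */
#define MAX_PROCS 64
#define URI_LEN   128

typedef struct {
    int  rank;
    char uri[URI_LEN];   /* e.g. "21222:tcp:192.168.1.1:1250" */
} contact_entry_t;

typedef struct {
    contact_entry_t entries[MAX_PROCS];
    int count;
} contact_table_t;

/* Returns the URI stored for `rank`, or NULL if the rank is unknown. */
static const char *contact_lookup(const contact_table_t *t, int rank)
{
    assert(t != NULL);
    for (int i = 0; i < t->count; i++) {
        if (t->entries[i].rank == rank)
            return t->entries[i].uri;
    }
    return NULL;
}

/* Inserts or replaces the URI for `rank` (same rank, possibly new URI).
 * Returns 0 on success, -1 if the URI is too long or the table is full. */
static int contact_update(contact_table_t *t, int rank, const char *uri)
{
    assert(t != NULL && uri != NULL);
    if (strlen(uri) >= URI_LEN)
        return -1;
    for (int i = 0; i < t->count; i++) {
        if (t->entries[i].rank == rank) {
            strcpy(t->entries[i].uri, uri);   /* process moved: overwrite */
            return 0;
        }
    }
    if (t->count == MAX_PROCS)
        return -1;
    t->entries[t->count].rank = rank;
    strcpy(t->entries[t->count].uri, uri);
    t->count++;
    return 0;
}
```

For example, after rank 1 is restored on another node, calling `contact_update(&table, 1, new_uri)` replaces its old entry in place while the rank stays the same.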

Thanks a lot.

Hugo

2011/5/31 Hugo Meyer <meyer.hugo_at_[hidden]>

> Hello @ll.
>
> I need some help restarting communication with a process that I
> restore on a different node. My situation is as follows:
>
> The process fails and is successfully restored on another node from a
> previous checkpoint that I sent there. Now, when another process tries to
> send a message to this restored process, the send fails, or at least it
> blocks in *ompi_request_wait_completion*.
>
> When this happens, I have to send a message to the sender's daemon, which
> will have the URI of the node where the process has been restored. The
> daemon answers the sending process with this URI, and the process updates
> its information.
>
> So I need to know where in the code I can intercept this send attempt and
> then send the message to the sender's daemon, and where and how I can update
> this information so the message goes to the right place (same rank, new URI).
>
> I have to do it this way to avoid a collective communication.
>
> If you can give me a hand with this, it would be great.
>
> Best regards.
>
> Hugo
>
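The recovery protocol described in the quoted message (try the send; on failure ask the daemon for the peer's current URI; update the entry; retry) can be sketched as follows. This is a minimal, hypothetical illustration, not Open MPI code: `peer_t`, `send_with_relocation`, and the two function-pointer hooks standing in for the real transport and the daemon query are assumptions, and the `sim_*` functions only simulate a peer that has moved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define URI_LEN 128

/* Hypothetical per-peer contact record: the rank never changes, the URI may. */
typedef struct {
    int  rank;
    char uri[URI_LEN];
} peer_t;

/* Hypothetical hooks standing in for the real transport and daemon query.
 * ask_daemon_fn must write a NUL-terminated URI into uri_out on success. */
typedef bool (*try_send_fn)(const char *uri, const void *buf, size_t len);
typedef bool (*ask_daemon_fn)(int rank, char *uri_out, size_t uri_len);

/* Sends buf to peer. On failure, asks the daemon for the peer's current URI,
 * updates the record in place, and retries once. Returns true on success. */
static bool send_with_relocation(peer_t *peer, const void *buf, size_t len,
                                 try_send_fn try_send, ask_daemon_fn ask_daemon)
{
    assert(peer != NULL && try_send != NULL && ask_daemon != NULL);
    if (try_send(peer->uri, buf, len))
        return true;

    char new_uri[URI_LEN];
    if (!ask_daemon(peer->rank, new_uri, sizeof new_uri))
        return false;              /* daemon knows no new location */
    if (strcmp(new_uri, peer->uri) == 0)
        return false;              /* peer has not moved; retrying won't help */

    strcpy(peer->uri, new_uri);    /* same rank, new URI */
    return try_send(peer->uri, buf, len);
}

/* --- Simulated environment, for illustration only --- */
static char g_actual_uri[URI_LEN]; /* where the restored process listens now */
static int  g_send_attempts;

static bool sim_try_send(const char *uri, const void *buf, size_t len)
{
    (void)buf;
    (void)len;
    g_send_attempts++;
    return strcmp(uri, g_actual_uri) == 0;
}

static bool sim_ask_daemon(int rank, char *uri_out, size_t uri_len)
{
    (void)rank;
    if (strlen(g_actual_uri) >= uri_len)
        return false;
    strcpy(uri_out, g_actual_uri);
    return true;
}
```

Passing the transport and the daemon query as function pointers keeps the retry logic independent of where in Open MPI such a hook would actually be installed, which is exactly the open question in this thread.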