Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Fake Modex
From: Hugo Meyer (meyer.hugo_at_[hidden])
Date: 2011-06-16 08:14:58


Hello.

Thanks for your answers.

It's as you said, Josh: I'm trying to do something uncoordinated and on
demand. What I'm doing now is adding code to btl_tcp_endpoint.c and other
files that allows me to change the communication attempts on the sockets
when a failure occurs. At the moment I reset the values in the receiver of
a message from a restored process, and that works until the recv finishes.
After that, when the receiver tries to communicate with the sender, it
fails, I think because the sender does not have an open socket to accept
the connect (am I correct?). So now I will work on that and will give
you some feedback.
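
To check that hypothesis in isolation, here is a minimal Python sketch. It is
not Open MPI code, and the helper names try_connect and demo are made up for
illustration. It assumes a Linux-like TCP stack on loopback, where a connect
to a port that is bound but not yet listening is refused, and succeeds once
listen() has been called:

```python
import socket


def try_connect(host, port, timeout=1.0):
    """Attempt one TCP connect and report what happened."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect((host, port))
        return "connected"
    except ConnectionRefusedError:
        # Nobody is listening on that port (ECONNREFUSED).
        return "refused"
    except socket.timeout:
        return "timeout"
    finally:
        s.close()


def demo():
    """Connect to the same port before and after the peer calls listen()."""
    peer = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the kernel for any free port.
        peer.bind(("127.0.0.1", 0))
        port = peer.getsockname()[1]
        # Bound but not listening: stands in for a restarted process
        # that has not reopened its listening socket yet.
        before = try_connect("127.0.0.1", port)
        peer.listen(1)
        # Listening: the kernel completes the handshake even before accept().
        after = try_connect("127.0.0.1", port)
        return before, after
    finally:
        peer.close()


if __name__ == "__main__":
    print(demo())
```

If the restored sender is in the first state, the receiver's connect would
presumably keep failing in the same way until the sender reopens a listening
socket and its new address reaches the receiver.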

Thanks a lot for all your help.

Hugo

2011/6/13 Josh Hursey <jjhursey_at_[hidden]>

> I don't think this will help much, but I can tell you how we handled
> this for the coordinated C/R functionality.
>
> When we added automatic recovery and process migration using
> coordinated checkpoints to the Open MPI trunk (spring/summer 2010) we
> were able to take advantage of the coordinated nature of the activity.
> Since all processes were doing the recovery together (with possibly
> only a subset of the processes actually restarting - in the case of
> process migration) we were able to flush the modex and repost
> connection information to all processes that wanted it. The restarted
> processes will pull the updated modex information, and the existing
> processes (if any) will pull the modex information from the restarted
> processes once it is posted. The coordinated nature of the recovery
> activity made it easy to define a point in time in which the modex was
> accurate - similar to MPI_Init.
>
> It sounds like you are trying to do something less coordinated in
> nature. So you will most likely need to extend the modex, since I do
> not think it has good support for sending updated contact information
> (and invalidating old contact information) in the current trunk.
>
> George should know this code path better than I do, so he might be
> able to help a bit more. For their uncoordinated C/R approach they
> would have had to deal with this when restarting processes mid-run
> without halting other processes. So maybe you can use a similar
> approach.
>
> -- Josh
>
>
> On Sat, Jun 4, 2011 at 10:55 AM, Ralph Castain <rhc_at_[hidden]> wrote:
> >
> > On Jun 4, 2011, at 5:21 AM, Hugo Meyer wrote:
> >
> > Thanks for your replies.
> >> After doing that, the MPI_Init procedure calls grpcomm.modex to
> >> distribute the data across all procs in the job. Unfortunately, being
> >> a collective, all procs must participate. In your case, you'll have
> >> to find a different way to do it. Upon receipt, each proc updates its
> >> own modex db to include the new info.
> >> Look in orte/mca/grpcomm/bad/grpcomm_bad_module.c at the modex
> >> function and follow that code thru the grpcomm/base functions to see
> >> how the modex info is retrieved, passed, and decoded on the far end.
> > I will take a look at this, Ralph, and let you know how it goes. But
> > today, looking at the code with a partner, he suggested that I try to
> > capture an error when sending data through the btl_tcp_endpoint, more
> > precisely in mca_btl_tcp_frag_send, when we try to write to the fd of
> > the socket. I've tried this, but when a process moves and tries to
> > send a message, or someone tries to send a message to it, I cannot
> > capture the moment of the failure in mca_btl_tcp_frag_send, and I
> > don't know why. It is supposed to fail when someone tries to send; is
> > there any other place where this is captured? If I do it this way, I
> > suppose I can reset connections on demand. What do you think of this?
> > Is it a good idea? And after I detect this failure, I will try to
> > update the modex db of that process from here. Is that OK?
> >
> > I'm no expert on the tcp btl - perhaps George can answer?
> > The run-time has no visibility into MPI connections, and has no
> > understanding of the modex contents. So if a proc detects that it cannot
> > make the btl connection, I guess it could send an orte message to the proc
> > it's trying to reach, and have that proc return a copy of its modex data?
> > I guess that could work. You may be running into the MPI layer's own
> > attempts to ensure comm success via retry...I know you won't get a send
> > failure just because the socket is closed - it'll keep retrying the
> > connection for awhile before giving up.
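
The idea suggested here (on a failed connection, ask the peer for its current
contact data and retry) can be sketched in Python as follows. This is only an
illustration of the control flow, not Open MPI code: connect_with_refresh,
fake_connect, fake_fetch, and the addresses are all hypothetical, and
fetch_contact stands in for whatever out-of-band message would carry the
refreshed modex data.

```python
def connect_with_refresh(peer, contacts, connect, fetch_contact, max_refresh=1):
    """Connect using cached contact info; on failure, refresh it and retry."""
    for attempt in range(max_refresh + 1):
        try:
            return connect(contacts[peer])
        except ConnectionError:
            if attempt == max_refresh:
                raise  # out of refresh attempts, report the failure
            # Hypothetical out-of-band request: ask the peer where it is now
            # and overwrite the stale cache entry.
            contacts[peer] = fetch_contact(peer)


# Stand-ins for the transport and the out-of-band channel (assumptions).
def fake_connect(contact):
    if contact != ("10.0.0.2", 5001):
        raise ConnectionRefusedError(f"no listener at {contact}")
    return f"connected to {contact[0]}:{contact[1]}"


def fake_fetch(peer):
    # Pretend the peer was restarted on another node and port.
    return ("10.0.0.2", 5001)


contacts = {"rank1": ("10.0.0.1", 5000)}  # stale entry from before the restart
result = connect_with_refresh("rank1", contacts, fake_connect, fake_fetch)
```

The sketch only refreshes after a definite connection error. As noted above,
the MPI layer may keep retrying on its own for a while, so a real
implementation would have to decide when a connection counts as failed.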
> >
> >
> > Thanks
> > Hugo
> >
> >
> > 2011/6/3 Jeff Squyres <jsquyres_at_[hidden]>
> >>
> >> On Jun 3, 2011, at 10:12 AM, Ralph Castain wrote:
> >>
> >> > When an MPI proc calls MPI_Init, each btl pushes its contact info
> >> > into the modex database - one example is the btl.tcp.1.7 info you
> >> > found there. That entry is for the TCP btl, which is probably what
> >> > you are looking for. There is no way for you to edit that data -
> >> > each btl encodes it in its own way and then adds it to the modex.
> >>
> >> More specifically, whatever each entity puts into the modex is a blob
> >> that is only readable by other entities just like itself. For
> >> example, what one TCP BTL puts in the modex can really only be read by
> >> another TCP BTL. The contents of what the TCP BTL puts in there is an
> >> opaque binary blob from the modex's point of view.
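
The "opaque blob" point can be illustrated with a small Python sketch. It is
a toy model, not the real modex or the real TCP BTL encoding: the store only
sees bytes under a key, and only the component that wrote the blob knows how
to decode it. The 6-byte layout (IPv4 address plus port) and the names
modex_put, modex_get, tcp_encode, and tcp_decode are assumptions made for
illustration. Only the key btl.tcp.1.7 comes from the thread.

```python
import socket
import struct

# Toy modex: maps (process name, key) to raw bytes. It never looks inside.
modex = {}


def modex_put(proc, key, blob):
    modex[(proc, key)] = bytes(blob)


def modex_get(proc, key):
    return modex[(proc, key)]


# Hypothetical TCP component encoding: 4-byte IPv4 address + 2-byte port,
# both in network byte order. The real BTL uses its own layout.
def tcp_encode(ip, port):
    return socket.inet_aton(ip) + struct.pack("!H", port)


def tcp_decode(blob):
    ip = socket.inet_ntoa(blob[:4])
    (port,) = struct.unpack("!H", blob[4:6])
    return ip, port


# One process publishes its contact info; a peer with the same component
# fetches the bytes and decodes them.
modex_put("rank0", "btl.tcp.1.7", tcp_encode("192.168.1.10", 4242))
blob = modex_get("rank0", "btl.tcp.1.7")
contact = tcp_decode(blob)
```

In this model, updating contact information after a restart means the owning
component re-encodes and republishes its blob. Nothing outside the component
can patch the bytes in a meaningful way.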
> >>
> >> --
> >> Jeff Squyres
> >> jsquyres_at_[hidden]
> >> For corporate legal information go to:
> >> http://www.cisco.com/web/about/doing_business/legal/cri/
> >>
> >>
> >> _______________________________________________
> >> devel mailing list
> >> devel_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
>
>
>
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
>