
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] BTL add procs errors
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-06-02 13:38:33


Yes, I think the mmap code in the sm BTL actually has a sync point inside add_procs: after the root allocates and sets up the area, it locally broadcasts either a "yes, we're good -- mmap attach and let's continue" message or a "bad things happened; sm btl is broke" message.

But I am not confident about the other BTLs.
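
The sync point described above can be illustrated with a small standalone POSIX C sketch. It is not the sm BTL's add_procs code. It is a toy model under stated assumptions: forked processes stand in for the local peers, a /tmp backing file stands in for the shared area, and one status byte written down a pipe to each peer stands in for the local broadcast. The names sm_setup_and_sync, peer_wait_and_attach, SKETCH_MAGIC and the 'Y'/'N' status bytes are all hypothetical. The point it shows is that the root decides once and every peer acts on the same verdict, so setup either succeeds everywhere or fails everywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical marker the root writes so peers can verify their attach. */
#define SKETCH_MAGIC 0x534d4254u
#define STATUS_OK     'Y'  /* "we're good -- mmap attach and let's continue" */
#define STATUS_BROKEN 'N'  /* "bad things happened" */

/* Peer side: block until the root's verdict arrives, then attach or give up.
 * Exit codes: 0 = attached, 1 = told OK but attach failed, 2 = told broken. */
static void peer_wait_and_attach(int rfd, const char *path, size_t size)
{
    char status = STATUS_BROKEN;
    if (read(rfd, &status, 1) != 1 || status != STATUS_OK)
        _exit(2);
    int fd = open(path, O_RDWR);
    if (fd < 0)
        _exit(1);
    void *base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        _exit(1);
    unsigned int magic;
    memcpy(&magic, base, sizeof magic);
    munmap(base, size);
    _exit(magic == SKETCH_MAGIC ? 0 : 1);
}

/* Root side (hypothetical): fork nlocal-1 peers, set up the area, then send
 * the same verdict byte to every peer. Returns the number of peers that
 * attached, or -1 if setup failed (every peer was told so). */
int sm_setup_and_sync(int nlocal, size_t size)
{
    assert(nlocal >= 1);
    char path[] = "/tmp/sm_sketch_XXXXXX";
    int fd = mkstemp(path);  /* backing file name reserved before forking */
    if (fd < 0)
        return -1;

    int npeers = nlocal - 1;
    pid_t *pids = calloc((size_t)nlocal, sizeof *pids);
    int *wfds = calloc((size_t)nlocal, sizeof *wfds);
    if (pids == NULL || wfds == NULL)
        abort();
    for (int i = 0; i < npeers; i++) {
        int p[2];
        if (pipe(p) != 0 || (pids[i] = fork()) < 0)
            abort();
        if (pids[i] == 0) {              /* child: becomes a local peer */
            close(p[1]);
            peer_wait_and_attach(p[0], path, size);  /* never returns */
        }
        close(p[0]);
        wfds[i] = p[1];
    }

    /* Root allocates and initializes the shared area. */
    char verdict = STATUS_BROKEN;
    void *base = MAP_FAILED;
    if (size >= sizeof(unsigned int) && ftruncate(fd, (off_t)size) == 0)
        base = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base != MAP_FAILED) {
        unsigned int magic = SKETCH_MAGIC;
        memcpy(base, &magic, sizeof magic);
        verdict = STATUS_OK;
    }
    close(fd);

    /* The "local broadcast": one identical verdict byte per peer. */
    for (int i = 0; i < npeers; i++) {
        ssize_t n = write(wfds[i], &verdict, 1);
        (void)n;  /* short writes are not handled in this sketch */
        close(wfds[i]);
    }

    int attached = 0;
    for (int i = 0; i < npeers; i++) {
        int st;
        if (waitpid(pids[i], &st, 0) == pids[i] && WIFEXITED(st) &&
            WEXITSTATUS(st) == 0)
            attached++;
    }
    if (base != MAP_FAILED)
        munmap(base, size);
    unlink(path);
    free(pids);
    free(wfds);
    return verdict == STATUS_OK ? attached : -1;
}
```

For example, sm_setup_and_sync(4, 4096) should report that all 3 peers attached, while a size of 0 makes the root's setup fail and every peer is told so (return value -1). Because the decision is made in one place, the outcome is all-or-nothing, which is the kind of symmetric failure the thread is relying on.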

On Jun 2, 2010, at 12:51 PM, Eugene Loh wrote:

> George Bosilca wrote:
>
> > We did assume that at least the errors are symmetric, i.e. if A fails
> > to connect to B then B will fail when trying to connect to A.
>
> I've not been following this thread closely, but thought I'd add a comment.
>
> It used to be that the sm BTL could fail asymmetrically. A shared-memory
> area could be allocated, and processes would start to allocate resources
> within it. At some point, the shared area would be exhausted. So, some
> processes were set up to communicate with others, but the others would
> not be able to communicate back via the same BTL. I think this led to
> much brokenness. (E.g., how would a process return an sm fragment to a
> sender?)
>
> At this point, my recollection of those issues is very fuzzy.
>
> In any case, I think those issues went away with the shared-memory work
> I did a while back. The size of the area is now computed to be large
> enough that each process's initial allocation would succeed.
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
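
The sizing idea in the quoted message (make the area large enough that every process's initial allocation succeeds) can be illustrated with a toy model. This sketch assumes a hypothetical layout: a fixed header, then per process a fixed block plus one receive slot per peer, each process's share rounded up to 64 bytes, with processes carving their shares out in order using a bump allocator. The function names and all numbers are illustrative, not the sm BTL's actual accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process need: a fixed block plus one slot per peer. */
static size_t per_proc_need(size_t nprocs, size_t fixed, size_t per_peer)
{
    return fixed + nprocs * per_peer;
}

static size_t round_up(size_t x, size_t align)
{
    return (x + align - 1) / align * align;
}

/* Size the segment so that every process's initial allocation fits,
 * rounded up to a whole number of pages. */
size_t sm_segment_size(size_t nprocs, size_t header, size_t fixed,
                       size_t per_peer, size_t page)
{
    assert(page > 0);
    size_t need = round_up(per_proc_need(nprocs, fixed, per_peer), 64);
    return round_up(header + nprocs * need, page);
}

/* Simulate processes taking their initial allocation in turn with a bump
 * allocator; return how many succeed before the area is exhausted. */
size_t count_successful_allocs(size_t seg_size, size_t nprocs, size_t header,
                               size_t fixed, size_t per_peer)
{
    size_t need = round_up(per_proc_need(nprocs, fixed, per_peer), 64);
    size_t used = header, ok = 0;
    for (size_t i = 0; i < nprocs; i++) {
        if (used + need > seg_size)
            break;  /* area exhausted: remaining processes get nothing */
        used += need;
        ok++;
    }
    return ok;
}
```

With 8 processes, a 4096-byte header, a 1024-byte fixed block and 256 bytes per peer, sm_segment_size returns 28672 bytes and all 8 initial allocations fit. A fixed 16384-byte segment satisfies only the first 4, leaving the other 4 unable to use the same BTL, which is the asymmetric situation described in the quoted message.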

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/