Yes, I think the mmap code in the sm BTL actually has a sync point inside add_procs: once the root allocates and sets up the area, it locally broadcasts either a "yes, we're good -- mmap attach and let's continue" message or a "bad things happened; sm BTL is broken" message.
But I am not confident about the other BTLs.
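
To make the handshake concrete, here is a rough sketch of the pattern as I understand it. It is a toy model, not the actual sm BTL code: the names (run_handshake, struct ctl, the SETUP_* values) are made up for illustration, the "local broadcast" is just a status word in an anonymous shared mapping, and the peers are forked children rather than real MPI processes. The point is only that every local peer waits for the root's verdict, so either all of them proceed or all of them give up on sm.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and usleep on glibc */
#include <assert.h>
#include <stdatomic.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical verdict values written by the root. */
enum { SETUP_PENDING = 0, SETUP_OK = 1, SETUP_FAILED = 2 };

/* Hypothetical control block; NOT the real sm BTL layout. */
struct ctl {
    atomic_int status;
};

/*
 * Fork `npeers` local peers that wait for the root's verdict.
 * `root_setup_fails` simulates the root failing to set up the area.
 * Returns how many peers saw SETUP_OK, or -1 on an internal error.
 */
int run_handshake(int npeers, int root_setup_fails)
{
    assert(npeers >= 0);

    struct ctl *c = mmap(NULL, sizeof(*c), PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (c == MAP_FAILED) {
        return -1;
    }
    atomic_init(&c->status, SETUP_PENDING);

    int started = 0;
    int fork_failed = 0;
    for (int i = 0; i < npeers; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            fork_failed = 1;
            break;
        }
        if (pid == 0) {
            /* Peer: the sync point. Do not touch the area until told. */
            while (atomic_load(&c->status) == SETUP_PENDING) {
                usleep(1000);
            }
            /* Exit 0 means "attach and continue"; 1 means "sm unusable". */
            _exit(atomic_load(&c->status) == SETUP_OK ? 0 : 1);
        }
        started++;
    }

    /* Root: publish a single verdict that every peer will observe. */
    atomic_store(&c->status,
                 (root_setup_fails || fork_failed) ? SETUP_FAILED : SETUP_OK);

    int ok = 0;
    for (int i = 0; i < started; i++) {
        int st;
        if (wait(&st) > 0 && WIFEXITED(st) && WEXITSTATUS(st) == 0) {
            ok++;
        }
    }

    munmap(c, sizeof(*c));
    return fork_failed ? -1 : ok;
}
```

With this sketch, run_handshake(3, 0) should report that all 3 peers attached, and run_handshake(3, 1) should report 0, i.e. the failure is seen by everyone on the node rather than by only some peers.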
On Jun 2, 2010, at 12:51 PM, Eugene Loh wrote:
> George Bosilca wrote:
> > We did assume that at least the errors are symmetric, i.e. if A fails
> > to connect to B then B will fail when trying to connect to A.
> I've not been following this thread closely, but thought I'd add a comment.
> It used to be that the sm BTL could fail asymmetrically. A shared
> memory area could be allocated and processes would start to allocate
> resources within shared memory. At some point, the shared area would be
> exhausted. So, some processes were set up to communicate to others, but
> the others would not be able to communicate back via the same BTL. I
> think this led to much brokenness. (E.g., how would a process return a
> sm fragment to a sender?)
> At this point, my recollection of those issues is very fuzzy.
> In any case, I think those issues went away with the shared-memory work
> I did a while back. The size of the area is now computed to be large
> enough that each process's initial allocation would succeed.