On 2 Jun 2010, at 16:49, Jeff Squyres wrote:
> On Jun 2, 2010, at 11:29 AM, Sylvain Jeaugey wrote:
>> But it made me progress on why I'm crashing: in my case, only a subset of
>> processes have their create_cq fail.
> Ah, this is the key. If I have one process (out of many) fail the create_cq() function, I get a segv during finalize. I'll dig.
Is there an assumption that if process A claims to be able to communicate with process B, then process B can also communicate with process A? It almost sounds like the code needs to do an allreduce on the bitmask returned by the BTLs, so that every process ends up with the same view of which ones are usable.
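
To make the allreduce idea concrete, here is a minimal sketch of the reduction itself. It is not taken from the Open MPI source: the function name and the "one bit per BTL" layout are assumptions for illustration, and the loop only simulates what a bitwise-AND allreduce (MPI_Allreduce with MPI_BAND in a real MPI program) would compute across processes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: local_masks[i] is the bitmask that process i got
 * back from its BTLs, with bit k set if BTL k initialised successfully
 * on that process (assumed layout, not Open MPI's actual one).
 *
 * AND-ing all masks together keeps only the BTLs that every process
 * can use. This loop stands in for a bitwise-AND allreduce.
 */
uint32_t agree_on_btls(const uint32_t *local_masks, size_t nprocs)
{
    uint32_t agreed = UINT32_MAX;   /* identity element for bitwise AND */

    for (size_t i = 0; i < nprocs; i++) {
        agreed &= local_masks[i];
    }
    return agreed;
}
```

With three processes reporting 0x7, 0x5 and 0x7 (the second one having failed to set up BTL 1, much like a failed create_cq), the agreed mask is 0x5, so every process would drop BTL 1 rather than only the one that saw the failure.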
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing