
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] IBCM error
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-07-14 16:55:10


On Jul 14, 2008, at 1:17 PM, Sean Hefty wrote:

>> I talked to Sean Hefty about it, but we never figured out a definitive
>> cause or solution. My best guess is that there is something wonky
>> about multiple processes simultaneously interacting with the IBCM
>> kernel driver from userspace; but I don't know jack about kernel
>> stuff, so that's a total SWAG.
>
> The only reason I can think of why ib_cm_listen() fails is if there's a
> conflict with the service_id and/or service_mask from multiple threads.
> What does OMPI pass in for these parameters?

The service ID that it uses is its PID and the mask is always 0.
There will only be one call to ib_cm_listen() per device per MPI
process.
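
To make the service_id/service_mask question concrete, here is a minimal sketch of how an overlap between two listens on the same device could be modeled. It is a simplified illustration, not the kernel's actual code: the helper names `effective_mask` and `listens_conflict` are hypothetical, and treating a mask of 0 as "compare all 64 bits" is an assumption about how an unset mask is handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not taken from the IBCM driver.
 * Assumption: a mask of 0 means "compare every bit of the service ID". */
static uint64_t effective_mask(uint64_t mask)
{
    return mask ? mask : ~UINT64_C(0);
}

/* Two listens on the same device are considered to overlap when their
 * service IDs agree on every bit that both masks care about. */
static bool listens_conflict(uint64_t id_a, uint64_t mask_a,
                             uint64_t id_b, uint64_t mask_b)
{
    uint64_t common = effective_mask(mask_a) & effective_mask(mask_b);
    return (id_a & common) == (id_b & common);
}
```

Under this model, `listens_conflict(1234, 0, 1235, 0)` is false: two processes with distinct PIDs and a mask of 0 would not overlap, whereas two listens using the same ID and mask would.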

Open MPI certainly could be buggy with IBCM, of course -- but it's
fishy that the same exact "mpirun ..." command line works one time and
fails the next (it's kinda random when the problem occurs).

-- 
Jeff Squyres
Cisco Systems