On Jul 14, 2008, at 1:17 PM, Sean Hefty wrote:
>> I talked to Sean Hefty about it, but we never figured out a
>> cause or solution. My best guess is that there is something wonky
>> about multiple processes simultaneously interacting with the IBCM
>> kernel driver from userspace; but I don't know jack about kernel
>> stuff, so that's a total SWAG.
> The only reason I can think of why ib_cm_listen() fails is if there's a
> conflict with the service_id and/or service_mask from multiple threads.
> What does OMPI pass in for these parameters?
The service ID that OMPI uses is the process's PID, and the mask is
always 0. There will only be one call to ib_cm_listen() per device per
MPI process.
Open MPI certainly could be buggy with IBCM, of course -- but it's
fishy that the same exact "mpirun ..." command line works one time and
fails the next (it's kinda random when the problem occurs).
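
To make the service_id / service_mask question concrete, here is a
minimal sketch of the kind of conflict being asked about. It is a toy
model, not the IBCM driver's code: the struct and both functions are
hypothetical names made up for illustration, and the two rules it
encodes (a mask of 0 means "match the whole service ID", and two listens
on the same device conflict when their IDs agree on every bit both masks
cover) are assumptions about how the check behaves, not something
confirmed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one active listen; not a real IBCM type. */
struct listen_entry {
    int      device;        /* index of the device the listen is bound to */
    uint64_t service_id;    /* OMPI passes its PID here */
    uint64_t service_mask;  /* OMPI always passes 0 here */
};

/* Assumption: a zero mask is treated as an exact match on service_id. */
static uint64_t effective_mask(uint64_t mask)
{
    return mask ? mask : ~UINT64_C(0);
}

/* Assumption: two listens on the same device conflict when their
 * service IDs agree on every bit that both masks cover. */
static bool listens_conflict(const struct listen_entry *a,
                             const struct listen_entry *b)
{
    uint64_t common = effective_mask(a->service_mask) &
                      effective_mask(b->service_mask);

    if (a->device != b->device)
        return false;
    return (a->service_id & common) == (b->service_id & common);
}
```

Under those assumptions, two processes on the same node with different
PIDs and a mask of 0 would not collide; a collision would need the same
PID to be registered twice on the same device, which the one call per
device per process pattern should rule out.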