On May 21, 2008, at 3:38 PM, Jeff Squyres wrote:
>> It would be great if libibverbs could return two different error
>> messages - one for "there's no IB card in this machine" and one for
>> "there's an IB card here, but we can't initialize it". I think that
>> would make this argument go away. Open MPI could probably mimic that
>> behavior by parsing the PCI tables, but that sounds ... painful.
Thinking about this a bit more -- I think it depends on what kind of
errors you are worried about seeing. IBV does separate the discovery
of devices (ibv_get_device_list) from trying to open a device
(ibv_open_device). So hypothetically, we *can* distinguish between
these kinds of errors already.
Do you see devices that are so broken that they don't show up in the
list returned from ibv_get_device_list?
FWIW: the *only* case I'm talking about changing the default for is
when ibv_get_device_list returns an empty list (meaning that according
to the verbs stack, there are no devices in the host). I think that
we should *always* warn about any errors that occur after that point
(e.g., we find a device but can't open it, we find one or more devices
but no active ports, etc.).
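A rough sketch of the proposed default, assuming the probe results have
already been collected into a small struct. The struct and should_warn()
are hypothetical; in real code the active-port count would come from
ibv_query_port() and checking for IBV_PORT_ACTIVE on each opened device.

```c
#include <stdbool.h>

/* Hypothetical inputs gathered after probing the verbs stack. */
struct ib_probe {
    int num_devices;       /* entries returned by ibv_get_device_list() */
    int num_open_failures; /* devices for which ibv_open_device() failed */
    int num_active_ports;  /* ports reported active across opened devices */
};

/* Stay silent only when the device list is empty; warn for any
   error found once at least one device has been discovered. */
bool should_warn(const struct ib_probe *p)
{
    if (p->num_devices == 0)
        return false; /* no devices at all: quiet by default */
    if (p->num_open_failures > 0)
        return true;  /* found a device but could not open it */
    if (p->num_active_ports == 0)
        return true;  /* devices present but no active ports */
    return false;     /* usable hardware: nothing to report */
}
```

Only the first branch changes behavior; every later failure still warns.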
--
Jeff Squyres
Cisco Systems