>> 1. Driver doesn't support the HCA - If I remember correctly, RH40 by
>> default doesn't support the ConnectX HCA. The device_list will be empty.
>> It is a very exotic case.
>> 2. Driver version doesn't correspond with fw version
>> 3. FW was broken
>> 4. Driver was broken and failed to start - it is not a very exotic
>> case. Sometimes a user makes some modification (upgrade, install,
>> etc.) and it breaks the driver.
>>> In such cases, the ibv_devinfo(1) and ibv_devices(1) commands would
>>> show the same error.
>> Yep these utilities will show the same error.
>> Cases 1-2-3 we can cover pretty simply. The OPENIB driver creates
>> "/dev/infiniband" during its startup. So if /dev/infiniband exists
>> and _get_device_list() is empty, we can print a warning.
> Ok, that seems reasonable.
>> I don't know how we can cover case 4 :-(
> If the user makes modifications to the driver and breaks it, I don't
> think we can be held responsible for that -- prudence declares that
> you should verify that your [self-modified] driver is not broken first
> before blaming Open MPI. I'm not that concerned about #4; most of my
> customers do not modify the drivers.
Agree about #4.
The check for /dev/infiniband should be simple, and I think we can add it
to 1.3.
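
For illustration, the check could look roughly like the sketch below. This
is only a guess at the shape of it, not the actual patch: the helper names
(should_warn_no_devices, warn_if_no_devices) are made up, and the caller is
assumed to pass in whatever device count it got back from the device-list
query.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper (not an existing Open MPI function).
 * Returns 1 if dev_path exists but no devices were reported, else 0.
 * A missing dev_path is taken to mean the driver never started, so
 * no warning is suggested in that case. */
static int should_warn_no_devices(const char *dev_path, int num_devices)
{
    struct stat sb;

    if (num_devices > 0) {
        return 0;                       /* devices found, nothing to report */
    }
    return (stat(dev_path, &sb) == 0) ? 1 : 0;
}

/* Hypothetical wrapper: prints the warning to stderr when warranted. */
static void warn_if_no_devices(const char *dev_path, int num_devices)
{
    if (should_warn_no_devices(dev_path, num_devices)) {
        fprintf(stderr,
                "WARNING: %s exists, but the device list is empty "
                "(unsupported HCA, driver/FW mismatch, or broken FW?)\n",
                dev_path);
    }
}
```

The caller would invoke warn_if_no_devices("/dev/infiniband", n) right
after obtaining the device list, with n being the number of devices found.
Case 4 is not covered by this: if the driver failed to start, the path is
presumably absent and nothing is printed.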