Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Linuxes shipping libibverbs
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-05-28 17:20:05


On May 28, 2008, at 8:02 AM, Jeff Squyres wrote:

> Note that the two /sys checks may be redundant; I'm not entirely sure
> how the two files relate to each other. libibverbs will complain
> about the first if it is not present; the second is used to indicate
> that the kernel drivers are loaded.

I got some more feedback from Roland off-list explaining that if
/sys/class/infiniband exists and is non-empty but
/sys/class/infiniband_verbs/abi_version does not exist, then this is
definitely a case where we want to warn, because it implies that the
configuration is broken: RDMA devices are present but not usable.
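
To make Roland's rule concrete, here is a minimal C sketch (not OMPI code) that classifies the sysfs state, assuming the standard layout under a root passed in as `sysfsdir` (normally "/sys"). The names `classify_ib_sysfs`, `dir_status`, and the `ib_sysfs_state` values are hypothetical. Treating an existing but empty class/infiniband as "driver loaded, no devices" is an assumption borrowed from the pseudocode in this message, not something Roland stated.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical states distinguished by the sysfs checks discussed above. */
enum ib_sysfs_state {
    IB_SYSFS_NO_DRIVER,     /* class/infiniband missing: driver not started */
    IB_SYSFS_NO_DEVICES,    /* class/infiniband exists but is empty */
    IB_SYSFS_MISCONFIGURED, /* devices present, verbs abi_version missing */
    IB_SYSFS_OK             /* devices present and abi_version exists */
};

/* Return -1 if `path` cannot be opened as a directory, 0 if it has no
 * entries besides "." and "..", and 1 otherwise. */
static int dir_status(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int status = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, ".") != 0 && strcmp(ent->d_name, "..") != 0) {
            status = 1;
            break;
        }
    }
    closedir(dir);
    return status;
}

/* Classify the sysfs tree rooted at `sysfsdir` (normally "/sys"). */
enum ib_sysfs_state classify_ib_sysfs(const char *sysfsdir)
{
    char path[4096];

    snprintf(path, sizeof(path), "%s/class/infiniband", sysfsdir);
    int st = dir_status(path);
    if (st < 0)
        return IB_SYSFS_NO_DRIVER;
    if (st == 0)
        return IB_SYSFS_NO_DEVICES;

    /* Roland's case: RDMA devices exist, but the verbs ABI file does not. */
    snprintf(path, sizeof(path), "%s/class/infiniband_verbs/abi_version",
             sysfsdir);
    if (access(path, F_OK) != 0)
        return IB_SYSFS_MISCONFIGURED;

    return IB_SYSFS_OK;
}
```

Because `sysfsdir` is a parameter, the same logic can be exercised against a scratch directory tree instead of the real /sys.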

In this case, I think the warning that libibverbs itself prints
("Fatal: couldn't read...") is sufficient. So let's just eliminate the
abi_version check in OMPI and go with something like the following
(pretty much exactly what Pasha proposed a while ago :-) ):

   # If sysfs/class/infiniband does not exist, the driver was not
   # started. Therefore: assume that the user does not want RDMA
   # hardware support -- do *not* print a warning message.
   if (! -d "$sysfsdir/class/infiniband") {
       if ($always_want_to_see_warnings) {
           print "Warning: $sysfsdir/class/infiniband does not exist\n";
       }
       return SKIP_THIS_BTL;
   }

   # If we get to this point, the drivers are loaded and therefore we
   # will assume that there is supposed to be at least one RDMA device
   # present. Warn if we don't find any.
   $list = ibv_get_device_list();
   if (empty($list)) {
       print "Warning: couldn't find any RDMA devices -- if you have " .
             "no RDMA devices, stop the driver to avoid this warning message\n";
       return SKIP_THIS_BTL;
   }

   # ...continue with initialization; warnings and errors are
   # *always* displayed after this point
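
For comparison, the pseudocode's decision logic could be written in C roughly as follows. This is a hedged sketch, not the actual OMPI code: `openib_precheck` and the `btl_decision` values are hypothetical names, and the caller is assumed to supply the inputs, e.g. `num_devices` from libibverbs' `ibv_get_device_list(&num_devices)` (releasing the list afterwards with `ibv_free_device_list()`) and the directory test from `stat()` on `$sysfsdir/class/infiniband`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the pre-initialization check. */
enum btl_decision { BTL_SKIP, BTL_CONTINUE };

/* Mirrors the proposed logic. On return, `*warning` points to a message
 * to print, or is NULL when the logic says to stay silent. */
enum btl_decision openib_precheck(bool ib_class_dir_exists, int num_devices,
                                  bool always_want_to_see_warnings,
                                  const char **warning)
{
    *warning = NULL;

    /* No class/infiniband: the driver was not started, so assume the user
     * does not want RDMA support and skip quietly unless asked not to. */
    if (!ib_class_dir_exists) {
        if (always_want_to_see_warnings)
            *warning = "Warning: /sys/class/infiniband does not exist"; /* assumes sysfsdir is /sys */
        return BTL_SKIP;
    }

    /* The drivers are loaded, so at least one RDMA device is expected. */
    if (num_devices <= 0) {
        *warning = "Warning: couldn't find any RDMA devices -- if you have "
                   "no RDMA devices, stop the driver to avoid this warning message";
        return BTL_SKIP;
    }

    /* Continue with initialization; later warnings are always shown. */
    return BTL_CONTINUE;
}
```

Keeping the libibverbs calls out of the function lets the warn/skip policy be tested on machines without RDMA hardware.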

-- 
Jeff Squyres
Cisco Systems