In addition, it would be really, really nice if someone would consolidate the watching of these device fds with the other fd-watching mechanisms.
The idea here is that errors need to be noticed asynchronously, so the watching can't be part of the main libevent fd-watching (which is only checked once in a while). The async watcher needs to be watching pretty much all the time. But there are also the RDMA CM / IB CM fd watchers. At a minimum, these could be combined. They weren't combined at the time for expediency -- there's no technical obstacle to merging them that couldn't be solved. While the cost of having 2 threads is pretty minimal, having 2 threads (or 3 or ... N threads) instead of 1 does take up a few resources.
Pasha and I never got the time to unify this fd monitoring, and we've now moved on such that it's unlikely that we'll get the opportunity to do it. It would be great if one of the vendors still working in the openib BTL could do this, someday. :-)
Additionally, with the new libevent work occurring, it could be possible to simply have a separate libevent base that handles all of these fds, which would be nice.
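To make the "one thread watching all the fds" idea concrete, here is a rough sketch. It is not the openib BTL code, and it uses plain poll() instead of a libevent base purely to stay self-contained. All of the names (struct watcher, watcher_add(), count_event(), MAX_WATCHED) are made up for illustration. The watched fds stand in for things like a device's async fd and the CM event channel fds, and a control pipe is used only to ask the thread to stop.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <unistd.h>

#define MAX_WATCHED 8 /* hypothetical cap, only for this sketch */

/* Hypothetical callback type: invoked when a watched fd is readable. */
typedef void (*watch_cb_t)(int fd, void *arg);

struct watcher {
    pthread_t thread;
    int ctl[2];                         /* control pipe; a byte means "stop" */
    struct pollfd fds[MAX_WATCHED + 1]; /* slot 0 is the control pipe */
    watch_cb_t cbs[MAX_WATCHED];
    void *args[MAX_WATCHED];
    int nfds;
};

/* Example handler: consume one byte and count it. A real handler would
 * fetch and acknowledge the device or CM event instead. */
static void count_event(int fd, void *arg)
{
    char c;
    if (read(fd, &c, 1) == 1)
        ++*(int *)arg;
}

/* Run the callback of every watched fd that poll() reported readable. */
static void dispatch(struct watcher *w)
{
    for (int i = 0; i < w->nfds; i++)
        if (w->fds[i + 1].revents & POLLIN)
            w->cbs[i](w->fds[i + 1].fd, w->args[i]);
}

static void *watcher_main(void *p)
{
    struct watcher *w = p;
    for (;;) {
        /* Blocks until something is readable, so the thread mostly sleeps. */
        if (poll(w->fds, (nfds_t)(w->nfds + 1), -1) < 0) {
            if (errno == EINTR)
                continue;
            break;
        }
        dispatch(w);
        if (w->fds[0].revents & POLLIN) {
            /* Stop requested: handle anything that arrived before it. */
            if (poll(w->fds + 1, (nfds_t)w->nfds, 0) > 0)
                dispatch(w);
            break;
        }
    }
    return NULL;
}

static int watcher_init(struct watcher *w)
{
    if (pipe(w->ctl) != 0)
        return -1;
    w->nfds = 0;
    w->fds[0].fd = w->ctl[0];
    w->fds[0].events = POLLIN;
    w->fds[0].revents = 0;
    return 0;
}

/* Register an fd; in this sketch this must happen before watcher_start(). */
static int watcher_add(struct watcher *w, int fd, watch_cb_t cb, void *arg)
{
    if (w->nfds >= MAX_WATCHED)
        return -1;
    struct pollfd *p = &w->fds[w->nfds + 1];
    p->fd = fd;
    p->events = POLLIN;
    p->revents = 0;
    w->cbs[w->nfds] = cb;
    w->args[w->nfds] = arg;
    w->nfds++;
    return 0;
}

static int watcher_start(struct watcher *w)
{
    assert(w->nfds >= 0 && w->nfds <= MAX_WATCHED);
    return pthread_create(&w->thread, NULL, watcher_main, w) == 0 ? 0 : -1;
}

static int watcher_stop(struct watcher *w)
{
    if (write(w->ctl[1], "q", 1) != 1)
        return -1;
    pthread_join(w->thread, NULL);
    close(w->ctl[0]);
    close(w->ctl[1]);
    return 0;
}
```

Usage would be roughly: watcher_init(), one watcher_add() per fd with its own callback, watcher_start() during init, and watcher_stop() at finalize (build with -pthread). Registering everything up front is a simplification; adding fds while the thread is running would need them to be passed over the control pipe instead.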
On Dec 23, 2010, at 10:28 AM, Shamis, Pavel wrote:
> The async thread is used to handle asynchronous error/notification events, like port up/down, HCA errors, etc.
> So most of the time the thread sleeps, and in a healthy network you are not supposed to see any events.
> On Dec 23, 2010, at 12:49 AM, Eugene Loh wrote:
>> I'm starting to look at the openib BTL for the first time and am
>> puzzled. In btl_openib_async.c, it looks like an asynchronous thread is
>> started. During MPI_Init(), the main thread sends the async thread a
>> file descriptor for each IB interface to be polled. In MPI_Finalize(),
>> the main thread asks the async thread to shut down. Between MPI_Init()
>> and MPI_Finalize(), I would think that the async thread would poll on
>> the IB fd's and handle events that come up. If I stick print statements
>> into the async thread, however, I don't see any events come up on the IB
>> fd's. So, the async thread is useless. Yes? It starts up and shuts
>> down, but never sees any events on the IB devices?
>> devel mailing list