Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: libevent update
From: Paul H. Hargrove (PHHargrove_at_[hidden])
Date: 2008-03-18 20:21:00


After taking a look at how epoll is implemented in the Linux kernel, I
can say with 100% certainty that BLCR will not restore the epoll fd
correctly. I hope to fix that eventually, but have too many other
things on my plate to address it now.

Since I cannot promise how soon BLCR may be able to resolve this
problem, I suggest that Josh continue exploring the alternatives. At
least "opal_event_include" set to "poll" appears to work. It is not
clear to me if the "select" problem is related to BLCR or not.
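
For anyone who wants to try the same workaround, here is a minimal sketch
of the two usual ways to set an MCA parameter such as "opal_event_include".
The application path ./noop and the process count are placeholders and are
not taken from Josh's actual runs. Any other options his checkpoint/restart
setup requires are omitted.

```shell
# Option 1: pass the MCA parameter on the mpirun command line.
# "./noop" is a placeholder for the MPI application being tested.
mpirun --mca opal_event_include poll -np 2 ./noop

# Option 2: set the same parameter through the environment, using the
# OMPI_MCA_<name> convention, so every later mpirun in this shell sees it.
export OMPI_MCA_opal_event_include=poll
mpirun -np 2 ./noop
```

Either form should steer libevent away from epoll. Whether that is enough
for a clean restart in every configuration is an assumption that still
needs testing.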

I am guessing that I don't get a say as to whether the BLCR/epoll
problems should delay the libevent merge, but I trust the rest of you to
determine what is in the best interest of OMPI.

-Paul

Josh Hursey wrote:
> I have some more data from the field.
>
> Leaving "opal_event_include" unset (Default) BLCR would give me the
> following error when trying to restart a 2 process 'noop' MPI
> application:
> ----------------------------
> shell$ ompi-restart ompi_global_snapshot_8587.ckpt
> Restart failed: Bad file descriptor
> Restart failed: Bad file descriptor
> shell$
> ----------------------------
[snip]

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900