Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: libevent update
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-03-18 16:59:37


George added an MCA parameter for it (opal_event_include is a string
that can be set to "select" or "poll"), but it has to be set before
opal_init().

Josh: could you try running with the MCA parameter opal_event_include
set to "select"? This would confirm Brian's hypothesis...
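
For anyone who wants to try this from inside a test program, here is a
minimal sketch. It assumes the usual Open MPI convention that an MCA
parameter can be supplied through an environment variable named
OMPI_MCA_<param>, and that the variable is set before MPI_Init() (and
therefore before opal_init()) runs. The helper name set_mca_param_env
is hypothetical and is not part of Open MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): export an MCA
 * parameter through the OMPI_MCA_<name> environment variable.
 * Uses POSIX setenv(). Returns 0 on success, -1 on error. */
int set_mca_param_env(const char *name, const char *value)
{
    char var[256];

    assert(name != NULL && value != NULL);
    if (strlen(name) == 0) {
        return -1;                      /* empty parameter name */
    }

    int n = snprintf(var, sizeof(var), "OMPI_MCA_%s", name);
    if (n < 0 || (size_t)n >= sizeof(var)) {
        return -1;                      /* name too long for the buffer */
    }

    /* Overwrite any existing value so the new setting wins. */
    return setenv(var, value, 1);
}

/* Intended usage (assumption: must happen before MPI_Init()):
 *
 *     set_mca_param_env("opal_event_include", "select");
 *     MPI_Init(&argc, &argv);
 */
```

The same setting can be given on the command line instead, e.g.
"mpirun --mca opal_event_include select ...".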

Given that opal_init() is the first thing that happens in
ompi_mpi_init(), this may not be enough -- you could *detect* that we
can't do BLCR, but this mechanism doesn't allow libmpi to set
something saying "reset libevent to be able to only use select()."

George -- is that hard to add? I would imagine that it could be kinda
difficult to reset libevent after there are already users of it, fd's
and other events that may have been added, etc...?
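
As background for Brian's hypothesis quoted below, here is a rough
sketch of the select() style of waiting. The fd set is rebuilt in user
space and handed to the kernel on every call, so no registration
persists in the kernel between calls; epoll, by contrast, keeps the
interest list inside the kernel once fds are added with epoll_ctl().
The helper name wait_readable_select is hypothetical and is not a
libevent or Open MPI function.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until fd is readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable_select(int fd, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;

    assert(timeout_ms >= 0);
    if (fd < 0 || fd >= FD_SETSIZE) {
        return -1;                      /* select() cannot track this fd */
    }

    /* The whole interest set is rebuilt here on every call. */
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc < 0) {
        return -1;
    }
    return (rc > 0 && FD_ISSET(fd, &readfds)) ? 1 : 0;
}
```

Whether the kernel-side state is actually what trips up BLCR is still
only the hypothesis under test here.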

On Mar 18, 2008, at 4:29 PM, Brian W. Barrett wrote:

> Jeff / George -
>
> Did you add a way to specify which event modules are used? Because
> epoll pushes the socket list into the kernel, I can see how it would
> screw up BLCR. I bet everything would work if we forced the use of
> poll / select.
>
> Brian
>
> On Tue, 18 Mar 2008, Jeff Squyres wrote:
>
>> Crud, ok. Keep us posted.
>>
>> On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:
>>
>>> I'm testing with checkpoint/restart and the new libevent seems to be
>>> messing up the checkpoints generated by BLCR. I'll be taking a look
>>> at it over the next couple of days, but just thought I'd let people
>>> know. Unfortunately I don't have any more details at the moment.
>>>
>>> -- Josh
>>>
>>> On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:
>>>
>>>> WHAT: Bring new version of libevent to the trunk.
>>>>
>>>> WHY: Newer version, slightly better performance (lower overheads /
>>>> lighter weight), properly integrate the use of epoll and other
>>>> scalable fd monitoring mechanisms.
>>>>
>>>> WHERE: 98% of the changes are in opal/event; there are a few changes
>>>> to configury and one change to the orted.
>>>>
>>>> TIMEOUT: COB, Friday, 21 March 2008
>>>>
>>>> DESCRIPTION:
>>>>
>>>> George/UTK has done the bulk of the work to integrate a new version
>>>> of libevent on the following tmp branch:
>>>>
>>>> https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge
>>>>
>>>> ** WE WOULD VERY MUCH APPRECIATE IF PEOPLE COULD MTT TEST THIS
>>>> BRANCH! **
>>>>
>>>> Cisco ran MTT on this branch on Friday and everything checked out
>>>> (i.e., no more failures than on the trunk). We just made a few more
>>>> minor changes today and I'm running MTT again now, but I'm not
>>>> expecting any new failures (MTT will take several hours). We would
>>>> like to bring the new libevent in over this upcoming weekend, but
>>>> would very much appreciate if others could test on their platforms
>>>> (Cisco tests mainly 64 bit RHEL4U4). This new libevent *should* be a
>>>> fairly side-effect free change, but it is possible that since we're
>>>> now using epoll and other scalable fd monitoring tools, we'll run
>>>> into some unanticipated issues on some platforms.
>>>>
>>>> Here's a consolidated diff if you want to see the changes:
>>>>
>>>> https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems