Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: libevent update
From: Josh Hursey (jjhursey_at_[hidden])
Date: 2008-03-18 18:54:13


I have some more data from the field.

With "opal_event_include" left unset (the default), BLCR gives me the
following error when I try to restart a 2-process 'noop' MPI
application:
----------------------------
shell$ ompi-restart ompi_global_snapshot_8587.ckpt
Restart failed: Bad file descriptor
Restart failed: Bad file descriptor
shell$
----------------------------

If I set "opal_event_include" to "select" then I get a different
message, this one from Open MPI:
----------------------------
shell$ ompi-restart ompi_global_snapshot_8543.ckpt
[warn] select: Bad file descriptor
[odin001.cs.indiana.edu:18027] opal_event_base_loop: ompi_evesel->dispatch() failed.
[warn] select: Bad file descriptor
[odin001.cs.indiana.edu:18027] opal_event_base_loop: ompi_evesel->dispatch() failed.
[warn] select: Bad file descriptor
...
----------------------------
This repeats until I kill the restarted job. I've figured out what is
printing the error message, but I can't yet say exactly why it
happens. Still digging.

If I set "opal_event_include" to "poll" then everything is fine. The
restart works as expected in all scenarios. :)

I'm currently using BLCR 0.6.0 Beta 6 on this machine. I've requested
that BLCR be upgraded on this machine so I can test the latest
version to see if the poll/epoll problem persists. I'll work with
Paul if this turns up anything.

As for what Open MPI needs to do, I don't think we need to do
anything at the moment. I can add the MCA parameter to the
'ft-enable-cr' AMCA file, which will work as a temporary fix.

Thanks for all your help in tracking down this problem.

Cheers,
Josh

On Mar 18, 2008, at 5:19 PM, George Bosilca wrote:

> It's like rewriting libevent from scratch. I guess it can be done,
> but it will be a long and painful process. How about the following
> solution:
>
> - the daemons are aware that checkpointing is enabled. They can
> set the environment variable that forces opal_event_include to be
> set to select.
>
> - as environment variables have higher priority than the
> configuration file, this will work in most cases (except when the
> user adds the MCA parameter by hand).
>
> - in the checkpoint/restart code, we can add a test that checks the
> value of opal_event_include, prints a message if the value is not
> select, and disables the checkpoint/restart functionality.
>
> george.
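
The check George proposes for the checkpoint/restart code could look roughly like the following C sketch. It is only an illustration: `cr_backend_ok` and `cr_check_event_backend` are hypothetical names, not Open MPI functions, and reading the value from the `OMPI_MCA_opal_event_include` environment variable is an assumption made to keep the example self-contained (inside Open MPI the value would presumably come from the MCA parameter system instead).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an Open MPI function): returns 1 when the
 * requested event backend equals the one checkpoint/restart requires,
 * and 0 otherwise, including when no value was given at all. */
int cr_backend_ok(const char *include_value, const char *required)
{
    if (include_value == NULL || required == NULL) {
        return 0;
    }
    return strcmp(include_value, required) == 0;
}

/* Hypothetical helper: looks up the setting in the environment (an
 * assumption of this sketch), prints a message if it does not match,
 * and returns 1 if checkpoint/restart may stay enabled, 0 if it
 * should be disabled. */
int cr_check_event_backend(const char *required)
{
    assert(required != NULL);

    const char *value = getenv("OMPI_MCA_opal_event_include");
    if (cr_backend_ok(value, required)) {
        return 1;
    }

    fprintf(stderr,
            "opal_event_include is \"%s\" but checkpoint/restart needs "
            "\"%s\"; disabling checkpoint/restart\n",
            value != NULL ? value : "(unset)", required);
    return 0;
}
```

A caller would invoke `cr_check_event_backend("select")` once during startup and turn off checkpoint/restart if it returns 0. Given the results reported at the top of this message, the required value might end up being "poll" rather than "select", which is why the sketch takes it as an argument.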
>
> On Mar 18, 2008, at 4:59 PM, Jeff Squyres wrote:
>
>> George added an MCA parameter for it (opal_event_include is a string
>> that can be set to "select" or "poll"), but it has to be set before
>> opal_init().
>>
>> Josh: could you try running with the MCA parameter opal_event_include
>> set to "select"? This would confirm Brian's hypothesis...
>>
>> Given that opal_init() is the first thing that happens in
>> ompi_mpi_init(), this may not be enough -- you could *detect* that we
>> can't do BLCR, but this mechanism doesn't allow libmpi to set
>> something saying "reset libevent to be able to only use select()."
>>
>> George -- is that hard to add? I would imagine that it could be kinda
>> difficult to reset libevent after there are already users of it, fd's
>> and other events that may have been added, etc...?
>>
>>
>> On Mar 18, 2008, at 4:29 PM, Brian W. Barrett wrote:
>>
>>> Jeff / George -
>>>
>>> Did you add a way to specify which event modules are used? Because
>>> epoll pushes the socket list into the kernel, I can see how it would
>>> screw up BLCR. I bet everything would work if we forced the use of
>>> poll / select.
>>>
>>> Brian
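
Brian's point about where the descriptor list lives can be illustrated with a small Linux-only C sketch. The helper names are made up for this example and are not part of libevent or Open MPI. It only shows that poll() receives the descriptor array from user space on each call, whereas epoll registers descriptors in a kernel object that is itself referenced by a file descriptor. Whether that kernel-side object is what confuses BLCR is the hypothesis under discussion, not something the sketch demonstrates.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is readable within timeout_ms,
 * 0 on timeout, -1 on error. With poll() the caller hands the whole
 * descriptor array to the kernel on every call, so no interest list
 * persists in the kernel between calls. */
int readable_via_poll(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0) {
        return -1;
    }
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Hypothetical helper with the same contract, using epoll. Here the
 * interest list is stored in a kernel object (referenced by epfd) and
 * descriptors are added to it with epoll_ctl(). */
int readable_via_epoll(int fd, int timeout_ms)
{
    assert(fd >= 0);

    int epfd = epoll_create1(0);   /* extra descriptor owning kernel state */
    if (epfd < 0) {
        return -1;
    }

    struct epoll_event ev;
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(epfd);
        return -1;
    }

    struct epoll_event out;
    int rc = epoll_wait(epfd, &out, 1, timeout_ms);
    close(epfd);
    if (rc < 0) {
        return -1;
    }
    return (rc > 0 && (out.events & EPOLLIN)) ? 1 : 0;
}
```

Both helpers can be called on the read end of a pipe or on a socket. A real event loop would keep the epoll descriptor open across calls rather than creating it each time, and that long-lived descriptor is the extra kernel-side state in question.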
>>>
>>> On Tue, 18 Mar 2008, Jeff Squyres wrote:
>>>
>>>> Crud, ok. Keep us posted.
>>>>
>>>> On Mar 18, 2008, at 4:16 PM, Josh Hursey wrote:
>>>>
>>>>> I'm testing with checkpoint/restart and the new libevent seems to be
>>>>> messing up the checkpoints generated by BLCR. I'll be taking a look
>>>>> at it over the next couple of days, but just thought I'd let people
>>>>> know. Unfortunately I don't have any more details at the moment.
>>>>>
>>>>> -- Josh
>>>>>
>>>>> On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:
>>>>>
>>>>>> WHAT: Bring new version of libevent to the trunk.
>>>>>>
>>>>>> WHY: Newer version, slightly better performance (lower overheads /
>>>>>> lighter weight), properly integrate the use of epoll and other
>>>>>> scalable fd monitoring mechanisms.
>>>>>>
>>>>>> WHERE: 98% of the changes are in opal/event; there are a few changes
>>>>>> to configury and one change to the orted.
>>>>>>
>>>>>> TIMEOUT: COB, Friday, 21 March 2008
>>>>>>
>>>>>> DESCRIPTION:
>>>>>>
>>>>>> George/UTK has done the bulk of the work to integrate a new version
>>>>>> of libevent on the following tmp branch:
>>>>>>
>>>>>> https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge
>>>>>>
>>>>>> ** WE WOULD VERY MUCH APPRECIATE IF PEOPLE COULD MTT TEST THIS
>>>>>> BRANCH! **
>>>>>>
>>>>>> Cisco ran MTT on this branch on Friday and everything checked out
>>>>>> (i.e., no more failures than on the trunk). We just made a few more
>>>>>> minor changes today and I'm running MTT again now, but I'm not
>>>>>> expecting any new failures (MTT will take several hours). We would
>>>>>> like to bring the new libevent in over this upcoming weekend, but
>>>>>> would very much appreciate if others could test on their platforms
>>>>>> (Cisco tests mainly 64 bit RHEL4U4). This new libevent *should* be a
>>>>>> fairly side-effect free change, but it is possible that since we're
>>>>>> now using epoll and other scalable fd monitoring tools, we'll run
>>>>>> into some unanticipated issues on some platforms.
>>>>>>
>>>>>> Here's a consolidated diff if you want to see the changes:
>>>>>>
>>>>>> https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> --
>>>>>> Jeff Squyres
>>>>>> Cisco Systems
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>
>>>>
>>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>