WHAT: Update libevent to 2.0.19 release
WHEN: As soon as it is released, expected around May 11
WHY: The 2.0.19 release contains a critical fix for a bug I recently discovered in the libevent 2.0.x series
Over the last few days I found a bug in libevent that causes it to unexpectedly "invert" event priorities. The bug is subtle, but we were able to provide a simple reproducer, so the libevent folks were able to implement a fix quickly.
Stated simply: if you were in an event of a given priority and activated an event of higher priority, that new event would not get serviced promptly if any event of the current priority became active before you left the current event. In other words, libevent would service all active events of the current priority before even checking whether a higher-priority event was active.
The patch adds the following logic to event_active:
> IF <I am in an event> AND
> IF <ev->base> EQ <current-base> AND
> IF <pri> LT <current-pri> THEN
> <rescan queues on next loop>
Thus, a rescan occurs only if a higher-priority event (in libevent, one with a numerically lower priority value) becomes active during an event of lower priority.
Unfortunately, ORTE relies on higher-priority events being serviced promptly in order to handle errors. Without the change, an error reported in a message from a daemon (for example) cannot be serviced until ALL messages that arrive during the processing of that message have been handled. On a large cluster that is receiving a long stream of messages, this prevents the error from being handled for quite some time.
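To make the inversion and the effect of the rescan check concrete, here is a small toy model in C. It is only a sketch under stated assumptions: it does NOT use libevent or reproduce its internals, and every name in it (toy_base, toy_active, toy_run, ERR_ID) is hypothetical. It models a single base, so the <ev->base> EQ <current-base> test from the pseudocode is implicit. Priority 0 is the highest, matching libevent's convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPRI   2    /* two priorities; 0 is the highest */
#define QCAP   8    /* capacity of each toy queue */
#define ERR_ID 99   /* hypothetical id for the high-priority error event */

/* Hypothetical toy event base; none of this is libevent's real layout. */
struct toy_base {
    int    queue[NPRI][QCAP];      /* FIFO of active event ids per priority */
    size_t head[NPRI], tail[NPRI];
    int    running_pri;            /* priority being serviced, -1 if none */
    bool   rescan;                 /* request to rescan from priority 0 */
    bool   use_fix;                /* enable the check described above */
};

/* Mark an event active at the given priority. */
static void toy_active(struct toy_base *b, int pri, int id)
{
    assert(pri >= 0 && pri < NPRI && b->tail[pri] < QCAP);
    b->queue[pri][b->tail[pri]++] = id;
    /* The check: we are inside an event and the new event outranks it. */
    if (b->use_fix && b->running_pri >= 0 && pri < b->running_pri)
        b->rescan = true;
}

/*
 * Run the scenario: message 1 (priority 1) is being handled; while it is
 * handled, an error (priority 0) is activated and messages 2 and 3
 * (priority 1) arrive. The order in which events get serviced is written
 * to order[]; the number of serviced events is returned.
 */
size_t toy_run(bool use_fix, int *order, size_t cap)
{
    struct toy_base b = {0};
    size_t n = 0;

    b.running_pri = -1;
    b.use_fix = use_fix;
    toy_active(&b, 1, 1);          /* the first message is already active */

    for (;;) {
        /* Scan from the highest priority for a non-empty queue. */
        int pri = 0;
        while (pri < NPRI && b.head[pri] == b.tail[pri])
            pri++;
        if (pri == NPRI)
            break;                 /* nothing left to service */

        b.running_pri = pri;
        b.rescan = false;
        /* Drain this queue, including events that become active meanwhile,
         * unless a rescan has been requested. */
        while (b.head[pri] < b.tail[pri] && !b.rescan) {
            int id = b.queue[pri][b.head[pri]++];
            if (n < cap)
                order[n++] = id;
            if (id == 1) {         /* "callback" for message 1 */
                toy_active(&b, 0, ERR_ID);
                toy_active(&b, 1, 2);
                toy_active(&b, 1, 3);
            }
        }
        b.running_pri = -1;
    }
    return n;
}
```

Calling toy_run(false, order, 8) services the events in the order 1, 2, 3, 99: the error waits behind every message that arrived in the meantime. Calling toy_run(true, order, 8) gives 1, 99, 2, 3: the error is serviced as soon as the current event finishes, and the remaining messages follow.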