A user noticed a specific change that we made between 1.4.2 and 1.4.3:
which is from CMR https://svn.open-mpi.org/trac/ompi/ticket/2489, and originally from trunk https://svn.open-mpi.org/trac/ompi/changeset/23434. I removed the opal_progress_event_users_decrement() from ompi_mpi_init() because the ORTE DPM does its own _increment() and _decrement().
However, it seems that this had an unintended consequence: look at the annotated Ganglia graph that the user sent (see attached). In 1.4.2, all of the idle time was "user" CPU usage. In 1.4.3, it's split between user and system CPU usage. The application he used to test is basically an MPI_Init / MPI_Finalize test (with some additional MPI middleware). See:
Can anyone think of why this occurs, and/or if it's a Bad Thing?
If removing this decrement enabled a bunch more system CPU time, that would seem to imply that we're calling libevent more frequently than we used to (vs. polling the opal event callbacks), and therefore that there might now be an unmatched increment somewhere.
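A simplified model of the mechanism described above may help. It assumes, without checking it against the 1.4 branch, that the progress engine enters libevent only once every N progress calls while no component is registered as an event user, and on every call once the user count is non-zero. The struct, the function names, and the delta of 100 are all hypothetical; only the increment/decrement pairing idea comes from this thread.

```c
#include <assert.h>

/* Hypothetical model, not the real opal_progress code: libevent is
 * "tripped" every EVENT_PROGRESS_DELTA progress calls, or on every call
 * while at least one event user is registered. */
#define EVENT_PROGRESS_DELTA 100

typedef struct {
    int num_event_users;        /* increments minus decrements */
    int event_progress_counter; /* progress calls left until the next trip */
    long event_loop_calls;      /* times we "entered libevent" */
} progress_model;

static void model_init(progress_model *m) {
    m->num_event_users = 0;
    m->event_progress_counter = EVENT_PROGRESS_DELTA;
    m->event_loop_calls = 0;
}

static void event_users_increment(progress_model *m) {
    m->num_event_users++;
    m->event_progress_counter = 0; /* trip on the next progress call */
}

static void event_users_decrement(progress_model *m) {
    /* A decrement without a matching increment is a caller bug. */
    assert(m->num_event_users > 0);
    m->num_event_users--;
}

/* One pass of the (modeled) progress engine. */
static void progress(progress_model *m) {
    if (--m->event_progress_counter <= 0) {
        /* With registered users, keep tripping on every call. */
        m->event_progress_counter =
            (m->num_event_users > 0) ? 0 : EVENT_PROGRESS_DELTA;
        m->event_loop_calls++; /* stands in for a libevent call */
    }
}

/* Apply the given increments and decrements, run `iterations` progress
 * calls, and report how many times libevent was entered. */
static long simulate(int increments, int decrements, int iterations) {
    progress_model m;
    model_init(&m);
    for (int i = 0; i < increments; i++) event_users_increment(&m);
    for (int i = 0; i < decrements; i++) event_users_decrement(&m);
    for (int i = 0; i < iterations; i++) progress(&m);
    return m.event_loop_calls;
}
```

Under these assumptions, balanced calls such as `simulate(1, 1, 1000)` enter the event loop 10 times, while a single unmatched increment, `simulate(2, 1, 1000)`, enters it on all 1000 calls. In the real library, that extra work would plausibly show up as system time.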