Josh ran some tests for me on Odin earlier today, and the results show
a major improvement in our startup/shutdown performance. As you may
recall, our times previously grew roughly exponentially with the number
of processes; as the attached graph shows, they now grow roughly
linearly. The data also shows that
the MPI_INIT penalty is fairly small. This is because the data
exchange is "encapsulated" in the initial data sent back at the
stage_1 trigger, which avoids any further overhead as the number of
processes grows. The data was taken using the rsh launcher.

We should be able to improve our scalability further once we (a)
incorporate a tree-based scheme into the rsh launcher and (b) use
a tree-based (or better) broadcast mechanism for sending the trigger
messages (right now, we send them linearly across the processes).

Anyway, I thought you might find this of interest.
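
To illustrate why point (b) should help, here is a rough Python sketch
comparing a linear send with a binomial-tree broadcast. It is not our
launcher or trigger code: the function names linear_schedule and
binomial_schedule are made up for this note, rank 0 is assumed to be
the sender, and every send is assumed to cost one "round". The binomial
tree is just one possible tree-based scheme.

```python
def linear_schedule(n):
    """Rank 0 sends to every other rank, one send per round (hypothetical model)."""
    return [[(0, dest)] for dest in range(1, n)]


def binomial_schedule(n):
    """Each round, every rank that already holds the message forwards it
    to one rank that does not, so the number of holders doubles."""
    rounds = []
    have = 1  # ranks 0..have-1 currently hold the message
    while have < n:
        sends = [(src, src + have) for src in range(have) if src + have < n]
        rounds.append(sends)
        have *= 2
    return rounds


if __name__ == "__main__":
    for n in (16, 128, 1024):
        linear = len(linear_schedule(n))
        tree = len(binomial_schedule(n))
        print(f"{n} procs: linear rounds = {linear}, tree rounds = {tree}")
```

Both schedules deliver the same n-1 messages in total; the difference
is that the tree spreads them over about log2(n) rounds instead of n-1,
for example 10 rounds rather than 1023 for 1024 processes under this
simple model.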