Yo folks

Josh ran some tests for me on Odin earlier today, and the results show a major improvement in our startup/shutdown performance. As you may recall, our times previously grew roughly exponentially with the number of processes; as the attached graph shows, they now grow roughly linearly. The data also show that the MPI_INIT penalty is fairly small. This is because the data exchange is "encapsulated" in the initial data sent back at the stage_1 trigger, which avoids any further overhead as the number of processes grows. The data were taken using the rsh launcher.

We should be able to further improve our scalability once we (a) incorporate a tree-based scheme into the rsh launcher and (b) use a tree-based (or better) broadcast mechanism for sending the trigger messages (right now, we send them linearly, one process after another).
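
To give a rough feel for why (b) should help, here is a small sketch comparing a linear send schedule with a binomial-tree one. It is only a toy model, not our actual launcher or messaging code: the function names are made up for illustration, rank 0 is assumed to be the sender, and every send is assumed to cost one step.

```python
def linear_schedule(n):
    """Rank 0 sends the trigger to every other rank, one send per step."""
    return [[(0, dest)] for dest in range(1, n)]


def binomial_tree_schedule(n):
    """Each round, every rank that already holds the trigger forwards it
    to one rank that does not, so the number of holders doubles."""
    rounds = []
    have = 1  # ranks 0..have-1 currently hold the trigger
    while have < n:
        sends = [(src, src + have) for src in range(have) if src + have < n]
        rounds.append(sends)
        have *= 2
    return rounds


def reached(schedule):
    """Return the set of ranks that hold the trigger after the schedule runs."""
    holders = {0}
    for step in schedule:
        for src, dest in step:
            assert src in holders  # a rank can only forward what it has
            holders.add(dest)
    return holders


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, len(linear_schedule(n)), len(binomial_tree_schedule(n)))
```

Under those assumptions, 1024 processes need 1023 sequential steps with the linear scheme but only 10 rounds with the tree, since the sends within a round can proceed in parallel. Real numbers will depend on the network and on how the rsh launcher itself fans out.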

Anyway, thought you might find this of interest.