On Dec 15, 2013, at 15:40, Ralph Castain <firstname.lastname@example.org> wrote:
Not true, George - look more closely at the code. We only retrieve the hostname if the number of procs is low. Otherwise, we do *not* retrieve it until we do a modex_recv, and thus the debug is now broken at scale. This was required for scalable launch, which is something I know is important to you as well.
Sure, if you trust the comment in the file. Unfortunately, the comment is wrong: nobody is setting the hostname of the procs we’re talking about.
Moreover, the real meaning of the cutoff parameter is clearly defined in the snippet below:
As per the email discussion, revise the sparse handling of hostnames so
that we avoid potential infinite loops while allowing large-scale users to
improve their startup time:
* add a new MCA param orte_hostname_cutoff to specify the number of nodes
at which we stop including hostnames. This defaults to INT_MAX => always
include hostnames. If a value is given, then we will include hostnames for
any allocation smaller than the given limit.
* remove ompi_proc_get_hostname. Replace all occurrences with a direct
link to ompi_proc_t’s proc_hostname, protected by appropriate "if NULL"
The comment above is about scalability.
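To make the cutoff semantics in that commit message concrete, here is a minimal sketch in C. It is not the actual OMPI/ORTE code: the `sketch_` names and the struct are hypothetical stand-ins, and only `proc_hostname`, the INT_MAX default, and the "smaller than the given limit" rule come from the commit message.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for ompi_proc_t; only the field discussed here. */
typedef struct {
    const char *proc_hostname;   /* NULL when the hostname was not included */
} sketch_proc_t;

/* Stand-in for the orte_hostname_cutoff MCA param.
 * Per the commit message, the default INT_MAX means "always include". */
static int sketch_hostname_cutoff = INT_MAX;

/* Hostnames are included for any allocation smaller than the cutoff. */
static int sketch_include_hostnames(int num_nodes)
{
    return num_nodes < sketch_hostname_cutoff;
}

/* The "if NULL" protection: never dereference a hostname that was
 * skipped at launch; hand back a caller-supplied fallback instead. */
static const char *sketch_hostname_or(const sketch_proc_t *proc,
                                      const char *fallback)
{
    if (NULL == proc || NULL == proc->proc_hostname) {
        return fallback;
    }
    return proc->proc_hostname;
}
```

With the default cutoff every allocation gets hostnames; once a user lowers the cutoff, any debug output that prints a peer's hostname has to go through a NULL check like the one above.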
Modifying the API isn't a big deal, so why the fuss? Let's just change it and get the debug working again.
Here is how I see things. I made a change to remove a deadlock and maintain the scalability of the codebase, a change that does not affect the normal use of the OMPI debug facility for most users. From here on, feel free to improve the existing code as much as you feel necessary, as long as you maintain the above properties. Enough has been said about this topic; I will now pursue my other interests.