I'd like to have a look at the diff between the two, but I can't do so
until tomorrow at the earliest.
On Jul 9, 2008, at 7:26 PM, Ralph Castain wrote:
> I have been investigating Ticket #1135 - stdin is read twice if rank=0
> shares the node with mpirun. Repairing this problem is going to be
> difficult due to the rather terrible spaghetti code in the IOF, and
> the fact that the IOF in the HNP actually rml.sends the IO to itself
> multiple times as it cycles through the spaghetti.
> Unfortunately, this problem -is- a regression from 1.2. Rather than
> spend weeks trying to fix it, I see two approaches we could pursue.
> First, I could repair the problem by essentially returning the IOF to
> its 1.2 state. This will have to be done by hand, as most of the
> differences are in calls to utilities that have changed due to the
> removal of the old NS framework. However, there are a few places where
> the logic itself has been modified - and the problem must stem from
> somewhere in there.
> If I make this change, then we will be no better, and no worse, than
> 1.2. Note that we currently advise people to read from a file instead
> of stdin to avoid other issues that were present in 1.2.
> Alternatively, we could ship 1.3 as-is, and warn users (similar to
> 1.2) that they should avoid reading from stdin if there is any chance
> that rank=0 could be co-located with mpirun. Note that most of our
> clusters do not allow such co-location - but it is permitted by
> default by OMPI.
> We already plan to revisit the IOF at next week's technical meeting,
> with a goal of redefining the IOF's API to a more reduced set that
> reflects a less ambitious requirement. I expect to implement those
> changes fairly soon thereafter, but that would be targeted to 1.4 -
> not 1.3.
> Any thoughts on which way we should go?