Open MPI Development Mailing List Archives

From: Ralph H Castain (rhc_at_[hidden])
Date: 2007-07-18 08:29:15


Just to further clarify the clarification... ;-)

This condition has existed for the last several months. The root problem
dates back at least to the 1.1 series. Around January or February of this
year, we traced the problem to the iof_flush call that the odls makes when a
process terminates, at which point we #if 0'd the iof_flush out of the code
pending a resolution (tickets were filed, emails flew, phone calls ensued;
it just took a while for people to have time to deal with it). The call is
still "on" in 1.2; it has just been turned "off" in the trunk for months.

[Actually, I did turn it back on briefly following r15390. It turned out the
timing changed just enough to make it work most of the time with things that
called orte_finalize, but it always failed for programs that didn't, so we
turned it back off again]

So the problem of having clipped output has been around for quite some time.
Since only Galen ever commented to me about being impacted by it, I gather
nobody has really noticed. ;-)

Hopefully, we'll be able to turn it back on again soon.

On 7/18/07 6:02 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:

> BTW, the fix didn't occur over the weekend because of some merging
> issues.
>
> I also didn't explain the problem well; you may see some clipped
> output from your program or the orted may hang while everything is
> shutting down. This is especially likely to occur for very short
> applications.
>
> The problem is actually in the oob; the orted gets into a state where
> it's waiting for some IOF OOB callbacks to occur for messages that
> were already successfully sent, but the callbacks never occur due
> to... well, it's a long story. The IOF is basically spinning during
> the orted shutdown waiting for pending OOB callbacks that will never
> occur.
>
> I can explain in more detail if anyone cares, but hopefully Brian
> will be able to work the fix in within the next few days.
>
>
> On Jul 13, 2007, at 5:04 PM, Jeff Squyres wrote:
>
>> FYI: there is an issue on the OMPI trunk right now that the tail
>> end of output from applications may get clipped. The fix is coming
>> this weekend. If you care, I'll explain, but I just wanted to give
>> everyone heads up that if you see the tail end of your stdout/
>> stderr not show up, it's probably not your fault. :-)
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>
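
For readers who want a picture of the failure mode described in the thread
(a flush that spins during shutdown waiting for callbacks that never fire),
here is a minimal sketch in C. It is not the ORTE code: the names
sketch_iof_t, sketch_progress_fn, sketch_iof_flush and the max_spins bound
are all hypothetical, and bounding the spin is just one illustrative way to
avoid the hang, not necessarily the fix that went into the trunk.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an I/O forwarding endpoint (not an ORTE type). */
typedef struct {
    int pending_callbacks;  /* sends whose completion callback has not fired */
} sketch_iof_t;

/* Hypothetical progress hook: returns how many callbacks fired this pass. */
typedef int (*sketch_progress_fn)(sketch_iof_t *iof);

/*
 * Flush by driving progress until no callbacks are pending.
 * A negative max_spins means "spin forever", which models the hang:
 * if a callback is lost, pending_callbacks never reaches zero.
 * A non-negative max_spins gives up after that many passes.
 * Returns true if everything drained, false if the flush gave up.
 */
bool sketch_iof_flush(sketch_iof_t *iof, sketch_progress_fn progress,
                      int max_spins)
{
    for (int spins = 0; iof->pending_callbacks > 0; ++spins) {
        if (max_spins >= 0 && spins >= max_spins) {
            return false;  /* callbacks never arrived; stop waiting */
        }
        iof->pending_callbacks -= progress(iof);
    }
    return true;
}

/* Healthy case: one callback is delivered per progress pass. */
int sketch_progress_ok(sketch_iof_t *iof)
{
    (void)iof;
    return 1;
}

/* Buggy case: the send completed, but its callback is never delivered. */
int sketch_progress_lost(sketch_iof_t *iof)
{
    (void)iof;
    return 0;
}
```

With sketch_progress_ok the flush drains and returns true. With
sketch_progress_lost and a negative max_spins it would never return, which
corresponds to the orted hang; skipping the flush entirely (the #if 0
approach) avoids the hang at the cost of possibly clipped output.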