
Open MPI Development Mailing List Archives


From: Ralph H. Castain (rhc_at_[hidden])
Date: 2006-02-13 16:16:42


Hmmmm....I wonder if this is going to create a problem?

Tim/Brian/you I/O forwarding folks: this poses an interesting
question. We automatically wire up I/O forwarding in our spawn
routine. What happens when someone sets up their own I/O forwarding
callback and subsequently wires up stdio themselves? Does this
overwrite what we did, do processes receive duplicate copies, does it
generate an error, ...?

I gather this is working for Nathan, and I don't claim to fully
understand what he is doing, but I'm curious as to what might happen,
since I don't see anything in the system to prevent someone from doing
this (not sure we could anyway).

Ralph

At 02:32 PM 2/9/2006, you wrote:
>I've coded a hacky workaround in our code to get past this. Basically,
>I capture all of the state transitions, and on the first one fired for a
>job I fire the 'init' state internally in our tool. Generally this occurs
>on one of the gate transitions, G1 or something. It'll work this way.
>
>Furthermore, we're telling our users to get your 1.0.2a4 (or whatever
>1.0.2 is available at the time).
>
>The way I coded it, when you guys put this into the main branch and the
>INIT state resumes firing, my code will start working that much better.
>I really only brought it up because I felt it was a bug you might not
>have been aware of.
>
>Thanks all.
>
>-- Nathan
>Correspondence
>---------------------------------------------------------------------
>Nathan DeBardeleben, Ph.D.
>Los Alamos National Laboratory
>Parallel Tools Team
>High Performance Computing Environments
>phone: 505-667-3428
>email: ndebard_at_[hidden]
>---------------------------------------------------------------------
>
>
>
>Jeff Squyres wrote:
> > Nathan --
> >
> > Ralph and I talked about this and decided not to bring it over to the
> > 1.0 branch -- the fix uses new functionality that exists on the trunk
> > and not in the 1.0 branch. The fix could be re-crafted to use
> > existing functionality on the 1.0 branch (we're really trying to only
> > put bug fixes on the 1.0 branch -- not any new functionality) -- but
> > we didn't know if you cared. :-)
> >
> > Do you mind if this fix stays on the trunk, or do you need it in the
> > v1.0 branch?
> >
> >
> >
> > On Feb 8, 2006, at 4:36 PM, Nathan DeBardeleben wrote:
> >
> >
> >> Thanks Ralph.
> >>
> >> -- Nathan
> >>
> >>
> >>
> >> Ralph H. Castain wrote:
> >>
> >>> Nathan
> >>>
> >>> This should now be fixed on the trunk. Once it is checked out more
> >>> thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you
> >>> might want to check out the trunk and verify it meets your needs.
> >>>
> >>> Ralph
> >>>
> >>> At 03:05 PM 2/1/2006, you wrote:
> >>>
> >>>
> >>>> This was happening on Alpha 1 as well, but I upgraded today to
> >>>> Alpha 4 to see if it's gone away - it has not.
> >>>>
> >>>> I register a callback on a spawn() inside ORTE. That callback
> >>>> includes
> >>>> the current state and should be called as the job goes through
> >>>> those states.
> >>>>
> >>>> I am now noticing that jobs never go through the INIT state.
> >>>> They may
> >>>> also not go through others but definitely not ORTE_PROC_STATE_INIT.
> >>>>
> >>>> I was registering the IOForwarding callback during the INIT phase
> >>>> so, consequently, I now do not have IOF. There are other side
> >>>> effects, such as jobs that I start seeming to be perpetually in the
> >>>> 'starting' state and then, suddenly, they're done.
> >>>>
> >>>> Can someone look into / comment on this please?
> >>>>
> >>>> Thanks.
> >>>>
> >>>> --
> >>>> -- Nathan
> >>>>
> >>>> _______________________________________________
> >>>> devel mailing list
> >>>> devel_at_[hidden]
> >>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>>>
> >>>>
> >
> >
> >