Open MPI Development Mailing List Archives

From: Nathan DeBardeleben (ndebard_at_[hidden])
Date: 2006-02-09 16:32:01


I've coded a hacky workaround in our tool to get past this. Basically,
I capture all of the state transitions, and on the first one fired for a
job I fire the 'init' state internally in our tool. Generally this
happens on one of the gate transitions, G1 or something. It'll work this way.
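
A minimal sketch of that workaround, written in Python for brevity rather
than in the tool's actual language. The JobStateTracker class, its handle()
method, and the state strings other than INIT and G1 are hypothetical names
used only for illustration; none of them are ORTE APIs.

```python
# Hypothetical state name; the real constant in ORTE is ORTE_PROC_STATE_INIT.
INIT = "INIT"


class JobStateTracker:
    """Synthesizes a missing INIT event the first time a job is seen."""

    def __init__(self, on_state):
        self._on_state = on_state  # tool-side handler: on_state(job_id, state)
        self._seen = set()         # jobs that have delivered at least one event

    def handle(self, job_id, state):
        """Call this from the state callback registered at spawn time."""
        first = job_id not in self._seen
        self._seen.add(job_id)
        # If the first event for a job is not INIT, fire INIT internally first.
        if first and state != INIT:
            self._on_state(job_id, INIT)
        self._on_state(job_id, state)


events = []
tracker = JobStateTracker(lambda job, state: events.append((job, state)))
tracker.handle(1, "G1")  # first event for job 1, so INIT is synthesized
tracker.handle(1, "G2")  # "G2" is an assumed later transition
tracker.handle(2, INIT)  # runtime delivered INIT itself, nothing is duplicated
```

Because INIT is only synthesized when the first event seen is something
else, the same code keeps working unchanged once the real INIT state
starts firing again.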

Furthermore, we're telling our users to get your 1.0.2a4 (or whatever
1.0.2 is available at the time).

The way I coded it, when you guys put this into the main branch and the
INIT state resumes firing, my code will just start working that much
better. I really only brought it up because I felt it was a bug you
might not have been aware of.

Thanks all.

-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard_at_[hidden]
---------------------------------------------------------------------

Jeff Squyres wrote:
> Nathan --
>
> Ralph and I talked about this and decided not to bring it over to the
> 1.0 branch -- the fix uses new functionality that exists on the trunk
> and not in the 1.0 branch. The fix could be re-crafted to use
> existing functionality on the 1.0 branch (we're really trying to only
> put bug fixes on the 1.0 branch -- not any new functionality) -- but
> we didn't know if you cared. :-)
>
> Do you mind if this fix stays on the trunk, or do you need it in the
> v1.0 branch?
>
>
>
> On Feb 8, 2006, at 4:36 PM, Nathan DeBardeleben wrote:
>
>
>> Thanks Ralph.
>>
>> -- Nathan
>> Correspondence
>> ---------------------------------------------------------------------
>> Nathan DeBardeleben, Ph.D.
>> Los Alamos National Laboratory
>> Parallel Tools Team
>> High Performance Computing Environments
>> phone: 505-667-3428
>> email: ndebard_at_[hidden]
>> ---------------------------------------------------------------------
>>
>>
>>
>> Ralph H. Castain wrote:
>>
>>> Nathan
>>>
>>> This should now be fixed on the trunk. Once it is checked out more
>>> thoroughly, I'll ask that it be moved to the 1.0 branch. For now, you
>>> might want to check out the trunk and verify it meets your needs.
>>>
>>> Ralph
>>>
>>> At 03:05 PM 2/1/2006, you wrote:
>>>
>>>
>>>> This was happening on Alpha 1 as well but I upgraded today to
>>>> Alpha 4 to see if it's gone away - it has not.
>>>>
>>>> I register a callback on a spawn() inside ORTE. That callback
>>>> includes the current state and should be called as the job goes
>>>> through those states.
>>>>
>>>> I am now noticing that jobs never go through the INIT state. They
>>>> may also not go through others but definitely not
>>>> ORTE_PROC_STATE_INIT.
>>>>
>>>> I was registering the IOForwarding callback during the INIT phase
>>>> so, consequently, I now do not have IOF. There are other side
>>>> effects, such as jobs I start that seem to stay perpetually in the
>>>> 'starting' state and then, suddenly, they're done.
>>>>
>>>> Can someone look into / comment on this please?
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> -- Nathan
>>>> Correspondence
>>>> ---------------------------------------------------------------------
>>>> Nathan DeBardeleben, Ph.D.
>>>> Los Alamos National Laboratory
>>>> Parallel Tools Team
>>>> High Performance Computing Environments
>>>> phone: 505-667-3428
>>>> email: ndebard_at_[hidden]
>>>> ---------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel