Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] JDATA access problem.
From: Hugo Meyer (meyer.hugo_at_[hidden])
Date: 2011-03-22 07:58:33

Thanks again Ralph for your reply.

> There's your problem - that module is run in the daemon, where the
> orte_job_data pointer array isn't used. You have to use the
> orte_local_jobdata and orte_local_children lists instead. So once the HNP
> replies with the jobid, you look up the orte_odls_job_t for that job from
> the orte_local_jobdata list.

I'm now sending you all of the code involved. At the beginning I do
something along the lines of what you describe. Then, with the child info in
hand, I ask the HNP for the jobdata of the child, but I still get no data
about the child (that is, the dead process). I need this info so that I can
send it to another orted, which will restart the failed process.

> I'm not sure what you are trying to accomplish, so I can't give further
> advice. Note that daemons have limited knowledge of application processes
> that are not their own immediate children. What little they know regarding
> processes other than their own is stored in the nidmap/pidmap arrays -
> limited to location, local rank, and node rank. They have no storage
> currently allocated for things like the state of a non-local process.

I want to restart the process on another node; that's why I need the
jobdata. So the HNP cannot do something like:

    jdata = orte_get_job_data_object(proc.jobid)

when the proc doesn't belong to it?
If so, where can I obtain this information? I'm assuming that I cannot ask
the dead process's own daemon about it (I treat that daemon as dead too,
although in reality it is not). I was supposing that in the HNP I could
execute the statement above.

I'm attaching all the code involved in the situation described. I have made
some changes since my first email, but what I'm trying to do is basically
the same. At line 23 of the orted_comm.c that I'm sending, I always get NULL
as the result, so I can't obtain the jdata.
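
To make the distinction discussed in this thread concrete, here is a minimal,
self-contained sketch of the two lookup styles. It does not use ORTE headers:
every `mock_` name is a hypothetical stand-in invented for illustration, not
the real `orte_job_t`, `orte_odls_job_t`, or list API. It only models the
behavior described in Ralph's reply: in a daemon the global job array is not
populated, so an array lookup yields NULL, while the per-daemon job list can
be searched by jobid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the real ORTE definitions. */
typedef uint32_t mock_jobid_t;

/* Models an entry of the global job array (the HNP-side view). */
typedef struct {
    mock_jobid_t jobid;
} mock_job_t;

/* Models an entry of the daemon-side local job list. */
typedef struct mock_local_job {
    struct mock_local_job *next;
    mock_jobid_t jobid;
    int num_local_procs;
} mock_local_job_t;

/* Array-style lookup: scan a table of job pointers for a jobid.
 * If the table was never filled in (as assumed here for a daemon),
 * every slot is NULL and the result is NULL. */
mock_job_t *mock_lookup_global(mock_job_t *const table[], size_t n,
                               mock_jobid_t jobid)
{
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        if (table[i] != NULL && table[i]->jobid == jobid) {
            return table[i];
        }
    }
    return NULL;
}

/* List-style lookup: walk the local job list and return the entry
 * whose jobid matches, or NULL if this daemon does not know the job. */
mock_local_job_t *mock_find_local_job(mock_local_job_t *head,
                                      mock_jobid_t jobid)
{
    for (mock_local_job_t *j = head; j != NULL; j = j->next) {
        if (j->jobid == jobid) {
            return j;
        }
    }
    return NULL;
}
```

In this model, a NULL from the array lookup says nothing about whether the job
exists; it only says that this process never stored it. That matches the
symptom described above, assuming the failing call runs inside the orted
rather than the HNP.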

Thanks a lot again for your help.

Best Regards.

Hugo Meyer