Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Comm_spawn lots of times
From: Nicolas Bock (nicolasbock_at_[hidden])
Date: 2009-12-18 13:16:50


Hi Ralph,

I have confirmed that openmpi-1.4a1r22335 works with my master/slave
example. The temporary directories are cleaned up properly.

Thanks for the help!

nick

On Thu, Dec 17, 2009 at 13:38, Nicolas Bock <nicolasbock_at_[hidden]> wrote:

> Ok, I'll give it a try.
>
> Thanks, nick
>
>
>
> On Thu, Dec 17, 2009 at 12:44, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> In case you missed it, this patch should be in the 1.4 nightly tarballs -
>> feel free to test and let me know what you find.
>>
>> Thanks
>> Ralph
>>
>> On Dec 2, 2009, at 10:06 PM, Nicolas Bock wrote:
>>
>> That was quick. I will try the patch as soon as you release it.
>>
>> nick
>>
>>
>> On Wed, Dec 2, 2009 at 21:06, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> Patch is built and under review...
>>>
>>> Thanks again
>>> Ralph
>>>
>>> On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:
>>>
>>> Thanks
>>>
>>> On Wed, Dec 2, 2009 at 17:04, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> Yeah, that's the one all right! Definitely missing from 1.3.x.
>>>>
>>>> Thanks - I'll build a patch for the next bug-fix release.
>>>>
>>>>
>>>> On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
>>>>
>>>> > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> >> Indeed - that is very helpful! Thanks!
>>>> >> Looks like we aren't cleaning up high enough - missing the directory level.
>>>> >> I seem to recall seeing that error go by and that someone fixed it on our
>>>> >> devel trunk, so this is likely a repair that didn't get moved over to the
>>>> >> release branch as it should have done.
>>>> >> I'll look into it and report back.
>>>> >
>>>> > You are probably referring to
>>>> > https://svn.open-mpi.org/trac/ompi/changeset/21498
>>>> >
>>>> > There was an issue about orte_session_dir_finalize() not
>>>> > cleaning up the session directories properly.
>>>> >
>>>> > Hope that helps.
>>>> >
>>>> > Abhishek
>>>> >
>>>> >> Thanks again
>>>> >> Ralph
>>>> >> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
>>>> >>
>>>> >>
>>>> >> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <rhc_at_[hidden]> wrote:
>>>> >>>
>>>> >>> Hmm... if you are willing to keep trying, could you perhaps let it run for
>>>> >>> a brief time, ctrl-z it, and then do an ls on a directory from a process
>>>> >>> that has already terminated? The pids will be in order, so just look for an
>>>> >>> early number (not mpirun or the parent, of course).
>>>> >>> It would help if you could give us the contents of a directory from a
>>>> >>> child process that has terminated - that would tell us which subsystem is
>>>> >>> failing to clean up properly.
>>>> >>
>>>> >> Ok, so I Ctrl-Z the master. In
>>>> >> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0 I now have only one
>>>> >> directory:
>>>> >>
>>>> >> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857
>>>> >>
>>>> >> I can't find that PID, though. mpirun has PID 4230, orted does not exist,
>>>> >> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
>>>> >> slave has a different PID, as expected. I Ctrl-Z'ed in iteration 68; there
>>>> >> are 70 sequentially numbered directories starting at 0. Every directory
>>>> >> contains another directory called "0". There is nothing in any of those
>>>> >> directories. I see, for instance:
>>>> >>
>>>> >> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857 $ ls -lh 70
>>>> >> total 4.0K
>>>> >> drwx------ 2 nbock users 4.0K Dec 2 14:41 0
>>>> >>
>>>> >> and
>>>> >>
>>>> >> nbock_at_mujo /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857 $ ls -lh 70/0/
>>>> >> total 0
>>>> >>
>>>> >> I hope this information helps. Did I understand your question correctly?
>>>> >>
>>>> >> nick
>>>> >>
>>>> >> _______________________________________________
>>>> >> users mailing list
>>>> >> users_at_[hidden]
>>>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> >>
>
>
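The diagnosis in this thread (numbered job directories, each still holding an empty "0" subdirectory after the processes had exited) can be checked mechanically. The following is a minimal Python sketch, not an Open MPI tool: `stale_job_dirs` is a hypothetical helper, and it assumes the layout seen in Nick's listing, where a directory such as `.../857` under the session root contains one numbered subdirectory per job. It can be pointed at that directory while the master is suspended with Ctrl-Z, or after the run to see whether anything was left behind.

```python
import os
import sys


def stale_job_dirs(parent):
    """Return the numbered job directories under `parent` that hold only empty directories.

    Hypothetical helper: assumes the layout from the thread, where each
    numbered job directory contains per-process directories such as "0".
    """
    # A missing parent directory means the session tree was fully removed.
    if not os.path.isdir(parent):
        return []
    stale = []
    for name in os.listdir(parent):
        job_dir = os.path.join(parent, name)
        # Only consider numbered subdirectories (the per-job directories).
        if not (name.isdigit() and os.path.isdir(job_dir)):
            continue
        children = [os.path.join(job_dir, c) for c in os.listdir(job_dir)]
        # Stale if every entry is an empty directory; an empty job directory counts too.
        if all(os.path.isdir(c) and not os.listdir(c) for c in children):
            stale.append(int(name))
    return sorted(stale)


if __name__ == "__main__" and len(sys.argv) > 1:
    found = stale_job_dirs(sys.argv[1])
    print(f"{len(found)} stale job directories: {found}")
```

Given the directory path as its only argument, the script prints how many leftover job directories it found and their numbers. With the 1.3.x behaviour described above one would expect many job numbers to be reported; on a build that includes the fix, the list would be expected to stay empty.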