Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] MPI_Comm_spawn lots of times
From: Nicolas Bock (nicolasbock_at_[hidden])
Date: 2009-12-03 00:06:26


That was quick. I will try the patch as soon as you release it.

nick

On Wed, Dec 2, 2009 at 21:06, Ralph Castain <rhc_at_[hidden]> wrote:

> Patch is built and under review...
>
> Thanks again
> Ralph
>
> On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:
>
> Thanks
>
> On Wed, Dec 2, 2009 at 17:04, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> Yeah, that's the one all right! Definitely missing from 1.3.x.
>>
>> Thanks - I'll build a patch for the next bug-fix release
>>
>>
>> On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
>>
>> > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>> >> Indeed - that is very helpful! Thanks!
>> >> Looks like we aren't cleaning up high enough - missing the directory level.
>> >> I seem to recall seeing that error go by and that someone fixed it on our
>> >> devel trunk, so this is likely a repair that didn't get moved over to the
>> >> release branch as it should have done.
>> >> I'll look into it and report back.
>> >
>> > You are probably referring to
>> > https://svn.open-mpi.org/trac/ompi/changeset/21498
>> >
>> > There was an issue about orte_session_dir_finalize() not
>> > cleaning up the session directories properly.
>> >
>> > Hope that helps.
>> >
>> > Abhishek
>> >
>> >> Thanks again
>> >> Ralph
>> >> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
>> >>
>> >>
>> >> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <rhc_at_[hidden]> wrote:
>> >>>
>> >>> Hmm....if you are willing to keep trying, could you perhaps let it run for
>> >>> a brief time, ctrl-z it, and then do an ls on a directory from a process
>> >>> that has already terminated? The pids will be in order, so just look for an
>> >>> early number (not mpirun or the parent, of course).
>> >>> It would help if you could give us the contents of a directory from a
>> >>> child process that has terminated - that would tell us which subsystem is
>> >>> failing to clean up properly.
>> >>
>> >> Ok, so I Ctrl-Z'ed the master. In
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0 I now have only one
>> >> directory:
>> >>
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857
>> >>
>> >> I can't find that PID though. mpirun has PID 4230, orted does not exist,
>> >> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
>> >> slave has a different PID, as expected. I Ctrl-Z'ed in iteration 68; there
>> >> are 70 sequentially numbered directories starting at 0. Every directory
>> >> contains another directory called "0". There is nothing in any of those
>> >> directories. I see for instance:
>> >>
>> >> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70
>> >> total 4.0K
>> >> drwx------ 2 nbock users 4.0K Dec 2 14:41 0
>> >>
>> >> and
>> >>
>> >> nbock@mujo /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70/0/
>> >> total 0
>> >>
>> >> I hope this information helps. Did I understand your question correctly?
>> >>
>> >> nick
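
The manual "ls" check described above can be automated. What follows is only a sketch, not an Open MPI tool: it walks a session directory and lists every subdirectory whose whole subtree contains no files, which is the pattern reported for 857/70/0. The function names (files_free_dirs, make_example_tree) and the fake three-entry tree are made up for illustration; the thread does not say what the numbered levels mean, so the script makes no assumption about them.

```python
import os
import tempfile


def files_free_dirs(root):
    """Return, relative to root, every directory whose subtree holds no files."""
    has_files = {}
    empty = []
    # Bottom-up walk: children are visited before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        child_has = any(
            has_files.get(os.path.join(dirpath, d), False) for d in dirnames
        )
        has_files[dirpath] = bool(filenames) or child_has
        if dirpath != root and not has_files[dirpath]:
            empty.append(os.path.relpath(dirpath, root))
    return sorted(empty)


def make_example_tree(base, count):
    """Create base/857/<i>/0 for i in range(count); a stand-in for the report."""
    job = os.path.join(base, "857")
    for i in range(count):
        os.makedirs(os.path.join(job, str(i), "0"))
    return job


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        job = make_example_tree(base, 3)
        for rel in files_free_dirs(job):
            print(rel)
```

Pointed at a real session directory instead of the temporary one, a steadily growing list between spawn iterations would match the behaviour reported in this thread.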
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> users_at_[hidden]
>> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
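
The diagnosis earlier in the thread ("we aren't cleaning up high enough - missing the directory level") can be illustrated with a small sketch. This is not the code of orte_session_dir_finalize() or of changeset 21498, neither of which is shown in this thread; it is a Python illustration of the general idea, with a hypothetical helper name (remove_and_prune) and a made-up two-level layout modelled on the 857/70/0 listing. After removing the innermost directory, it keeps removing empty parents until it reaches a directory that must stay.

```python
import os
import tempfile


def remove_and_prune(leaf, stop):
    """Remove the empty directory `leaf`, then each empty ancestor below `stop`.

    `stop` itself is never removed. Returns the list of removed paths.
    """
    leaf = os.path.abspath(leaf)
    stop = os.path.abspath(stop)
    removed = []
    current = leaf
    # Climb only while we are strictly inside `stop`.
    while current != stop and os.path.commonpath([current, stop]) == stop:
        try:
            os.rmdir(current)  # succeeds only if the directory is empty
        except OSError:
            break  # not empty or already gone: stop climbing
        removed.append(current)
        current = os.path.dirname(current)
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        job = os.path.join(base, "857")
        os.makedirs(os.path.join(job, "70", "0"))
        for path in remove_and_prune(os.path.join(job, "70", "0"), job):
            print("removed", os.path.relpath(path, base))
```

Removing only the innermost "0" and stopping there would leave one empty numbered directory behind per spawn, which is consistent with what the listing in this thread shows. Whether the real fix works this way is an assumption.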
>> >>