Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Comm_spawn lots of times
From: Nicolas Bock (nicolasbock_at_[hidden])
Date: 2009-12-02 19:37:32


Thanks

On Wed, Dec 2, 2009 at 17:04, Ralph Castain <rhc_at_[hidden]> wrote:

> Yeah, that's the one all right! Definitely missing from 1.3.x.
>
> Thanks - I'll build a patch for the next bug-fix release
>
>
> On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:
>
> > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> >> Indeed - that is very helpful! Thanks!
> >> Looks like we aren't cleaning up high enough - missing the directory level.
> >> I seem to recall seeing that error go by and that someone fixed it on our
> >> devel trunk, so this is likely a repair that didn't get moved over to the
> >> release branch as it should have done.
> >> I'll look into it and report back.
> >
> > You are probably referring to
> > https://svn.open-mpi.org/trac/ompi/changeset/21498
> >
> > There was an issue about orte_session_dir_finalize() not
> > cleaning up the session directories properly.
> >
> > Hope that helps.
> >
> > Abhishek
> >
> >> Thanks again
> >> Ralph
> >> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
> >>
> >>
> >> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <rhc_at_[hidden]> wrote:
> >>>
> >>> Hmm....if you are willing to keep trying, could you perhaps let it run for
> >>> a brief time, ctrl-z it, and then do an ls on a directory from a process
> >>> that has already terminated? The pids will be in order, so just look for an
> >>> early number (not mpirun or the parent, of course).
> >>> It would help if you could give us the contents of a directory from a
> >>> child process that has terminated - would tell us what subsystem is failing
> >>> to properly clean up.
> >>
> >> Ok, so I Ctrl-Z the master. In
> >> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0 I now have only one
> >> directory
> >>
> >> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857
> >>
> >> I can't find that PID though. mpirun has PID 4230, orted does not exist,
> >> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
> >> slave has a different PID as expected. I Ctrl-Z'ed in iteration 68, there
> >> are 70 sequentially numbered directories starting at 0. Every directory
> >> contains another directory called "0". There is nothing in any of those
> >> directories. I see for instance:
> >>
> >> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857 $ ls -lh 70
> >> total 4.0K
> >> drwx------ 2 nbock users 4.0K Dec 2 14:41 0
> >>
> >> and
> >>
> >> nbock_at_mujo /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857 $ ls -lh 70/0/
> >> total 0
> >>
> >> I hope this information helps. Did I understand your question correctly?
> >>
> >> nick
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
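
A minimal sketch of how the leftover directories described in the quoted report could be counted, assuming the layout reported there: one parent directory (857 in the report) holding sequentially numbered directories, each containing only an empty "0" directory. The function name empty_job_dirs and the throwaway directory tree are hypothetical and exist only for illustration; none of this is part of Open MPI.

```python
import os
import tempfile


def empty_job_dirs(family_dir):
    """Return the numbered subdirectories of family_dir that contain no files.

    A subdirectory counts as "empty" when its whole subtree holds only
    directories, which matches the leftovers described in the report.
    """
    leftovers = []
    for entry in os.scandir(family_dir):
        # Only consider numbered directories such as "0", "1", ..., "70".
        if not (entry.is_dir(follow_symlinks=False) and entry.name.isdigit()):
            continue
        # os.walk yields (dirpath, dirnames, filenames) for every level.
        has_files = any(files for _, _, files in os.walk(entry.path))
        if not has_files:
            leftovers.append(entry.name)
    return sorted(leftovers, key=int)


if __name__ == "__main__":
    # Build a throwaway tree that mimics the reported layout (assumption).
    with tempfile.TemporaryDirectory() as root:
        family = os.path.join(root, "857")
        for job in range(3):
            os.makedirs(os.path.join(family, str(job), "0"))
        print(empty_job_dirs(family))  # ['0', '1', '2']
```

Pointing the function at the real directory from the report, instead of the temporary tree, would show how many stale entries have built up after a given number of spawn iterations.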