Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Comm_spawn lots of times
From: Abhishek Kulkarni (abbyzcool_at_[hidden])
Date: 2009-12-02 18:37:46

On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> Indeed - that is very helpful! Thanks!
> Looks like we aren't cleaning up high enough - missing the directory level.
> I seem to recall seeing that error go by and that someone fixed it on our
> devel trunk, so this is likely a repair that didn't get moved over to the
> release branch as it should have done.
> I'll look into it and report back.

You are probably referring to

There was an issue about orte_session_dir_finalize() not
cleaning up the session directories properly.

Hope that helps.


> Thanks again
> Ralph
> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <rhc_at_[hidden]> wrote:
>> Hmm....if you are willing to keep trying, could you perhaps let it run for
>> a brief time, ctrl-z it, and then do an ls on a directory from a process
>> that has already terminated? The pids will be in order, so just look for an
>> early number (not mpirun or the parent, of course).
>> It would help if you could give us the contents of a directory from a
>> child process that has terminated - that would tell us which subsystem is
>> failing to clean up properly.
> Ok, so I Ctrl-Z the master. In
> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0 I now have only one
> directory
> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857
> I can't find that PID though. mpirun has PID 4230, orted does not exist,
> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
> slave has a different PID as expected. I Ctrl-Z'ed in iteration 68, there
> are 70 sequentially numbered directories starting at 0. Every directory
> contains another directory called "0". There is nothing in any of those
> directories. I see for instance:
> /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857 $ ls -lh 70
> total 4.0K
> drwx------ 2 nbock users 4.0K Dec  2 14:41 0
> and
> nbock_at_mujo /tmp/.private/nbock/openmpi-sessions-nbock_at_mujo_0/857 $ ls -lh 70/0/
> total 0
> I hope this information helps. Did I understand your question correctly?
> nick
> _______________________________________________
> users mailing list
> users_at_[hidden]
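
The manual check described in the quoted message (looking for numbered directories that contain only an empty "0" directory) could be automated roughly as follows. This is only a sketch and not part of Open MPI: the function names empty_leaf_dirs and summarize are made up for illustration, and the code assumes nothing beyond the layout shown in the listing above (one top-level directory such as 857, with numbered subdirectories beneath it).

```python
import os


def empty_leaf_dirs(session_root):
    """Return sorted relative paths of every empty directory under session_root.

    Hypothetical helper: an "empty leaf" is a directory that contains
    neither files nor subdirectories, like the 70/0 directory in the listing.
    """
    leftovers = []
    for dirpath, dirnames, filenames in os.walk(session_root):
        if not dirnames and not filenames:
            leftovers.append(os.path.relpath(dirpath, session_root))
    return sorted(leftovers)


def summarize(session_root):
    """Count empty leaf directories per top-level entry (e.g. "857")."""
    counts = {}
    for rel in empty_leaf_dirs(session_root):
        top = rel.split(os.sep)[0]
        counts[top] = counts.get(top, 0) + 1
    return counts
```

Calling summarize() on the session directory while the master is stopped should show how many empty directories have been left behind so far; a count that grows with each spawn iteration would match the behaviour reported above.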