Patch is built and under review...

Thanks again
Ralph

On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote:

Thanks

On Wed, Dec 2, 2009 at 17:04, Ralph Castain <rhc@open-mpi.org> wrote:
Yeah, that's the one all right! Definitely missing from 1.3.x.

Thanks - I'll build a patch for the next bug-fix release


On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote:

> On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain <rhc@open-mpi.org> wrote:
>> Indeed - that is very helpful! Thanks!
>> Looks like we aren't cleaning up high enough - missing the directory level.
>> I seem to recall seeing that error go by and that someone fixed it on our
>> devel trunk, so this is likely a repair that didn't get moved over to the
>> release branch as it should have been.
>> I'll look into it and report back.
>
> You are probably referring to
> https://svn.open-mpi.org/trac/ompi/changeset/21498
>
> There was an issue about orte_session_dir_finalize() not
> cleaning up the session directories properly.
>
> Hope that helps.
>
> Abhishek
>
>> Thanks again
>> Ralph
>> On Dec 2, 2009, at 2:45 PM, Nicolas Bock wrote:
>>
>>
>> On Wed, Dec 2, 2009 at 14:23, Ralph Castain <rhc@open-mpi.org> wrote:
>>>
>>> Hmm... if you are willing to keep trying, could you perhaps let it run for
>>> a brief time, ctrl-z it, and then do an ls on a directory from a process
>>> that has already terminated? The pids will be in order, so just look for an
>>> early number (not mpirun or the parent, of course).
>>> It would help if you could give us the contents of a directory from a
>>> child process that has terminated - that would tell us which subsystem is
>>> failing to clean up properly.
>>
>> Ok, so I Ctrl-Z the master. In
>> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0 I now have only one
>> directory
>>
>> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857
>>
>> I can't find that PID though. mpirun has PID 4230, orted does not exist,
>> master is 4231, and slave is 4275. When I "fg" master and Ctrl-Z it again,
>> slave has a different PID as expected. I Ctrl-Z'ed in iteration 68; there
>> are 70 sequentially numbered directories starting at 0. Every directory
>> contains another directory called "0". There is nothing in any of those
>> directories. I see for instance:
>>
>> /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70
>> total 4.0K
>> drwx------ 2 nbock users 4.0K Dec  2 14:41 0
>>
>> and
>>
>> nbock@mujo /tmp/.private/nbock/openmpi-sessions-nbock@mujo_0/857 $ ls -lh 70/0/
>> total 0
>>
>> I hope this information helps. Did I understand your question correctly?
>>
>> nick
>>
>> _______________________________________________
>> users mailing list
>> users@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
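
The manual check described above (looking for numbered directories that were left behind with nothing in them) could be automated roughly as follows. This is only a sketch in Python, not part of Open MPI: the function names `tree_has_files` and `find_empty_leftovers` are made up for illustration, and it assumes the layout reported in the thread, where each numbered directory holds only an empty "0" subdirectory.

```python
import os


def tree_has_files(path):
    """Return True if any non-directory entry exists anywhere under path."""
    for _dirpath, _dirnames, filenames in os.walk(path):
        if filenames:
            return True
    return False


def find_empty_leftovers(session_root):
    """Return the names of immediate subdirectories of session_root
    whose whole subtree contains no files (hypothetical helper)."""
    leftovers = []
    with os.scandir(session_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False) and not tree_has_files(entry.path):
                leftovers.append(entry.name)
    # Names are sorted as strings, so "10" comes before "2".
    return sorted(leftovers)
```

Calling `find_empty_leftovers()` on a path such as the `.../openmpi-sessions-nbock@mujo_0/857` directory mentioned above would list the numbered directories that contain no files, which is the symptom reported here.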

