Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] opal_os_dirpath_create: Error: Unable to create the, sub-directory
From: Reuti (reuti_at_[hidden])
Date: 2014-02-03 17:08:01


On 03.02.2014 at 23:01, Eric Chamberland wrote:

> Hi Ralph,
>
> On 02/03/2014 04:20 PM, Ralph Castain wrote:
>> On Feb 3, 2014, at 1:13 PM, Eric Chamberland <Eric.Chamberland_at_[hidden]> wrote:
>>
>>> On 02/03/2014 03:59 PM, Ralph Castain wrote:
>>>> Very strange - even if you kill the job with SIGTERM, or have processes that segfault, OMPI should clean itself up and remove those session directories. Granted, the 1.6 series isn't as good about doing so as the 1.7 series, but to date it has at least done pretty well.
>>> Ok, one more piece of information that may matter: all sequential tests are launched *without* mpiexec... I don't know whether the "cleanup" phase is done by mpiexec or by the binaries...
>> Ah, yes that would be a source of the problem! We can't guarantee cleanup if you just kill the procs or they segfault *unless* mpiexec is used to launch the job. What are you using to launch? Most resource managers provide an "epilog" capability for precisely this purpose as all MPIs would display the same issue.
> For the sequential jobs, we just launch the tests on the "command line"... no resource manager is ever used. For the jobs that require more than 1 process, we add "mpiexec -n ..." to the command line...
>
>>> which should delete files that shouldn't exist... ;-)
>>>
>>> But, IMHO, I still think OpenMPI should "choose" another directory name if it can't create it because a poor file exists!
>> We could do that - but now we get into the bottomless pit of trying every possible combination of directory names, and ensuring that every process comes up with the same answer! Remember, the session dir is where the shared memory regions rendezvous, so every process on a node would have to find the same place
> ok. Just for my knowledge: does that mean that if I launch 2 processes on a single node and they have to communicate, they will do so through the files in /tmp?
>
>>> How can all users be aware that they have to cleanup such files?
>> Given how long 1.6.x has been out there, and that this is about the only time I've heard of a problem, I'm not sure this is a general enough issue to merit the concern
> Ok. I just checked 8 other computers/architectures that run the same tests: only 1 of them has files at the top level of /tmp/openmpi-sessions-${USER}*
> Since we have been doing this kind of testing for many years, I also agree it is not a widespread issue... But it just occurred 2 times in the last 3 days!!! :-/

What about using a queuing system? Open MPI will put the files it creates into a subdirectory that the queuing system dedicates to the job. Even if Open MPI fails to remove the files, the queuing system will do so.

-- Reuti
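
For sequential tests that are started straight from the command line, the same effect can be approximated without a queuing system. The sketch below is only an illustration under two assumptions: that Open MPI places its session directory under the directory named by the TMPDIR environment variable when it is set (worth verifying for your version), and that a per-job directory removed after the run is acceptable. The name run_in_private_tmpdir is hypothetical and not part of Open MPI.

```python
import os
import subprocess
import tempfile


def run_in_private_tmpdir(command):
    """Run `command` with TMPDIR set to a fresh directory, then delete it.

    Hypothetical helper: it mimics the per-job scratch directory that a
    queuing system would provide. Assumes the launched program honors TMPDIR.
    """
    if not command:
        raise ValueError("command must contain at least the program name")
    # TemporaryDirectory removes the directory and everything in it on exit,
    # including anything a crashed child process left behind.
    with tempfile.TemporaryDirectory(prefix="job-") as job_tmp:
        env = dict(os.environ, TMPDIR=job_tmp)
        completed = subprocess.run(command, env=env)
        return completed.returncode
```

A test would then be started as, for example, `run_in_private_tmpdir(["./my_test"])`, where `./my_test` stands for any sequential binary. The directory is removed when the child exits or segfaults; it is not removed if the wrapper itself is killed with SIGKILL.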

>>
>>> Maybe a good compromise would be to have the error message say that a file exists with the same name as the chosen directory?
>> I can make that change - good suggestion.
> ok, thanks!
>
>>
>>> Or add a new entry to the FAQ to help users find the workaround you proposed... ;-)
>> we can try to do that too
>
> If I may suggest a way to test the behavior of 1.7.x... what about this: have a test case that creates a bunch of files (named from 0 to 65536) in /tmp/openmpi-sessions-${USER}... before launching an executable without mpirun... >:)
>
> Anyway, thanks a lot!
>
> Eric
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
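
The check Eric describes above, looking for files at the top level of /tmp/openmpi-sessions-${USER}*, could be automated along these lines. This is a rough sketch, not anything shipped with Open MPI: the function name find_blocking_files is made up, and treating only entries with purely numeric names as suspicious is an assumption taken from the "files from 0 to 65536" test idea in the thread.

```python
import glob
import os


def find_blocking_files(tmp_root, user):
    """List non-directory entries that sit where a session directory is expected.

    Hypothetical helper. It looks at tmp_root/openmpi-sessions-<user>* and
    reports (a) a match that is itself not a directory and (b) numerically
    named entries directly inside a match that are not directories.
    """
    pattern = os.path.join(
        glob.escape(tmp_root), "openmpi-sessions-" + glob.escape(user) + "*"
    )
    blocking = []
    for session_dir in sorted(glob.glob(pattern)):
        if not os.path.isdir(session_dir):
            # A plain file occupies the session directory name itself.
            blocking.append(session_dir)
            continue
        for name in sorted(os.listdir(session_dir)):
            path = os.path.join(session_dir, name)
            # Assumption: numeric names are the ones that can collide.
            if name.isdigit() and not os.path.isdir(path):
                blocking.append(path)
    return blocking
```

Calling `find_blocking_files("/tmp", "eric")` (with your own user name) before a test run returns the offending paths without deleting anything, so they can be reviewed and removed by hand.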