Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Eloi Gaudry (eg_at_[hidden])
Date: 2009-11-10 11:55:21


Thanks for your help Ralph, I'll double check that.
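
In the meantime, the workaround Ralph mentions (pointing OMPI at another root for its session directory tree) might look like the sketch below. It only builds the mpirun command line. The MCA parameter name orte_tmpdir_base is my assumption for this release and should be confirmed with ompi_info; the helper name, "/tmp", the slot count and "./my_app" are hypothetical placeholders.

```python
import shlex


def mpirun_with_session_root(session_root, mpirun_args):
    """Build an mpirun argv that roots the session dir tree under session_root."""
    # Assumption: the MCA parameter is called orte_tmpdir_base in this
    # Open MPI release; confirm the name with ompi_info before relying on it.
    return ["mpirun", "--mca", "orte_tmpdir_base", session_root] + list(mpirun_args)


if __name__ == "__main__":
    # Hypothetical values: /tmp as the new root, 8 slots, ./my_app as the program.
    argv = mpirun_with_session_root("/tmp", ["-np", "8", "./my_app"])
    # Print a shell-safe command line that could be pasted into the SGE job script.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

This only sidesteps the directory that SGE creates; it does not explain why that directory ends up with nobody/nogroup credentials.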

As for the error message itself, it seems inconsistent:
"/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0" is the parent
directory and
"/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0/53199/0/0" is
the subdirectory, not the other way around.
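
To see which component of that path actually blocks the creation, a small diagnostic along these lines could be run from inside a job. It is only a sketch: the function names are hypothetical, it is not how opal_os_dirpath_create works internally, and the demo uses a throwaway temporary directory instead of the real $TMPDIR.

```python
import os
import stat
import tempfile


def describe_path_chain(leaf):
    """List owner uid, mode and creatability for each existing directory
    from leaf up to the filesystem root (deepest first)."""
    report = []
    path = os.path.abspath(leaf)
    while True:
        if os.path.isdir(path):
            st = os.stat(path)
            report.append({
                "path": path,
                "uid": st.st_uid,
                "mode": oct(stat.S_IMODE(st.st_mode)),
                # Creating an entry needs write and search permission on the directory.
                "can_create": os.access(path, os.W_OK | os.X_OK),
            })
        parent = os.path.dirname(path)
        if parent == path:  # reached the root
            break
        path = parent
    return report


def try_create(leaf):
    """Try to create leaf and any missing parents; return None or the OSError."""
    try:
        os.makedirs(leaf, exist_ok=True)
        return None
    except OSError as exc:
        return exc


if __name__ == "__main__":
    # Stand-in for the SGE session directory; on the cluster, use $TMPDIR instead.
    with tempfile.TemporaryDirectory() as base:
        leaf = os.path.join(base, "openmpi-sessions-demo", "53199", "0", "0")
        err = try_create(leaf)
        print("create:", "ok" if err is None else err)
        for entry in describe_path_chain(leaf):
            print(entry)
```

If the entry for the SGE session directory shows a uid other than the job owner's and can_create is False, that would point at the SGE side rather than at OMPI.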

Eloi

Ralph Castain wrote:
> Creating a directory with such credentials sounds like a bug in SGE to
> me...perhaps an SGE config issue?
>
> The only thing you could do is tell OMPI to use some other directory as
> the root for its session dir tree - check "mpirun -h" or ompi_info
> for the required option.
>
> But I would first check your SGE config as that just doesn't sound right.
>
> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>
>> Hi there,
>>
>> I'm experiencing some issues using GE6.2U4 and OpenMPI-1.3.3 (with
>> gridengine component).
>>
>> During any job submission, SGE creates a session directory in
>> $TMPDIR, named after the job id and the computing node name. This
>> session directory is created using nobody/nogroup credentials.
>>
>> When using OpenMPI with tight-integration, opal creates different
>> subdirectories in this session directory. The issue I'm facing now is
>> that OpenMPI fails to create these subdirectories:
>>
>> [charlie:03882] opal_os_dirpath_create: Error: Unable to create the
>> sub-directory
>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of
>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>> ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c at
>> line 273
>> --------------------------------------------------------------------------
>>
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_session_dir failed
>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>>
>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>> ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>> --------------------------------------------------------------------------
>>
>> It looks like orte_init failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during orte_init; some of which are due to configuration or
>> environment problems. This failure appears to be an internal failure;
>> here's some additional information (which may only be relevant to an
>> Open MPI developer):
>>
>> orte_ess_set_name failed
>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>> --------------------------------------------------------------------------
>>
>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>> ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line 473
>>
>> This seems very likely related to the permissions set on $TMPDIR.
>>
>> I'd like to know if someone might have experienced the same or a
>> similar issue and if any solution was found.
>>
>> Thanks for your help,
>> Eloi
>>
>>
>>
>>
>> --
>>
>>
>> Eloi Gaudry
>>
>> Free Field Technologies
>> Axis Park Louvain-la-Neuve
>> Rue Emile Francqui, 1
>> B-1435 Mont-Saint Guibert
>> BELGIUM
>>
>> Company Phone: +32 10 487 959
>> Company Fax: +32 10 454 626
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Eloi Gaudry
Free Field Technologies
Axis Park Louvain-la-Neuve
Rue Emile Francqui, 1
B-1435 Mont-Saint Guibert
BELGIUM
Company Phone: +32 10 487 959
Company Fax:   +32 10 454 626