Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Reuti (reuti_at_[hidden])
Date: 2009-11-10 12:06:20


Hi,

On 10.11.2009 at 17:55, Eloi Gaudry wrote:

> Thanks for your help Ralph, I'll double check that.
>
> As for the error message received, there might be some inconsistency:
> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0" is the

Often /opt/sge is shared across the nodes, while /tmp (sometimes
implemented as /scratch on a partition of its own) should be local to
each node.

What is the setting of "tmpdir" in your queue definition?

If you want to share /opt/sge/tmp, everyone must be able to write
to this location. It works fine for me (with the local /tmp), so I
assume the nobody/nogroup ownership comes from a squash setting in
/etc/exports on your master node.
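
One way to check whether a given tmpdir is usable is to try creating a
nested directory in it as the job user, much as the session directory
tree is created under $TMPDIR. The following is a minimal Python sketch
of that check, not anything shipped with SGE or Open MPI: the function
name can_create_session_dir and the probe path are hypothetical, and it
only tests write access for whoever runs it on that node.

```python
import os
import shutil
import tempfile


def can_create_session_dir(tmpdir, rel_path=os.path.join("openmpi-sessions-probe", "0", "0")):
    """Return True if the current user can create a nested directory under tmpdir.

    Hypothetical helper: rel_path only imitates the shape of a session
    directory tree; it is not the path Open MPI actually uses.
    """
    if not os.path.isdir(tmpdir):
        return False
    try:
        # Unique probe directory so that concurrent runs do not collide.
        probe_root = tempfile.mkdtemp(prefix="sessdir-probe-", dir=tmpdir)
    except OSError:
        # Typically a permission error or a read-only mount.
        return False
    try:
        os.makedirs(os.path.join(probe_root, rel_path))
        return True
    except OSError:
        return False
    finally:
        # Always clean up whatever part of the probe tree was created.
        shutil.rmtree(probe_root, ignore_errors=True)


if __name__ == "__main__":
    tmpdir = os.environ.get("TMPDIR", "/tmp")
    if os.path.isdir(tmpdir):
        st = os.stat(tmpdir)
        # Owner, group and mode show whether the directory belongs to nobody/nogroup.
        print("%s: uid=%d gid=%d mode=%o" % (tmpdir, st.st_uid, st.st_gid, st.st_mode & 0o7777))
    print("writable for session dirs:", can_create_session_dir(tmpdir))
```

Run from inside a job script, it reports the owner and mode of $TMPDIR
on the execution node and whether the submitting user can write below
it.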

-- Reuti

> parent directory and
> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0/53199/0/0"
> is the subdirectory... not the other way around.
>
> Eloi
>
>
>
> Ralph Castain wrote:
>> Creating a directory with such credentials sounds like a bug in
>> SGE to me...perhaps an SGE config issue?
>>
>> Only thing you could do is tell OMPI to use some other directory
>> as the root for its session dir tree - check "mpirun -h", or
>> ompi_info for the required option.
>>
>> But I would first check your SGE config as that just doesn't sound
>> right.
>>
>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>
>>> Hi there,
>>>
>>> I'm experiencing some issues using GE6.2U4 and OpenMPI-1.3.3
>>> (with gridengine component).
>>>
>>> During any job submission, SGE creates a session directory in
>>> $TMPDIR, named after the job id and the computing node name. This
>>> session directory is created using nobody/nogroup credentials.
>>>
>>> When using OpenMPI with tight-integration, opal creates different
>>> subdirectories in this session directory. The issue I'm facing
>>> now is that OpenMPI fails to create these subdirectories:
>>>
>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to create the sub-directory (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c at line 273
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_session_dir failed
>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>> --------------------------------------------------------------------------
>>> It looks like orte_init failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during orte_init; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>> orte_ess_set_name failed
>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>> --------------------------------------------------------------------------
>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line 473
>>>
>>> This seems very likely related to the permissions set on $TMPDIR.
>>>
>>> I'd like to know if someone might have experienced the same or a
>>> similar issue and if any solution was found.
>>>
>>> Thanks for your help,
>>> Eloi
>>>
>>>
>>>
>>>
>>> --
>>>
>>>
>>> Eloi Gaudry
>>>
>>> Free Field Technologies
>>> Axis Park Louvain-la-Neuve
>>> Rue Emile Francqui, 1
>>> B-1435 Mont-Saint Guibert
>>> BELGIUM
>>>
>>> Company Phone: +32 10 487 959
>>> Company Fax: +32 10 454 626
>>>
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users