
Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Eloi Gaudry (eg_at_[hidden])
Date: 2009-11-10 17:42:02


Hi Reuti,

I followed your advice and switched to a local "tmpdir" instead of a
shared one. This solved the session directory issue, thanks for your help!
However, I cannot understand why the issue disappeared. Any input would
be welcome, as I'd really like to understand how SGE/OpenMPI could fail
when using such a configuration (i.e. with a shared "tmpdir").
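
For reference, the switch itself boils down to changing the "tmpdir"
attribute of the queue definition, along these lines (queue name taken
from this thread; the local path is just an example):

# show the current setting
qconf -sq smp8.q | grep tmpdir
# edit the queue and point tmpdir to a node-local path, e.g. "tmpdir /tmp"
qconf -mq smp8.q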

Eloi

On 10/11/2009 19:17, Eloi Gaudry wrote:
> Reuti,
>
> The ACLs here were just added when I tried to force the /opt/sge/tmp
> subdirectories to be 777 (which I did when I first encountered the
> subdirectory creation error within OpenMPI). I don't think the
> info I'll provide will be meaningful here:
>
> moe:~# getfacl /opt/sge/tmp
> getfacl: Removing leading '/' from absolute path names
> # file: opt/sge/tmp
> # owner: sgeadmin
> # group: fft
> user::rwx
> group::rwx
> mask::rwx
> other::rwx
> default:user::rwx
> default:group::rwx
> default:group:fft:rwx
> default:mask::rwx
> default:other::rwx
>
> I'll try to use a local directory instead of a shared one for
> "tmpdir". But as this issue seems somehow related to permissions, I
> don't know whether this would ultimately be the right solution.
>
> Thanks for your help,
> Eloi
>
> Reuti wrote:
>> Hi,
>>
>> On 10.11.2009 at 19:01, Eloi Gaudry wrote:
>>
>>> Reuti,
>>>
>>> I'm using "tmpdir" as a shared directory that contains the session
>>> directories created during job submission, not for computing or
>>> local storage. Doesn't the session directory (i.e.
>>> job_id.queue_name) need to be shared among all computing nodes (at
>>> least the ones that would be used with orted during the parallel
>>> computation) ?
>>
>> No. orted runs happily with a local $TMPDIR on each and every node. The
>> $TMPDIRs are intended to be used by the user for any temporary data
>> of a job; they are created and removed by SGE automatically for
>> every job, for convenience.
>>
>>
>>> All sequential jobs run fine, as no write operation is performed in
>>> "tmpdir/session_directory".
>>>
>>> All users are known on the computing nodes and the master node (we
>>> use LDAP authentication on all nodes).
>>>
>>> As for the access checks:
>>> moe:~# ls -alrtd /opt/sge/tmp
>>> drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
>>
>> Aha, the + tells that there are some ACLs set:
>>
>> getfacl /opt/sge/tmp
>>
>>
>>> And for the parallel environment configuration:
>>> moe:~# qconf -sp round_robin
>>> pe_name round_robin
>>> slots 32
>>> user_lists NONE
>>> xuser_lists NONE
>>> start_proc_args /bin/true
>>> stop_proc_args /bin/true
>>> allocation_rule $round_robin
>>> control_slaves TRUE
>>> job_is_first_task FALSE
>>> urgency_slots min
>>> accounting_summary FALSE
>>
>> Okay, fine.
>>
>> -- Reuti
>>
>>
>>> Thanks for your help,
>>> Eloi
>>>
>>> Reuti wrote:
>>>> On 10.11.2009 at 18:20, Eloi Gaudry wrote:
>>>>
>>>>> Thanks for your help Reuti,
>>>>>
>>>>> I'm using an NFS-shared directory (/opt/sge/tmp), exported from the
>>>>> master node to all other computing nodes.
>>>>
>>>> It's highly advisable to have the "tmpdir" local on each node. When
>>>> you use "cd $TMPDIR" in your job script, everything is done locally on
>>>> a node (provided your application just creates its scratch files in
>>>> the current working directory), which speeds up the computation and
>>>> reduces the network traffic. Computing in a shared /opt/sge/tmp
>>>> is like computing in each user's home directory.
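>>>>
>>>> A minimal sketch of such a job script (solver name and file names are
>>>> just placeholders):
>>>>
>>>> #!/bin/sh
>>>> #$ -S /bin/sh
>>>> # $TMPDIR is created (and later removed) by SGE on the local disk
>>>> cd $TMPDIR
>>>> cp $SGE_O_WORKDIR/input.dat .
>>>> my_solver input.dat          # scratch files stay on the local disk
>>>> cp result.dat $SGE_O_WORKDIR/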
>>>>
>>>> To prevent users from removing each other's files, the "t" (sticky)
>>>> flag is set, as for /tmp:
>>>> drwxrwxrwt 14 root root 4096 2009-11-10 18:35 /tmp/
>>>>
>>>> Nevertheless:
>>>>
>>>>> with, in /etc/exports on the server (named moe.fft):
>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>> and in /etc/fstab on the clients:
>>>>> moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14, 0 0
>>>>> Actually, the /opt/sge/tmp directory is 777 across all machines,
>>>>> thus all users should be able to create a directory inside.
>>>>
>>>> All access checks will be applied:
>>>>
>>>> - on the server: what is "ls -d /opt/sge/tmp" showing?
>>>> - the one from the export (this seems to be fine)
>>>> - the one on the node (i.e., how it's mounted: cat /etc/fstab)
>>>>
>>>>> The issue seems somehow related to the session directory created
>>>>> inside /opt/sge/tmp, let's say /opt/sge/tmp/29.1.smp8.q for
>>>>> example, for job 29 on queue smp8.q. This subdirectory of
>>>>> /opt/sge/tmp is created with nobody:nogroup drwxr-xr-x
>>>>> permissions... which in turn forbids
>>>>
>>>> Did you try to run some simple jobs before the parallel ones - are
>>>> these working? The daemons (qmaster and execd) were started as root?
>>>>
>>>> The user is known on the file server, i.e. the machine hosting
>>>> /opt/sge?
>>>>
>>>>> OpenMPI to create its subtree inside (as OpenMPI won't use
>>>>> nobody:nogroup credentials).
>>>>
>>>> In SGE the master process (the one running the job script) will
>>>> create /opt/sge/tmp/29.1.smp8.q, and so will each qrsh started
>>>> inside SGE - all with the same name. What is the definition of the
>>>> PE you are using in SGE?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> As Ralph suggested, I checked the SGE configuration, but I haven't
>>>>> found anything related to a nobody:nogroup configuration so far.
>>>>>
>>>>> Eloi
>>>>>
>>>>>
>>>>> Reuti wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 10.11.2009 at 17:55, Eloi Gaudry wrote:
>>>>>>
>>>>>>> Thanks for your help Ralph, I'll double check that.
>>>>>>>
>>>>>>> As for the error message received, there might be some
>>>>>>> inconsistency:
>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0" is the
>>>>>>
>>>>>> often /opt/sge is shared across the nodes, while the /tmp
>>>>>> (sometimes implemented as /scratch in a partition on its own)
>>>>>> should be local on each node.
>>>>>>
>>>>>> What is the setting of "tmpdir" in your queue definition?
>>>>>>
>>>>>> If you want to share /opt/sge/tmp, everyone must be able to write
>>>>>> into this location. As it's working fine for me (with a local
>>>>>> /tmp), I assume the nobody/nogroup comes from some squash setting
>>>>>> in /etc/exports on your master node.
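>>>>>>
>>>>>> For example, with the default root_squash behaviour anything root
>>>>>> creates on the NFS share (e.g. the job's tmpdir, if a daemon running
>>>>>> as root creates it) ends up owned by nobody/nogroup on the server.
>>>>>> A sketch of the two variants in /etc/exports (network taken from
>>>>>> your earlier mail):
>>>>>>
>>>>>> # default: root on the clients is mapped to nobody/nogroup
>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>> # root stays root on the server (use with care)
>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check,no_root_squash)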
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> parent directory and
>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0/53199/0/0"
>>>>>>> is the subdirectory... not the other way around.
>>>>>>>
>>>>>>> Eloi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Ralph Castain wrote:
>>>>>>>> Creating a directory with such credentials sounds like a bug in
>>>>>>>> SGE to me...perhaps an SGE config issue?
>>>>>>>>
>>>>>>>> The only thing you could do is tell OMPI to use some other
>>>>>>>> directory as the root for its session dir tree - check "mpirun
>>>>>>>> -h" or ompi_info for the required option.
>>>>>>>>
>>>>>>>> But I would first check your SGE config as that just doesn't
>>>>>>>> sound right.
>>>>>>>>
>>>>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>>>>
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> I'm experiencing some issues using GE6.2U4 and OpenMPI-1.3.3
>>>>>>>>> (with the gridengine component).
>>>>>>>>>
>>>>>>>>> During any job submission, SGE creates a session directory in
>>>>>>>>> $TMPDIR, named after the job id and the queue name.
>>>>>>>>> This session directory is created using nobody/nogroup
>>>>>>>>> credentials.
>>>>>>>>>
>>>>>>>>> When using OpenMPI with tight integration, OPAL creates
>>>>>>>>> several subdirectories in this session directory. The issue
>>>>>>>>> I'm facing now is that OpenMPI fails to create these
>>>>>>>>> subdirectories:
>>>>>>>>>
>>>>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to
>>>>>>>>> create the sub-directory
>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of
>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>> ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c
>>>>>>>>> at line 273
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> It looks like orte_init failed for some reason; your parallel
>>>>>>>>> process is
>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>> process can
>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>> environment problems. This failure appears to be an internal
>>>>>>>>> failure;
>>>>>>>>> here's some additional information (which may only be relevant
>>>>>>>>> to an
>>>>>>>>> Open MPI developer):
>>>>>>>>>
>>>>>>>>> orte_session_dir failed
>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>> ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> It looks like orte_init failed for some reason; your parallel
>>>>>>>>> process is
>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>> process can
>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>> environment problems. This failure appears to be an internal
>>>>>>>>> failure;
>>>>>>>>> here's some additional information (which may only be relevant
>>>>>>>>> to an
>>>>>>>>> Open MPI developer):
>>>>>>>>>
>>>>>>>>> orte_ess_set_name failed
>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>> ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line
>>>>>>>>> 473
>>>>>>>>>
>>>>>>>>> This seems very likely related to the permissions set on $TMPDIR.
>>>>>>>>>
>>>>>>>>> I'd like to know if someone might have experienced the same or
>>>>>>>>> a similar issue and if any solution was found.
>>>>>>>>>
>>>>>>>>> Thanks for your help,
>>>>>>>>> Eloi
>>>>>>>>>
>>>>>>>>>
>>>
>
>