Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Eloi Gaudry (eg_at_[hidden])
Date: 2009-11-10 18:19:57


On any execution node, creating a subdirectory of /opt/sge/tmp (i.e.
creating a session directory inside $TMPDIR) results in a new directory
owned by the user/group that submitted the job (not nobody/nogroup).
If I switch back to a shared /opt/sge/tmp directory, all session
directories created by SGE are owned by nobody/nogroup.

Eloi
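
The ownership check described above can be scripted so that it is easy to repeat on every node. The following is only a minimal sketch: it assumes GNU coreutils (stat -c) on the execution nodes, the function name probe_owner is made up for illustration, and /opt/sge/tmp is simply the path used in this thread.

```shell
# Create a throwaway directory under the given parent, print its owner as
# "user:group uid:gid", then remove it again.
# Assumption: GNU stat (Linux); probe_owner is a hypothetical helper name.
probe_owner() {
    parent=${1:-/opt/sge/tmp}
    probe=$(mktemp -d "$parent/ownerprobe.XXXXXX") || return 1
    stat -c '%U:%G %u:%g' "$probe"
    rmdir "$probe"
}
```

Running probe_owner /opt/sge/tmp as an ordinary user on each execution node, and comparing the output with "id", shows directly whether new directories end up with the caller's credentials or with nobody/nogroup.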

On 11/11/2009 00:14, Reuti wrote:
> On 11.11.2009 at 00:03, Eloi Gaudry wrote:
>
>> The user/group used to generate the temporary directories was
>> nobody/nogroup, when using a shared $tmpdir.
>> Now that I'm using a local $tmpdir (one for each node, not
>> distributed over nfs), the right credentials (i.e. my
>> username/groupname) are used to create the session directory inside
>> $tmpdir, which in turn allows OpenMPI to successfully create its
>> session subdirectories.
>
> Aha, this explains why it's working now - so it's not an SGE issue IMHO.
>
> Question: when a user on the execution node goes to /opt/sge/tmp and
> creates a directory on the command line with mkdir, which user/group
> owns the new directory?
>
> -- Reuti
>
>
>> Eloi
>>
>>
>> On 10/11/2009 23:51, Reuti wrote:
>>> Hi Eloi,
>>>
>>> On 10.11.2009 at 23:42, Eloi Gaudry wrote:
>>>
>>>> I followed your advice and switched to a local "tmpdir" instead of a
>>>> shared one. This solved the session directory issue, thanks for your
>>>> help!
>>>
>>> What user/group is now listed for the generated temporary directories
>>> (i.e. $TMPDIR)?
>>>
>>> -- Reuti
>>>
>>>> However, I cannot understand why the issue disappeared. Any input
>>>> would be welcome, as I'd really like to understand how SGE/OpenMPI
>>>> could fail when using such a configuration (i.e. with a shared "tmpdir").
>>>>
>>>> Eloi
>>>>
>>>>
>>>> On 10/11/2009 19:17, Eloi Gaudry wrote:
>>>>> Reuti,
>>>>>
>>>>> The ACLs here were only added when I tried to force the /opt/sge/tmp
>>>>> subdirectories to be 777 (which I did when I first encountered the
>>>>> subdirectory creation error within OpenMPI). I don't think the
>>>>> info I'll provide will be meaningful here:
>>>>>
>>>>> moe:~# getfacl /opt/sge/tmp
>>>>> getfacl: Removing leading '/' from absolute path names
>>>>> # file: opt/sge/tmp
>>>>> # owner: sgeadmin
>>>>> # group: fft
>>>>> user::rwx
>>>>> group::rwx
>>>>> mask::rwx
>>>>> other::rwx
>>>>> default:user::rwx
>>>>> default:group::rwx
>>>>> default:group:fft:rwx
>>>>> default:mask::rwx
>>>>> default:other::rwx
>>>>>
>>>>> I'll try to use a local directory instead of a shared one for
>>>>> "tmpdir". But as this issue seems somehow related to permissions, I
>>>>> don't know if this would eventually be the right solution.
>>>>>
>>>>> Thanks for your help,
>>>>> Eloi
>>>>>
>>>>> Reuti wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 10.11.2009 at 19:01, Eloi Gaudry wrote:
>>>>>>
>>>>>>> Reuti,
>>>>>>>
>>>>>>> I'm using "tmpdir" as a shared directory that contains the session
>>>>>>> directories created during job submission, not for computing or
>>>>>>> local storage. Doesn't the session directory (i.e.
>>>>>>> job_id.queue_name) need to be shared among all computing nodes (at
>>>>>>> least the ones that would be used with orted during the parallel
>>>>>>> computation) ?
>>>>>>
>>>>>> No. orted runs happily with a local $TMPDIR on each and every node.
>>>>>> The $TMPDIRs are intended to be used by the user for any temporary
>>>>>> data for the job; SGE creates and removes them automatically for
>>>>>> every job, for the user's convenience.
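
As an illustration of using the per-job $TMPDIR for scratch data, here is a minimal job-script sketch. The helper name run_in_scratch and the results directory are hypothetical; the fallback to /tmp exists only so the sketch can be tried outside SGE, where $TMPDIR may be unset.

```shell
# Run a command inside a private working directory under the per-job
# scratch area, copy whatever it produced to a results directory, and
# clean up. run_in_scratch is a hypothetical helper, not part of SGE.
run_in_scratch() {
    results_dir=$1; shift
    work=$(mktemp -d "${TMPDIR:-/tmp}/work.XXXXXX") || return 1
    status=0
    ( cd "$work" && "$@" ) || status=$?
    # Copy results back even if the command failed, then remove the scratch copy.
    mkdir -p "$results_dir" && cp -R "$work"/. "$results_dir"/
    rm -rf "$work"
    return "$status"
}

# Possible use inside a job script (the solver name is an assumption):
#   run_in_scratch "$SGE_O_WORKDIR/results" mpirun -np "$NSLOTS" ./my_solver
```

Because each node gets its own local $TMPDIR, nothing in this sketch requires the scratch directory to be visible from other nodes.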
>>>>>>
>>>>>>
>>>>>>> All sequential jobs run fine, as no write operation is performed in
>>>>>>> "tmpdir/session_directory".
>>>>>>>
>>>>>>> All users are known on the computing nodes and the master node
>>>>>>> (we use LDAP authentication on all nodes).
>>>>>>>
>>>>>>> As for the access checks:
>>>>>>> moe:~# ls -alrtd /opt/sge/tmp
>>>>>>> drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
>>>>>>
>>>>>> Aha, the + indicates that some ACLs are set:
>>>>>>
>>>>>> getfacl /opt/sge/tmp
>>>>>>
>>>>>>
>>>>>>> And for the parallel environment configuration:
>>>>>>> moe:~# qconf -sp round_robin
>>>>>>> pe_name round_robin
>>>>>>> slots 32
>>>>>>> user_lists NONE
>>>>>>> xuser_lists NONE
>>>>>>> start_proc_args /bin/true
>>>>>>> stop_proc_args /bin/true
>>>>>>> allocation_rule $round_robin
>>>>>>> control_slaves TRUE
>>>>>>> job_is_first_task FALSE
>>>>>>> urgency_slots min
>>>>>>> accounting_summary FALSE
>>>>>>
>>>>>> Okay, fine.
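
For tight integration, the PE setting that matters most in the listing above is control_slaves TRUE, which lets SGE start the remote tasks itself. A small sketch of checking this automatically follows; the function name pe_allows_tight_integration is made up for illustration, and it only inspects the text that "qconf -sp" prints.

```shell
# Read the output of `qconf -sp <pe_name>` on stdin and succeed only if
# the control_slaves attribute is TRUE.
# pe_allows_tight_integration is a hypothetical helper name.
pe_allows_tight_integration() {
    awk '$1 == "control_slaves" { found = ($2 == "TRUE") }
         END { exit !found }'
}

# Possible use on the master node:
#   qconf -sp round_robin | pe_allows_tight_integration && echo "PE looks fine"
```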
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> Thanks for your help,
>>>>>>> Eloi
>>>>>>>
>>>>>>> Reuti wrote:
>>>>>>>> On 10.11.2009 at 18:20, Eloi Gaudry wrote:
>>>>>>>>
>>>>>>>>> Thanks for your help Reuti,
>>>>>>>>>
>>>>>>>>> I'm using an NFS-shared directory (/opt/sge/tmp), exported from
>>>>>>>>> the master node to all other computing nodes.
>>>>>>>>
>>>>>>>> It's highly advisable to have the "tmpdir" local on each node.
>>>>>>>> When you use "cd $TMPDIR" in your jobscript, everything is done
>>>>>>>> locally on the node (provided your application just creates its
>>>>>>>> scratch files in the current working directory), which speeds up
>>>>>>>> the computation and decreases the network traffic. Computing in a
>>>>>>>> shared /opt/sge/tmp is like computing in each user's home
>>>>>>>> directory.
>>>>>>>>
>>>>>>>> To prevent users from removing someone else's files, the "t"
>>>>>>>> (sticky) flag is set, as for /tmp:
>>>>>>>> drwxrwxrwt 14 root root 4096 2009-11-10 18:35 /tmp/
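
The "t" flag corresponds to mode 1777. A minimal sketch of setting and verifying it is shown below; the helper names are hypothetical, and the -k test is assumed to be available (it is in bash and in most sh implementations).

```shell
# Create a world-writable directory with the sticky bit set, like /tmp.
# make_shared_tmp and has_sticky_bit are hypothetical helper names.
make_shared_tmp() {
    mkdir -p "$1" && chmod 1777 "$1"
}

# Succeed if the given path exists and has its sticky bit set.
has_sticky_bit() {
    [ -k "$1" ]
}
```

With the sticky bit set, a user can still create entries in the directory but can only delete or rename entries they own.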
>>>>>>>>
>>>>>>>> Nevertheless:
>>>>>>>>
>>>>>>>>> with, in /etc/exports on the server (named moe.fft):
>>>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>>>>> and in /etc/fstab on the clients:
>>>>>>>>> moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14, 0 0
>>>>>>>>> Actually, the /opt/sge/tmp directory is 777 across all machines,
>>>>>>>>> thus all users should be able to create a directory inside.
>>>>>>>>
>>>>>>>> All access checks will be applied:
>>>>>>>>
>>>>>>>> - on the server: what does "ls -ld /opt/sge/tmp" show?
>>>>>>>> - the one from the export (this seems to be fine)
>>>>>>>> - the one on the node (i.e., how it's mounted: cat /etc/fstab)
>>>>>>>>
>>>>>>>>> The issue seems somehow related to the session directory created
>>>>>>>>> inside /opt/sge/tmp, let's say /opt/sge/tmp/29.1.smp8.q for
>>>>>>>>> example, for job 29 on queue smp8.q. This subdirectory of
>>>>>>>>> /opt/sge/tmp is created with nobody:nogroup drwxr-xr-x
>>>>>>>>> permissions... which in turn forbids
>>>>>>>>
>>>>>>>> Did you try to run some simple jobs before the parallel ones?
>>>>>>>> Are these working? Were the daemons (qmaster and execd) started
>>>>>>>> as root?
>>>>>>>>
>>>>>>>> Is the user known on the file server, i.e. the machine hosting
>>>>>>>> /opt/sge?
>>>>>>>>
>>>>>>>>> OpenMPI to create its subtree inside (as OpenMPI won't use
>>>>>>>>> nobody:nogroup credentials).
>>>>>>>>
>>>>>>>> In SGE, the master process (the one running the job script)
>>>>>>>> creates /opt/sge/tmp/29.1.smp8.q, and so does each qrsh started
>>>>>>>> inside SGE, all with the same name. Which PE definition are you
>>>>>>>> using in SGE?
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>
>>>>>>>>> As Ralph suggested, I checked the SGE configuration, but I
>>>>>>>>> haven't found anything related to a nobody:nogroup setting
>>>>>>>>> so far.
>>>>>>>>>
>>>>>>>>> Eloi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Reuti wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On 10.11.2009 at 17:55, Eloi Gaudry wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks for your help Ralph, I'll double check that.
>>>>>>>>>>>
>>>>>>>>>>> As for the error message received, there might be some
>>>>>>>>>>> inconsistency:
>>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0" is the
>>>>>>>>>>
>>>>>>>>>> Often /opt/sge is shared across the nodes, while /tmp
>>>>>>>>>> (sometimes implemented as /scratch in a partition of its own)
>>>>>>>>>> should be local on each node.
>>>>>>>>>>
>>>>>>>>>> What is the setting of "tmpdir" in your queue definition?
>>>>>>>>>>
>>>>>>>>>> If you want to share /opt/sge/tmp, everyone must be able to
>>>>>>>>>> write into this location. Since it's working fine for me (with
>>>>>>>>>> the local /tmp), I assume the nobody/nogroup comes from a
>>>>>>>>>> squash setting in /etc/exports on your master node.
>>>>>>>>>>
>>>>>>>>>> -- Reuti
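
The squash hypothesis can be checked against the option list of an export. On Linux NFS servers, root_squash is the default, so an export without an explicit no_root_squash maps root on the clients to the anonymous user (typically nobody/nogroup). The sketch below only classifies an option string; the function name root_squash_state is hypothetical, and whether squashing is the actual cause in this thread is an assumption, not something the thread confirms.

```shell
# Given an exports(5) option list such as "rw,sync,no_subtree_check",
# print whether root on the client is mapped to the anonymous user.
# root_squash_state is a hypothetical helper name.
root_squash_state() {
    case ",$1," in
        *,all_squash,*)     echo "squashed" ;;      # every uid is mapped
        *,no_root_squash,*) echo "not squashed" ;;  # explicitly disabled
        *)                  echo "squashed" ;;      # root_squash is the default
    esac
}

# Example: root_squash_state "rw,sync,no_subtree_check"
```

If the daemon that creates the session directory runs as root on the execution node, a squashed export would make the new directory belong to the anonymous user instead.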
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> parent directory and
>>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0/53199/0/0"
>>>>>>>>>>>
>>>>>>>>>>> is the subdirectory... not the other way around.
>>>>>>>>>>>
>>>>>>>>>>> Eloi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>>>> Creating a directory with such credentials sounds like a bug
>>>>>>>>>>>> in SGE to me...perhaps an SGE config issue?
>>>>>>>>>>>>
>>>>>>>>>>>> Only thing you could do is tell OMPI to use some other
>>>>>>>>>>>> directory as the root for its session dir tree - check
>>>>>>>>>>>> "mpirun -h", or ompi_info for the required option.
>>>>>>>>>>>>
>>>>>>>>>>>> But I would first check your SGE config as that just doesn't
>>>>>>>>>>>> sound right.
>>>>>>>>>>>>
>>>>>>>>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi there,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm experiencing some issues using GE6.2U4 and OpenMPI-1.3.3
>>>>>>>>>>>>> (with gridengine component).
>>>>>>>>>>>>>
>>>>>>>>>>>>> During any job submission, SGE creates a session directory
>>>>>>>>>>>>> in $TMPDIR, named after the job id and the computing node
>>>>>>>>>>>>> name. This session directory is created using nobody/nogroup
>>>>>>>>>>>>> credentials.
>>>>>>>>>>>>>
>>>>>>>>>>>>> When using OpenMPI with tight-integration, opal creates
>>>>>>>>>>>>> different subdirectories in this session directory. The
>>>>>>>>>>>>> issue I'm facing now is that OpenMPI fails to create these
>>>>>>>>>>>>> subdirectories:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to
>>>>>>>>>>>>> create the sub-directory
>>>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of
>>>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>> ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c
>>>>>>>>>>>>>
>>>>>>>>>>>>> at line 273
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>>>> parallel process is
>>>>>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>>>>>> process can
>>>>>>>>>>>>> fail during orte_init; some of which are due to
>>>>>>>>>>>>> configuration or
>>>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>>>> internal failure;
>>>>>>>>>>>>> here's some additional information (which may only be
>>>>>>>>>>>>> relevant to an
>>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>>
>>>>>>>>>>>>> orte_session_dir failed
>>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>>>> parallel process is
>>>>>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>>>>>> process can
>>>>>>>>>>>>> fail during orte_init; some of which are due to
>>>>>>>>>>>>> configuration or
>>>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>>>> internal failure;
>>>>>>>>>>>>> here's some additional information (which may only be
>>>>>>>>>>>>> relevant to an
>>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>>
>>>>>>>>>>>>> orte_ess_set_name failed
>>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>> ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at
>>>>>>>>>>>>> line 473
>>>>>>>>>>>>>
>>>>>>>>>>>>> This seems very likely related to the permissions set on
>>>>>>>>>>>>> $TMPDIR.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'd like to know if someone might have experienced the same
>>>>>>>>>>>>> or a similar issue and if any solution was found.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help,
>>>>>>>>>>>>> Eloi
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> users_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>