Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Reuti (reuti_at_[hidden])
Date: 2009-11-10 18:44:29


On 11.11.2009, at 00:29, Eloi Gaudry wrote:

> This is what I did (created /opt/sge/tmp/test by hand on an
> execution host, logged in as a regular cluster user).

Then we end up where my thinking first started, but I missed the
implied default: can you export /opt/sge with "no_root_squash" and
reload the NFS server? SGE will create the $TMPDIR as root/root and
then change the uid/gid - both steps fail when root is squashed to
nobody/nogroup.
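
For reference, a minimal sketch of such an export line (assuming a
Linux nfs-kernel-server; the subnet is the one quoted later in this
thread):

  # /etc/exports on the file server
  /opt/sge  192.168.0.0/255.255.255.0(rw,sync,no_subtree_check,no_root_squash)

After editing, re-export and check the effective options:

  exportfs -ra   # re-read /etc/exports
  exportfs -v    # verify no_root_squash is active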

But having $TMPDIR local is still an advantage. Even SGE's spool
directories can be local: http://gridengine.sunsource.net/howto/nfsreduce.html
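
A sketch of switching a queue's "tmpdir" to a node-local path (the
queue name smp8.q is taken from this thread; any directory that
exists locally on every node works):

  qconf -mq smp8.q
  # in the editor, change the tmpdir attribute:
  tmpdir    /tmp

SGE will then create and remove the per-job $TMPDIR below /tmp on
each execution node.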

-- Reuti

> Eloi
>
> On 11/11/2009 00:26, Reuti wrote:
>> To avoid misunderstandings:
>>
>> On 11.11.2009, at 00:19, Eloi Gaudry wrote:
>>
>>> On any execution node, creating a subdirectory of /opt/sge/tmp
>>> (i.e. creating a session directory inside $TMPDIR) results in a
>>> new directory owned by the user/group that submitted the job (not
>>> nobody/nogroup).
>>
>> $TMPDIR is in this case /opt/sge/tmp/<job_id>.<task_id>.<qname>
>>
>> I really meant to create a directory in /opt/sge/tmp by hand with
>> mkdir, but on the execution node which mounts /opt/sge.
>>
>> -- Reuti
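
A quick way to see the squashing directly (a sketch; run on a client
node that mounts the share, paths as in this thread):

  # as a regular user: the directory gets that user's uid/gid
  mkdir /opt/sge/tmp/test && ls -ld /opt/sge/tmp/test
  # as root: with root_squash in effect it shows up as nobody/nogroup
  sudo mkdir /opt/sge/tmp/roottest && ls -ld /opt/sge/tmp/roottest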
>>
>>
>>> If I switch back to a shared /opt/sge/tmp directory, all session
>>> directories created by SGE are owned by nobody/nogroup.
>>>
>>> Eloi
>>>
>>> On 11/11/2009 00:14, Reuti wrote:
>>>> On 11.11.2009, at 00:03, Eloi Gaudry wrote:
>>>>
>>>>> The user/group used to generate the temporary directories was
>>>>> nobody/nogroup, when using a shared $tmpdir.
>>>>> Now that I'm using a local $tmpdir (one for each node, not
>>>>> distributed over nfs), the right credentials (i.e. my username/
>>>>> groupname) are used to create the session directory inside
>>>>> $tmpdir, which in turn allows OpenMPI to successfully create
>>>>> its session subdirectories.
>>>>
>>>> Aha, this explains why it's working now - so it's not an SGE
>>>> issue IMHO.
>>>>
>>>> Question: when a user on the execution node goes to /opt/sge/tmp
>>>> and creates a directory on the command line with mkdir: what
>>>> group/user is used then?
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> Eloi
>>>>>
>>>>>
>>>>> On 10/11/2009 23:51, Reuti wrote:
>>>>>> Hi Eloi,
>>>>>>
>>>>>> On 10.11.2009, at 23:42, Eloi Gaudry wrote:
>>>>>>
>>>>>>> I followed your advice and switched to a local "tmpdir"
>>>>>>> instead of a shared one. This solved the session directory
>>>>>>> issue, thanks for your help!
>>>>>>
>>>>>> what user/group is now listed for the generated temporary
>>>>>> directories (i.e. $TMPDIR)?
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>> However, I cannot understand how the issue disappeared. Any
>>>>>>> input would be welcome, as I'd really like to understand how
>>>>>>> SGE/OpenMPI could fail when using such a configuration (i.e.
>>>>>>> with a shared "tmpdir").
>>>>>>>
>>>>>>> Eloi
>>>>>>>
>>>>>>>
>>>>>>> On 10/11/2009 19:17, Eloi Gaudry wrote:
>>>>>>>> Reuti,
>>>>>>>>
>>>>>>>> The ACLs here were just added when I tried to force the
>>>>>>>> /opt/sge/tmp subdirectories to be 777 (which I did when I
>>>>>>>> first encountered the error of subdirectory creation within
>>>>>>>> OpenMPI). I don't think the info I'll provide will be
>>>>>>>> meaningful here:
>>>>>>>>
>>>>>>>> moe:~# getfacl /opt/sge/tmp
>>>>>>>> getfacl: Removing leading '/' from absolute path names
>>>>>>>> # file: opt/sge/tmp
>>>>>>>> # owner: sgeadmin
>>>>>>>> # group: fft
>>>>>>>> user::rwx
>>>>>>>> group::rwx
>>>>>>>> mask::rwx
>>>>>>>> other::rwx
>>>>>>>> default:user::rwx
>>>>>>>> default:group::rwx
>>>>>>>> default:group:fft:rwx
>>>>>>>> default:mask::rwx
>>>>>>>> default:other::rwx
>>>>>>>>
>>>>>>>> I'll try to use a local directory instead of a shared one for
>>>>>>>> "tmpdir". But as this issue seems somehow related to
>>>>>>>> permissions, I don't know if this will eventually be the
>>>>>>>> right solution.
>>>>>>>>
>>>>>>>> Thanks for your help,
>>>>>>>> Eloi
>>>>>>>>
>>>>>>>> Reuti wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 10.11.2009, at 19:01, Eloi Gaudry wrote:
>>>>>>>>>
>>>>>>>>>> Reuti,
>>>>>>>>>>
>>>>>>>>>> I'm using "tmpdir" as a shared directory that contains the
>>>>>>>>>> session
>>>>>>>>>> directories created during job submission, not for
>>>>>>>>>> computing or
>>>>>>>>>> local storage. Doesn't the session directory (i.e.
>>>>>>>>>> job_id.queue_name) need to be shared among all computing
>>>>>>>>>> nodes (at
>>>>>>>>>> least the ones that would be used with orted during the
>>>>>>>>>> parallel
>>>>>>>>>> computation) ?
>>>>>>>>>
>>>>>>>>> no. orted runs happily with a local $TMPDIR on each and
>>>>>>>>> every node. The $TMPDIRs are intended to be used by the user
>>>>>>>>> for any temporary data of the job, as they are created and
>>>>>>>>> removed by SGE automatically for every job, for convenience.
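
A sketch of a job script using the node-local $TMPDIR (the PE name
round_robin comes from this thread; my_app is a placeholder):

  #!/bin/sh
  #$ -pe round_robin 8
  #$ -cwd
  cd $TMPDIR                       # node-local scratch created by SGE
  mpirun -np $NSLOTS ~/bin/my_app  # NSLOTS is set by SGE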
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> All sequential jobs run fine, as no write operation is
>>>>>>>>>> performed in "tmpdir/session_directory".
>>>>>>>>>>
>>>>>>>>>> All users are known on the computing nodes and the master
>>>>>>>>>> node (we use LDAP authentication on all nodes).
>>>>>>>>>>
>>>>>>>>>> As for the access checkings:
>>>>>>>>>> moe:~# ls -alrtd /opt/sge/tmp
>>>>>>>>>> drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
>>>>>>>>>
>>>>>>>>> Aha, the + shows that there are some ACLs set:
>>>>>>>>>
>>>>>>>>> getfacl /opt/sge/tmp
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> And for the parallel environment configuration:
>>>>>>>>>> moe:~# qconf -sp round_robin
>>>>>>>>>> pe_name round_robin
>>>>>>>>>> slots 32
>>>>>>>>>> user_lists NONE
>>>>>>>>>> xuser_lists NONE
>>>>>>>>>> start_proc_args /bin/true
>>>>>>>>>> stop_proc_args /bin/true
>>>>>>>>>> allocation_rule $round_robin
>>>>>>>>>> control_slaves TRUE
>>>>>>>>>> job_is_first_task FALSE
>>>>>>>>>> urgency_slots min
>>>>>>>>>> accounting_summary FALSE
>>>>>>>>>
>>>>>>>>> Okay, fine.
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks for your help,
>>>>>>>>>> Eloi
>>>>>>>>>>
>>>>>>>>>> Reuti wrote:
>>>>>>>>>>> On 10.11.2009, at 18:20, Eloi Gaudry wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help Reuti,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm using a nfs-shared directory (/opt/sge/tmp),
>>>>>>>>>>>> exported from
>>>>>>>>>>>> the master node to all others computing nodes.
>>>>>>>>>>>
>>>>>>>>>>> It's highly advisable to have the "tmpdir" local on each
>>>>>>>>>>> node. When you use "cd $TMPDIR" in your job script,
>>>>>>>>>>> everything is done locally on a node (when your
>>>>>>>>>>> application just creates its scratch files in the current
>>>>>>>>>>> working directory), which will speed up the computation
>>>>>>>>>>> and decrease the network traffic. Computing in a shared
>>>>>>>>>>> /opt/sge/tmp is like computing in each user's home
>>>>>>>>>>> directory.
>>>>>>>>>>>
>>>>>>>>>>> To prevent any user from removing someone else's files,
>>>>>>>>>>> the sticky "t" bit is set, just as for /tmp:
>>>>>>>>>>> drwxrwxrwt 14 root root 4096 2009-11-10 18:35 /tmp/
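
If the shared directory is kept world-writable, the same protection
can be applied there as well (a sketch, path as in this thread):

  chmod 1777 /opt/sge/tmp   # 777 plus the sticky bit
  ls -ld /opt/sge/tmp       # should now show drwxrwxrwt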
>>>>>>>>>>>
>>>>>>>>>>> Nevertheless:
>>>>>>>>>>>
>>>>>>>>>>>> with /etc/exports on the server (named moe.fft):
>>>>>>>>>>>>   /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>>>>>>>> and /etc/fstab on the client:
>>>>>>>>>>>>   moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14 0 0
>>>>>>>>>>>> Actually, the /opt/sge/tmp directory is 777 across all
>>>>>>>>>>>> machines, thus all users should be able to create a
>>>>>>>>>>>> directory inside.
>>>>>>>>>>>
>>>>>>>>>>> All access checks will be applied (see the sketch after
>>>>>>>>>>> this list):
>>>>>>>>>>>
>>>>>>>>>>> - on the server: what is "ls -d /opt/sge/tmp" showing?
>>>>>>>>>>> - the one from the export (this seems to be fine)
>>>>>>>>>>> - the one on the node (i.e., how it's mounted: cat /etc/fstab)
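
The three checks as commands (a sketch; exportfs -v prints the
effective export options on the server):

  ls -ld /opt/sge/tmp   # on the server: ownership/permissions
  exportfs -v           # on the server: effective export options
  cat /etc/fstab        # on the client: how the share is mounted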
>>>>>>>>>>>
>>>>>>>>>>>> The issue seems somehow related to the session directory
>>>>>>>>>>>> created inside /opt/sge/tmp, say /opt/sge/tmp/29.1.smp8.q
>>>>>>>>>>>> for job 29 on queue smp8.q. This subdirectory of
>>>>>>>>>>>> /opt/sge/tmp is created with nobody:nogroup drwxr-xr-x
>>>>>>>>>>>> permissions... which in turn forbids
>>>>>>>>>>>
>>>>>>>>>>> Did you try to run some simple jobs before the parallel
>>>>>>>>>>> ones - are these working? Were the daemons (qmaster and
>>>>>>>>>>> execd) started as root?
>>>>>>>>>>>
>>>>>>>>>>> Is the user known on the file server, i.e. the machine
>>>>>>>>>>> hosting /opt/sge?
>>>>>>>>>>>
>>>>>>>>>>>> OpenMPI to create its subtree inside (as OpenMPI won't use
>>>>>>>>>>>> nobody:nogroup credentials).
>>>>>>>>>>>
>>>>>>>>>>> In SGE the master process (the one running the job
>>>>>>>>>>> script) will create /opt/sge/tmp/29.1.smp8.q, and so will
>>>>>>>>>>> each qrsh started inside SGE - all with the same name.
>>>>>>>>>>> What is the definition of the PE in SGE which you use?
>>>>>>>>>>>
>>>>>>>>>>> -- Reuti
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> As Ralph suggested, I checked the SGE configuration, but
>>>>>>>>>>>> I haven't found anything related to a nobody:nogroup
>>>>>>>>>>>> configuration so far.
>>>>>>>>>>>>
>>>>>>>>>>>> Eloi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Reuti wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 10.11.2009, at 17:55, Eloi Gaudry wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help Ralph, I'll double check that.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As for the error message received, there might be some
>>>>>>>>>>>>>> inconsistency:
>>>>>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg@charlie_0"
>>>>>>>>>>>>>> is the
>>>>>>>>>>>>>
>>>>>>>>>>>>> often /opt/sge is shared across the nodes, while the /tmp
>>>>>>>>>>>>> (sometimes implemented as /scratch in a partition on
>>>>>>>>>>>>> its own)
>>>>>>>>>>>>> should be local on each node.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What is the setting of "tmpdir" in your queue definition?
>>>>>>>>>>>>>
>>>>>>>>>>>>> If you want to share /opt/sge/tmp, everyone must be able
>>>>>>>>>>>>> to write into this location. As for me it's working fine
>>>>>>>>>>>>> (with the local /tmp), I assume the nobody/nogroup comes
>>>>>>>>>>>>> from a squash setting in the /etc/exports of your master
>>>>>>>>>>>>> node.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> parent directory and
>>>>>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg@charlie_0/53199/0/0"
>>>>>>>>>>>>>> is the subdirectory... not the other way around.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Eloi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>>>>>>> Creating a directory with such credentials sounds like
>>>>>>>>>>>>>>> a bug in SGE to me... perhaps an SGE config issue?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The only thing you could do is tell OMPI to use some
>>>>>>>>>>>>>>> other directory as the root for its session dir tree -
>>>>>>>>>>>>>>> check "mpirun -h" or ompi_info for the required option.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> But I would first check your SGE config, as that just
>>>>>>>>>>>>>>> doesn't sound right.
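
A sketch of redirecting the session directory root (the parameter
name orte_tmpdir_base is what Open MPI 1.3-era releases use; verify
it on your installation with ompi_info):

  ompi_info --param all all | grep tmpdir   # confirm the parameter name
  mpirun -mca orte_tmpdir_base /tmp -np 8 ./my_app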
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi there,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm experiencing some issues using GE6.2U4 and
>>>>>>>>>>>>>>>> OpenMPI-1.3.3 (with the gridengine component).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> During any job submission, SGE creates a session
>>>>>>>>>>>>>>>> directory in $TMPDIR, named after the job id and the
>>>>>>>>>>>>>>>> queue name. This session directory is created using
>>>>>>>>>>>>>>>> nobody/nogroup credentials.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When using OpenMPI with tight integration, opal
>>>>>>>>>>>>>>>> creates different subdirectories in this session
>>>>>>>>>>>>>>>> directory. The issue I'm facing now is that OpenMPI
>>>>>>>>>>>>>>>> fails to create these subdirectories:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable
>>>>>>>>>>>>>>>> to create the sub-directory
>>>>>>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg@charlie_0) of
>>>>>>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg@charlie_0)
>>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>>> ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c at line 273
>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>>>>>>> parallel process is likely to abort. There are many
>>>>>>>>>>>>>>>> reasons that a parallel process can fail during
>>>>>>>>>>>>>>>> orte_init; some of which are due to configuration or
>>>>>>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>>>>>>> internal failure; here's some additional information
>>>>>>>>>>>>>>>> (which may only be relevant to an Open MPI developer):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> orte_session_dir failed
>>>>>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>>>>>>> parallel process is likely to abort. There are many
>>>>>>>>>>>>>>>> reasons that a parallel process can fail during
>>>>>>>>>>>>>>>> orte_init; some of which are due to configuration or
>>>>>>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>>>>>>> internal failure; here's some additional information
>>>>>>>>>>>>>>>> (which may only be relevant to an Open MPI developer):
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> orte_ess_set_name failed
>>>>>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>>> ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line 473
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This seems very likely related to the permissions set
>>>>>>>>>>>>>>>> on $TMPDIR.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like to know if someone might have experienced
>>>>>>>>>>>>>>>> the same
>>>>>>>>>>>>>>>> or a similar issue and if any solution was found.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for your help,
>>>>>>>>>>>>>>>> Eloi
>>>>>>>>>>>>>>>>