Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Reuti (reuti_at_[hidden])
Date: 2009-11-10 18:26:33


To avoid misunderstandings:

On 11.11.2009, at 00:19, Eloi Gaudry wrote:

> On any execution node, creating a subdirectory of /opt/sge/tmp
> (i.e. creating a session directory inside $TMPDIR) results in a new
> directory owned by the user/group that submitted the job (not nobody/
> nogroup).

$TMPDIR is in this case /opt/sge/tmp/<job_id>.<task_id>.<qname>

I really meant to create a directory in /opt/sge/tmp by hand with
mkdir, but on the execution node which mounts /opt/sge.
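
For reference, such a check might look like this (just a sketch; run it
on the execution node, against the mounted /opt/sge/tmp):

  mkdir /opt/sge/tmp/owner-test
  ls -ld /opt/sge/tmp/owner-test
  rmdir /opt/sge/tmp/owner-test

Doing this once as the job owner and once as root should show whether
the NFS export maps anyone to nobody/nogroup.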

-- Reuti

> If I switch back to a shared /opt/sge/tmp directory, all session
> directories created by SGE get nobody/nogroup as owner.
>
> Eloi
>
> On 11/11/2009 00:14, Reuti wrote:
>> On 11.11.2009, at 00:03, Eloi Gaudry wrote:
>>
>>> The user/group used to generate the temporary directories was
>>> nobody/nogroup, when using a shared $tmpdir.
>>> Now that I'm using a local $tmpdir (one for each node, not
>>> distributed over NFS), the right credentials (i.e. my username/
>>> groupname) are used to create the session directory inside
>>> $tmpdir, which in turn allows OpenMPI to successfully create its
>>> session subdirectories.
>>
>> Aha, this explains why it's working now - so it's not an SGE issue
>> IMHO.
>>
>> Question: when a user on the execution node goes to /opt/sge/tmp
>> and creates a directory on the command line with mkdir: what group/
>> user is used then?
>>
>> -- Reuti
>>
>>
>>> Eloi
>>>
>>>
>>> On 10/11/2009 23:51, Reuti wrote:
>>>> Hi Eloi,
>>>>
>>>> On 10.11.2009, at 23:42, Eloi Gaudry wrote:
>>>>
>>>>> I followed your advice and switched to a local "tmpdir" instead
>>>>> of a shared one. This solved the session directory issue, thanks
>>>>> for your help!
>>>>
>>>> What user/group is now listed for the generated temporary
>>>> directories (i.e. $TMPDIR)?
>>>>
>>>> -- Reuti
>>>>
>>>>> However, I cannot understand how the issue disappeared. Any input
>>>>> would be welcome, as I'd really like to understand how SGE/OpenMPI
>>>>> could fail when using such a configuration (i.e. with a shared
>>>>> "tmpdir").
>>>>>
>>>>> Eloi
>>>>>
>>>>>
>>>>> On 10/11/2009 19:17, Eloi Gaudry wrote:
>>>>>> Reuti,
>>>>>>
>>>>>> The ACLs here were just added when I tried to force the
>>>>>> /opt/sge/tmp subdirectories to be 777 (which I did when I first
>>>>>> encountered the subdirectory creation error within OpenMPI). I
>>>>>> don't think the info I'll provide will be meaningful here:
>>>>>>
>>>>>> moe:~# getfacl /opt/sge/tmp
>>>>>> getfacl: Removing leading '/' from absolute path names
>>>>>> # file: opt/sge/tmp
>>>>>> # owner: sgeadmin
>>>>>> # group: fft
>>>>>> user::rwx
>>>>>> group::rwx
>>>>>> mask::rwx
>>>>>> other::rwx
>>>>>> default:user::rwx
>>>>>> default:group::rwx
>>>>>> default:group:fft:rwx
>>>>>> default:mask::rwx
>>>>>> default:other::rwx
>>>>>>
>>>>>> I'll try to use a local directory instead of a shared one for
>>>>>> "tmpdir". But as this issue seems somehow related to
>>>>>> permissions, I
>>>>>> don't know if this would eventually be the rigth solution.
>>>>>>
>>>>>> Thanks for your help,
>>>>>> Eloi
>>>>>>
>>>>>> Reuti wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 10.11.2009, at 19:01, Eloi Gaudry wrote:
>>>>>>>
>>>>>>>> Reuti,
>>>>>>>>
>>>>>>>> I'm using "tmpdir" as a shared directory that contains the
>>>>>>>> session
>>>>>>>> directories created during job submission, not for computing or
>>>>>>>> local storage. Doesn't the session directory (i.e.
>>>>>>>> job_id.queue_name) need to be shared among all computing
>>>>>>>> nodes (at
>>>>>>>> least the ones that would be used with orted during the
>>>>>>>> parallel
>>>>>>>> computation) ?
>>>>>>>
>>>>>>> No, orted runs happily with local $TMPDIR on each and every
>>>>>>> node.
>>>>>>> The $TMPDIRs are intended to be used by the user for any
>>>>>>> temporary
>>>>>>> data for his job, as they are created and removed by SGE
>>>>>>> automatically for every job for his convenience.
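
For completeness: the location of $TMPDIR is set per queue via the
"tmpdir" attribute of the queue configuration. A sketch only, with the
queue name just as an example:

  qconf -sq smp8.q | grep tmpdir     # check the current setting
  qconf -mq smp8.q                   # change it, e.g. to a local /tmp:
      tmpdir                /tmp

Each execution host then creates the per-job session directory under
its own local /tmp.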
>>>>>>>
>>>>>>>
>>>>>>>> All sequential jobs run fine, as no write operation is
>>>>>>>> performed in
>>>>>>>> "tmpdir/session_directory".
>>>>>>>>
>>>>>>>> All users are known on the computing nodes and the master node
>>>>>>>> (we use LDAP authentication on all nodes).
>>>>>>>>
>>>>>>>> As for the access checkings:
>>>>>>>> moe:~# ls -alrtd /opt/sge/tmp
>>>>>>>> drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
>>>>>>>
>>>>>>> Aha, the + indicates that there are some ACLs set:
>>>>>>>
>>>>>>> getfacl /opt/sge/tmp
>>>>>>>
>>>>>>>
>>>>>>>> And for the parallel environment configuration:
>>>>>>>> moe:~# qconf -sp round_robin
>>>>>>>> pe_name round_robin
>>>>>>>> slots 32
>>>>>>>> user_lists NONE
>>>>>>>> xuser_lists NONE
>>>>>>>> start_proc_args /bin/true
>>>>>>>> stop_proc_args /bin/true
>>>>>>>> allocation_rule $round_robin
>>>>>>>> control_slaves TRUE
>>>>>>>> job_is_first_task FALSE
>>>>>>>> urgency_slots min
>>>>>>>> accounting_summary FALSE
>>>>>>>
>>>>>>> Okay, fine.
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>
>>>>>>>> Thanks for your help,
>>>>>>>> Eloi
>>>>>>>>
>>>>>>>> Reuti wrote:
>>>>>>>>> On 10.11.2009, at 18:20, Eloi Gaudry wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for your help Reuti,
>>>>>>>>>>
>>>>>>>>>> I'm using an NFS-shared directory (/opt/sge/tmp), exported
>>>>>>>>>> from the master node to all other computing nodes.
>>>>>>>>>
>>>>>>>>> It's highly advisable to have the "tmpdir" local on each node.
>>>>>>>>> When you use "cd $TMPDIR" in your jobscript, everything is done
>>>>>>>>> locally on a node (when your application just creates its
>>>>>>>>> scratch files in the current working directory), which will
>>>>>>>>> speed up the computation and decrease the network traffic.
>>>>>>>>> Computing in a shared /opt/sge/tmp is like computing in each
>>>>>>>>> user's home directory.
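
A minimal job script along these lines illustrates the idea (the PE
name and the binary path are just placeholders):

  #!/bin/sh
  #$ -pe round_robin 8
  cd $TMPDIR                    # node-local scratch created by SGE
  mpirun -np $NSLOTS /path/to/your_mpi_app
  # $TMPDIR is removed automatically by SGE when the job ends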
>>>>>>>>>
>>>>>>>>> To prevent any user from removing someone else's files, the
>>>>>>>>> "t" flag is set like for /tmp: drwxrwxrwt 14 root root 4096
>>>>>>>>> 2009-11-10 18:35 /tmp/
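
(For reference, that flag can be set on a shared directory with
something like "chmod +t /opt/sge/tmp", or "chmod 1777 /opt/sge/tmp"
to set the world-writable bits at the same time.)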
>>>>>>>>>
>>>>>>>>> Nevertheless:
>>>>>>>>>
>>>>>>>>>> with /etc/exports on the server (named moe.fft):
>>>>>>>>>>   /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>>>>>> and /etc/fstab on the clients:
>>>>>>>>>>   moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14, 0 0
>>>>>>>>>> Actually, the /opt/sge/tmp directory is 777 across all
>>>>>>>>>> machines, so all users should be able to create a directory
>>>>>>>>>> inside.
>>>>>>>>>
>>>>>>>>> All access checks will be applied:
>>>>>>>>>
>>>>>>>>> - on the server: what is "ls -d /opt/sge/tmp" showing?
>>>>>>>>> - the one from the export (this seems to be fine)
>>>>>>>>> - the one on the node (i.e., how it's mounted: cat /etc/fstab)
>>>>>>>>>
>>>>>>>>>> The issue seems somehow related to the session directory
>>>>>>>>>> created
>>>>>>>>>> inside /opt/sge/tmp, let's say /opt/sge/tmp/29.1.smp8.q for
>>>>>>>>>> example, for job 29 on queue smp8.q. This subdirectory of
>>>>>>>>>> /opt/sge/tmp is created with nobody:nogroup drwxr-xr-x
>>>>>>>>>> permissions... which in turn forbids
>>>>>>>>>
>>>>>>>>> Did you try to run some simple jobs before the parallel ones -
>>>>>>>>> are these working? Were the daemons (qmaster and execd) started
>>>>>>>>> as root?
>>>>>>>>>
>>>>>>>>> Is the user known on the file server, i.e. the machine hosting
>>>>>>>>> /opt/sge?
>>>>>>>>>
>>>>>>>>>> OpenMPI to create its subtree inside (as OpenMPI won't use
>>>>>>>>>> nobody:nogroup credentials).
>>>>>>>>>
>>>>>>>>> In SGE the master process (the one running the job script) will
>>>>>>>>> create /opt/sge/tmp/29.1.smp8.q, and so will each qrsh started
>>>>>>>>> inside SGE - all with the same name. What is the definition of
>>>>>>>>> the PE you use in SGE?
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> As Ralph suggested, I checked the SGE configuration, but I
>>>>>>>>>> haven't found anything related to nobody:nogroup
>>>>>>>>>> configuration
>>>>>>>>>> so far.
>>>>>>>>>>
>>>>>>>>>> Eloi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Reuti wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> On 10.11.2009, at 17:55, Eloi Gaudry wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help Ralph, I'll double check that.
>>>>>>>>>>>>
>>>>>>>>>>>> As for the error message received, there might be some
>>>>>>>>>>>> inconsistency:
>>>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0"
>>>>>>>>>>>> is the
>>>>>>>>>>>
>>>>>>>>>>> Often /opt/sge is shared across the nodes, while /tmp
>>>>>>>>>>> (sometimes implemented as /scratch on a partition of its own)
>>>>>>>>>>> should be local on each node.
>>>>>>>>>>>
>>>>>>>>>>> What is the setting of "tmpdir" in your queue definition?
>>>>>>>>>>>
>>>>>>>>>>> If you want to share /opt/sge/tmp, everyone must be able to
>>>>>>>>>>> write into this location. As it's working fine for me (with a
>>>>>>>>>>> local /tmp), I assume the nobody/nogroup comes from a
>>>>>>>>>>> squash setting in the /etc/exports of your master node.
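
If that is indeed the cause, the export on the server would have to let
root on the clients keep its identity, e.g. something along the lines of

  /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check,no_root_squash)

in /etc/exports, followed by "exportfs -ra" on the server - although a
node-local "tmpdir" avoids the whole question.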
>>>>>>>>>>>
>>>>>>>>>>> -- Reuti
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> parent directory and
>>>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-
>>>>>>>>>>>> eg_at_charlie_0/53199/0/0"
>>>>>>>>>>>> is the subdirectory... not the other way around.
>>>>>>>>>>>>
>>>>>>>>>>>> Eloi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>>>>> Creating a directory with such credentials sounds like
>>>>>>>>>>>>> a bug
>>>>>>>>>>>>> in SGE to me...perhaps an SGE config issue?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Only thing you could do is tell OMPI to use some other
>>>>>>>>>>>>> directory as the root for its session dir tree - check
>>>>>>>>>>>>> "mpirun -h", or ompi_info for the required option.
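
If I remember correctly, in the 1.3 series the relevant MCA parameter
is orte_tmpdir_base, e.g.

  mpirun --mca orte_tmpdir_base /tmp -np $NSLOTS ./your_app

"ompi_info -a | grep tmpdir" should confirm the exact name.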
>>>>>>>>>>>>>
>>>>>>>>>>>>> But I would first check your SGE config as that just
>>>>>>>>>>>>> doesn't
>>>>>>>>>>>>> sound right.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi there,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm experiencing some issues using GE6.2U4 and OpenMPI-1.3.3
>>>>>>>>>>>>>> (with the gridengine component).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> During any job submission, SGE creates a session directory
>>>>>>>>>>>>>> in $TMPDIR, named after the job id, task id, and queue
>>>>>>>>>>>>>> name. This session directory is created using nobody/nogroup
>>>>>>>>>>>>>> credentials.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When using OpenMPI with tight-integration, opal creates
>>>>>>>>>>>>>> different subdirectories in this session directory. The
>>>>>>>>>>>>>> issue I'm facing now is that OpenMPI fails to create
>>>>>>>>>>>>>> these
>>>>>>>>>>>>>> subdirectories:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to create the sub-directory (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c at line 273
>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>>>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>>>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> orte_session_dir failed
>>>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>>>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>>>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> orte_ess_set_name failed
>>>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line 473
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This seems very likely related to the permissions set on
>>>>>>>>>>>>>> $TMPDIR.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'd like to know if someone might have experienced the
>>>>>>>>>>>>>> same
>>>>>>>>>>>>>> or a similar issue and if any solution was found.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for your help,
>>>>>>>>>>>>>> Eloi
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>