Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Eloi Gaudry (eg_at_[hidden])
Date: 2009-11-10 18:29:32


This is what I did (created /opt/sge/tmp/test by hand on an execution
host, logged in as a regular cluster user).
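
For reference, the test was roughly the following, run as a regular user
logged in on an execution host; the ownership can then be checked with ls:

mkdir /opt/sge/tmp/test
ls -ld /opt/sge/tmp/test    # check which user/group the new directory gets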

Eloi

On 11/11/2009 00:26, Reuti wrote:
> To avoid misunderstandings:
>
> Am 11.11.2009 um 00:19 schrieb Eloi Gaudry:
>
>> On any execution node, creating a subdirectory of /opt/sge/tmp (i.e.
>> creating a session directory inside $TMPDIR) results in a new
>> directory owned by the user/group that submitted the job (not
>> nobody/nogroup).
>
> $TMPDIR is in this case /opt/sge/tmp/<job_id>.<task_id>.<qname>
>
> I really meant to create a directory in /opt/sge/tmp by hand with
> mkdir, but on the execution node which mounts /opt/sge.
>
> -- Reuti
>
>
>> If I switch back to a shared /opt/sge/tmp directory, all session
>> directories created by SGE get nobody/nogroup as owner.
>>
>> Eloi
>>
>> On 11/11/2009 00:14, Reuti wrote:
>>> Am 11.11.2009 um 00:03 schrieb Eloi Gaudry:
>>>
>>>> The user/group used to generate the temporary directories was
>>>> nobody/nogroup, when using a shared $tmpdir.
>>>> Now that I'm using a local $tmpdir (one for each node, not
>>>> distributed over NFS), the right credentials (i.e. my
>>>> username/groupname) are used to create the session directory inside
>>>> $tmpdir, which in turn allows OpenMPI to successfully create its
>>>> session subdirectories.
>>>
>>> Aha, this explains why it's working now - so it's not an SGE issue
>>> IMHO.
>>>
>>> Question: when a user on the execution node goes to /opt/sge/tmp and
>>> creates a directory on the command line with mkdir: what group/user
>>> is used then?
>>>
>>> -- Reuti
>>>
>>>
>>>> Eloi
>>>>
>>>>
>>>> On 10/11/2009 23:51, Reuti wrote:
>>>>> Hi Eloi,
>>>>>
>>>>> Am 10.11.2009 um 23:42 schrieb Eloi Gaudry:
>>>>>
>>>>>> I followed your advice and switched to a local "tmpdir" instead of a
>>>>>> shared one. This solved the session directory issue, thanks for your
>>>>>> help!
>>>>>
>>>>> what user/group is now listed for the generated temporary directories
>>>>> (i.e. $TMPDIR)?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> However, I cannot understand how the issue disappeared. Any input
>>>>>> would be welcome, as I would really like to understand how
>>>>>> SGE/OpenMPI could fail when using such a configuration (i.e. with a
>>>>>> shared "tmpdir").
>>>>>>
>>>>>> Eloi
>>>>>>
>>>>>>
>>>>>> On 10/11/2009 19:17, Eloi Gaudry wrote:
>>>>>>> Reuti,
>>>>>>>
>>>>>>> The ACLs here were just added when I tried to force the /opt/sge/tmp
>>>>>>> subdirectories to be 777 (which I did when I first encountered the
>>>>>>> subdirectory creation error within OpenMPI). I don't think the
>>>>>>> info I'll provide will be meaningful here:
>>>>>>>
>>>>>>> moe:~# getfacl /opt/sge/tmp
>>>>>>> getfacl: Removing leading '/' from absolute path names
>>>>>>> # file: opt/sge/tmp
>>>>>>> # owner: sgeadmin
>>>>>>> # group: fft
>>>>>>> user::rwx
>>>>>>> group::rwx
>>>>>>> mask::rwx
>>>>>>> other::rwx
>>>>>>> default:user::rwx
>>>>>>> default:group::rwx
>>>>>>> default:group:fft:rwx
>>>>>>> default:mask::rwx
>>>>>>> default:other::rwx
>>>>>>>
>>>>>>> I'll try to use a local directory instead of a shared one for
>>>>>>> "tmpdir". But as this issue seems somehow related to permissions, I
>>>>>>> don't know if this would eventually be the right solution.
>>>>>>>
>>>>>>> Thanks for your help,
>>>>>>> Eloi
>>>>>>>
>>>>>>> Reuti wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Am 10.11.2009 um 19:01 schrieb Eloi Gaudry:
>>>>>>>>
>>>>>>>>> Reuti,
>>>>>>>>>
>>>>>>>>> I'm using "tmpdir" as a shared directory that contains the
>>>>>>>>> session
>>>>>>>>> directories created during job submission, not for computing or
>>>>>>>>> local storage. Doesn't the session directory (i.e.
>>>>>>>>> job_id.queue_name) need to be shared among all computing nodes
>>>>>>>>> (at
>>>>>>>>> least the ones that would be used with orted during the parallel
>>>>>>>>> computation)?
>>>>>>>>
>>>>>>>> No, orted runs happily with a local $TMPDIR on each and every node.
>>>>>>>> The $TMPDIRs are intended to be used by the user for any temporary
>>>>>>>> data for his job, as they are created and removed by SGE
>>>>>>>> automatically for every job for his convenience.
>>>>>>>>
>>>>>>>>
>>>>>>>>> All sequential jobs run fine, as no write operation is performed
>>>>>>>>> in "tmpdir/session_directory".
>>>>>>>>>
>>>>>>>>> All users are known on the computing nodes and the master node
>>>>>>>>> (we use LDAP authentication on all nodes).
>>>>>>>>>
>>>>>>>>> As for the access checks:
>>>>>>>>> moe:~# ls -alrtd /opt/sge/tmp
>>>>>>>>> drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
>>>>>>>>
>>>>>>>> Aha, the + indicates that there are some ACLs set:
>>>>>>>>
>>>>>>>> getfacl /opt/sge/tmp
>>>>>>>>
>>>>>>>>
>>>>>>>>> And for the parallel environment configuration:
>>>>>>>>> moe:~# qconf -sp round_robin
>>>>>>>>> pe_name round_robin
>>>>>>>>> slots 32
>>>>>>>>> user_lists NONE
>>>>>>>>> xuser_lists NONE
>>>>>>>>> start_proc_args /bin/true
>>>>>>>>> stop_proc_args /bin/true
>>>>>>>>> allocation_rule $round_robin
>>>>>>>>> control_slaves TRUE
>>>>>>>>> job_is_first_task FALSE
>>>>>>>>> urgency_slots min
>>>>>>>>> accounting_summary FALSE
>>>>>>>>
>>>>>>>> Okay, fine.
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>
>>>>>>>>> Thanks for your help,
>>>>>>>>> Eloi
>>>>>>>>>
>>>>>>>>> Reuti wrote:
>>>>>>>>>> Am 10.11.2009 um 18:20 schrieb Eloi Gaudry:
>>>>>>>>>>
>>>>>>>>>>> Thanks for your help Reuti,
>>>>>>>>>>>
>>>>>>>>>>> I'm using an NFS-shared directory (/opt/sge/tmp), exported from
>>>>>>>>>>> the master node to all other computing nodes.
>>>>>>>>>>
>>>>>>>>>> It's highly advisable to have the "tmpdir" local on each node.
>>>>>>>>>> When you use "cd $TMPDIR" in your jobscript, everything is done
>>>>>>>>>> locally on the node (when your application just creates its
>>>>>>>>>> scratch files in the current working directory), which will speed
>>>>>>>>>> up the computation and decrease the network traffic. Computing in
>>>>>>>>>> a shared /opt/sge/tmp is like computing in each user's home
>>>>>>>>>> directory.
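>>>>>>>>>>
>>>>>>>>>> A minimal job script along these lines (solver and file names are
>>>>>>>>>> just placeholders) shows the intended use of the node-local
>>>>>>>>>> $TMPDIR that SGE sets up and removes for every job:
>>>>>>>>>>
>>>>>>>>>> #!/bin/sh
>>>>>>>>>> #$ -cwd
>>>>>>>>>> cd $TMPDIR                            # node-local scratch dir created by SGE
>>>>>>>>>> my_solver $SGE_O_WORKDIR/input.dat    # scratch files are written locally
>>>>>>>>>> cp results.dat $SGE_O_WORKDIR/        # copy results back before SGE removes $TMPDIR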
>>>>>>>>>>
>>>>>>>>>> To prevent any user from removing someone else's files, the
>>>>>>>>>> sticky "t" flag is set, as for /tmp: drwxrwxrwt 14 root root 4096
>>>>>>>>>> 2009-11-10 18:35 /tmp/
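>>>>>>>>>>
>>>>>>>>>> If a shared directory were kept anyway, the same protection could
>>>>>>>>>> be set by hand, e.g.:
>>>>>>>>>>
>>>>>>>>>> chmod 1777 /opt/sge/tmp    # world-writable plus sticky bit, as on /tmp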
>>>>>>>>>>
>>>>>>>>>> Nevertheless:
>>>>>>>>>>
>>>>>>>>>>> with, in /etc/exports on the server (named moe.fft):
>>>>>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>>>>>>> and in /etc/fstab on the clients:
>>>>>>>>>>> moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14, 0 0
>>>>>>>>>>> Actually, the /opt/sge/tmp directory is 777 across all machines,
>>>>>>>>>>> thus all users should be able to create a directory inside.
>>>>>>>>>>
>>>>>>>>>> All access checks will be applied:
>>>>>>>>>>
>>>>>>>>>> - on the server: what is "ls -d /opt/sge/tmp" showing?
>>>>>>>>>> - the one from the export (this seems to be fine)
>>>>>>>>>> - the one on the node (i.e., how it's mounted: cat /etc/fstab)
>>>>>>>>>>
>>>>>>>>>>> The issue seems somehow related to the session directory
>>>>>>>>>>> created
>>>>>>>>>>> inside /opt/sge/tmp, let's say /opt/sge/tmp/29.1.smp8.q for
>>>>>>>>>>> example for the job 29 on queue smp8.q. This subdirectory of
>>>>>>>>>>> /opt/sge/tmp is created with nobody:nogroup drwxr-xr-x
>>>>>>>>>>> permissions... which in turn forbids
>>>>>>>>>>
>>>>>>>>>> Did you try to run some simple jobs before the parallel ones -
>>>>>>>>>> are these working? Were the daemons (qmaster and execd) started
>>>>>>>>>> as root?
>>>>>>>>>>
>>>>>>>>>> Is the user known on the file server, i.e. the machine hosting
>>>>>>>>>> /opt/sge?
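>>>>>>>>>>
>>>>>>>>>> Both points can be checked quickly, for instance with:
>>>>>>>>>>
>>>>>>>>>> ps -e -o user,comm | grep sge_    # sge_execd / sge_qmaster should run as root
>>>>>>>>>> id eg                             # on the file server, to confirm the account is known there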
>>>>>>>>>>
>>>>>>>>>>> OpenMPI to create its subtree inside (as OpenMPI won't use
>>>>>>>>>>> nobody:nogroup credentials).
>>>>>>>>>>
>>>>>>>>>> In SGE the master process (the one running the job script) will
>>>>>>>>>> create /opt/sge/tmp/29.1.smp8.q, and so will each qrsh started
>>>>>>>>>> inside SGE - all with the same name. What is the definition of
>>>>>>>>>> the PE you use in SGE?
>>>>>>>>>>
>>>>>>>>>> -- Reuti
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> As Ralph suggested, I checked the SGE configuration, but I
>>>>>>>>>>> haven't found anything related to nobody:nogroup configuration
>>>>>>>>>>> so far.
>>>>>>>>>>>
>>>>>>>>>>> Eloi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Reuti wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Am 10.11.2009 um 17:55 schrieb Eloi Gaudry:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for your help Ralph, I'll double check that.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As for the error message received, there might be some
>>>>>>>>>>>>> inconsistency:
>>>>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0"
>>>>>>>>>>>>> is the
>>>>>>>>>>>>
>>>>>>>>>>>> Often /opt/sge is shared across the nodes, while the /tmp
>>>>>>>>>>>> (sometimes implemented as /scratch in a partition on its own)
>>>>>>>>>>>> should be local on each node.
>>>>>>>>>>>>
>>>>>>>>>>>> What is the setting of "tmpdir" in your queue definition?
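>>>>>>>>>>>>
>>>>>>>>>>>> It can be inspected and changed per queue, e.g. for the smp8.q
>>>>>>>>>>>> queue from this thread:
>>>>>>>>>>>>
>>>>>>>>>>>> qconf -sq smp8.q | grep tmpdir    # show the current setting
>>>>>>>>>>>> qconf -mq smp8.q                  # edit the queue, e.g. set "tmpdir /tmp"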
>>>>>>>>>>>>
>>>>>>>>>>>> If you want to share /opt/sge/tmp, everyone must be able to
>>>>>>>>>>>> write into this location. As it's working fine for me (with a
>>>>>>>>>>>> local /tmp), I assume the nobody/nogroup comes from some
>>>>>>>>>>>> squash setting in the /etc/exports on your master node.
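>>>>>>>>>>>>
>>>>>>>>>>>> For illustration: an exports entry like the one quoted above,
>>>>>>>>>>>>
>>>>>>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>>>>>>>>
>>>>>>>>>>>> applies root_squash by default, so anything created by a root
>>>>>>>>>>>> process on a client (such as sge_execd setting up $TMPDIR)
>>>>>>>>>>>> typically ends up owned by nobody/nogroup on the server. Adding
>>>>>>>>>>>> no_root_squash to the export options would avoid that, at the
>>>>>>>>>>>> usual security cost; a node-local tmpdir sidesteps the problem
>>>>>>>>>>>> entirely.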
>>>>>>>>>>>>
>>>>>>>>>>>> -- Reuti
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> parent directory and
>>>>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0/53199/0/0"
>>>>>>>>>>>>>
>>>>>>>>>>>>> is the subdirectory... not the other way around.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Eloi
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>>>>>> Creating a directory with such credentials sounds like a bug
>>>>>>>>>>>>>> in SGE to me...perhaps an SGE config issue?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The only thing you could do is tell OMPI to use some other
>>>>>>>>>>>>>> directory as the root for its session dir tree - check
>>>>>>>>>>>>>> "mpirun -h" or ompi_info for the required option.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But I would first check your SGE config as that just doesn't
>>>>>>>>>>>>>> sound right.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi there,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm experiencing some issues using GE6.2U4 and
>>>>>>>>>>>>>>> OpenMPI-1.3.3
>>>>>>>>>>>>>>> (with the gridengine component).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> During any job submission, SGE creates a session directory
>>>>>>>>>>>>>>> in $TMPDIR, named after the job id, task id, and queue name.
>>>>>>>>>>>>>>> This session directory is created using nobody/nogroup
>>>>>>>>>>>>>>> credentials.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When using OpenMPI with tight-integration, opal creates
>>>>>>>>>>>>>>> different subdirectories in this session directory. The
>>>>>>>>>>>>>>> issue I'm facing now is that OpenMPI fails to create these
>>>>>>>>>>>>>>> subdirectories:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to
>>>>>>>>>>>>>>> create the sub-directory
>>>>>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of
>>>>>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>> ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> at line 273
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>>>>>> parallel process is
>>>>>>>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>>>>>>>> process can
>>>>>>>>>>>>>>> fail during orte_init; some of which are due to
>>>>>>>>>>>>>>> configuration or
>>>>>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>>>>>> internal failure;
>>>>>>>>>>>>>>> here's some additional information (which may only be
>>>>>>>>>>>>>>> relevant to an
>>>>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> orte_session_dir failed
>>>>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>> ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>>>>>> parallel process is
>>>>>>>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>>>>>>>> process can
>>>>>>>>>>>>>>> fail during orte_init; some of which are due to
>>>>>>>>>>>>>>> configuration or
>>>>>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>>>>>> internal failure;
>>>>>>>>>>>>>>> here's some additional information (which may only be
>>>>>>>>>>>>>>> relevant to an
>>>>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> orte_ess_set_name failed
>>>>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>>>>> ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at
>>>>>>>>>>>>>>> line 473
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This seems very likely related to the permissions set on
>>>>>>>>>>>>>>> $TMPDIR.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to know if someone might have experienced the same
>>>>>>>>>>>>>>> or a similar issue and if any solution was found.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for your help,
>>>>>>>>>>>>>>> Eloi
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>