Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Reuti (reuti_at_[hidden])
Date: 2009-11-10 18:14:10


On 11.11.2009, at 00:03, Eloi Gaudry wrote:

> The user/group used to generate the temporary directories was
> nobody/nogroup when using a shared $tmpdir.
> Now that I'm using a local $tmpdir (one for each node, not
> distributed over NFS), the right credentials (i.e. my
> username/groupname) are used to create the session directory inside
> $tmpdir, which in turn allows OpenMPI to successfully create its
> session subdirectories.

Aha, this explains why it's working now - so it's not an SGE issue IMHO.

Question: when a user on the execution node goes to /opt/sge/tmp and
creates a directory on the command line with mkdir: what group/user
is used then?
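
For example (a sketch; "credtest" is just a throwaway name):

  cd /opt/sge/tmp
  mkdir credtest && ls -ld credtest && rmdir credtest

If the directory shows up with your own user/group, then only root is
being squashed by the NFS export - which would explain why the session
directories created by the root-started execd end up as nobody/nogroup.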

-- Reuti

> Eloi
>
>
> On 10/11/2009 23:51, Reuti wrote:
>> Hi Eloi,
>>
>> On 10.11.2009, at 23:42, Eloi Gaudry wrote:
>>
>>> I followed your advice and switched to a local "tmpdir" instead of a
>>> shared one. This solved the session directory issue, thanks for your
>>> help!
>>
>> what user/group is now listed for the generated temporary directories
>> (i.e. $TMPDIR)?
>>
>> -- Reuti
>>
>>> However, I cannot understand how the issue disappeared. Any input
>>> would be welcome, as I would really like to understand how
>>> SGE/OpenMPI could fail when using such a configuration (i.e. with a
>>> shared "tmpdir").
>>>
>>> Eloi
>>>
>>>
>>> On 10/11/2009 19:17, Eloi Gaudry wrote:
>>>> Reuti,
>>>>
>>>> The ACLs here were just added when I tried to force the /opt/sge/tmp
>>>> subdirectories to be 777 (which I did when I first encountered the
>>>> subdirectory creation error within OpenMPI). I don't think the
>>>> info I'll provide will be meaningful here:
>>>>
>>>> moe:~# getfacl /opt/sge/tmp
>>>> getfacl: Removing leading '/' from absolute path names
>>>> # file: opt/sge/tmp
>>>> # owner: sgeadmin
>>>> # group: fft
>>>> user::rwx
>>>> group::rwx
>>>> mask::rwx
>>>> other::rwx
>>>> default:user::rwx
>>>> default:group::rwx
>>>> default:group:fft:rwx
>>>> default:mask::rwx
>>>> default:other::rwx
>>>>
>>>> I'll try to use a local directory instead of a shared one for
>>>> "tmpdir". But as this issue seems somehow related to permissions, I
>>>> don't know if this would eventually be the right solution.
>>>>
>>>> Thanks for your help,
>>>> Eloi
>>>>
>>>> Reuti wrote:
>>>>> Hi,
>>>>>
>>>>> On 10.11.2009, at 19:01, Eloi Gaudry wrote:
>>>>>
>>>>>> Reuti,
>>>>>>
>>>>>> I'm using "tmpdir" as a shared directory that contains the
>>>>>> session
>>>>>> directories created during job submission, not for computing or
>>>>>> local storage. Doesn't the session directory (i.e.
>>>>>> job_id.queue_name) need to be shared among all computing nodes
>>>>>> (at
>>>>>> least the ones that would be used with orted during the parallel
>>>>>> computation) ?
>>>>>
>>>>> No, orted runs happily with a local $TMPDIR on each and every node.
>>>>> The $TMPDIRs are intended to be used by the user for any temporary
>>>>> data of a job; they are created and removed by SGE automatically
>>>>> for every job, for convenience.
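>>>>>
>>>>> A typical job script using it might look like this (just a sketch;
>>>>> the solver and file names are made up):
>>>>>
>>>>> #!/bin/sh
>>>>> #$ -S /bin/sh
>>>>> cd $TMPDIR                           # node-local scratch, created and removed by SGE
>>>>> cp $SGE_O_WORKDIR/input.dat .        # stage the input in
>>>>> $SGE_O_WORKDIR/my_solver input.dat   # scratch I/O stays on the local disk
>>>>> cp results.dat $SGE_O_WORKDIR/       # stage the results out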
>>>>>
>>>>>
>>>>>> All sequential jobs run fine, as no write operation is
>>>>>> performed in "tmpdir/session_directory".
>>>>>>
>>>>>> All users are known on the computing nodes and the master node
>>>>>> (we use LDAP authentication on all nodes).
>>>>>>
>>>>>> As for the access checks:
>>>>>> moe:~# ls -alrtd /opt/sge/tmp
>>>>>> drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
>>>>>
>>>>> Aha, the + tells us that there are some ACLs set:
>>>>>
>>>>> getfacl /opt/sge/tmp
>>>>>
>>>>>
>>>>>> And for the parallel environment configuration:
>>>>>> moe:~# qconf -sp round_robin
>>>>>> pe_name round_robin
>>>>>> slots 32
>>>>>> user_lists NONE
>>>>>> xuser_lists NONE
>>>>>> start_proc_args /bin/true
>>>>>> stop_proc_args /bin/true
>>>>>> allocation_rule $round_robin
>>>>>> control_slaves TRUE
>>>>>> job_is_first_task FALSE
>>>>>> urgency_slots min
>>>>>> accounting_summary FALSE
>>>>>
>>>>> Okay, fine.
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>> Thanks for your help,
>>>>>> Eloi
>>>>>>
>>>>>> Reuti wrote:
>>>>>>> On 10.11.2009, at 18:20, Eloi Gaudry wrote:
>>>>>>>
>>>>>>>> Thanks for your help Reuti,
>>>>>>>>
>>>>>>>> I'm using an NFS-shared directory (/opt/sge/tmp), exported from
>>>>>>>> the master node to all other computing nodes.
>>>>>>>
>>>>>>> It's highly advisable to have the "tmpdir" local on each node.
>>>>>>> When you use "cd $TMPDIR" in your job script, everything is done
>>>>>>> locally on a node (when your application just creates its scratch
>>>>>>> files in the current working directory), which will speed up the
>>>>>>> computation and decrease the network traffic. Computing in a
>>>>>>> shared /opt/sge/tmp is like computing in each user's home
>>>>>>> directory.
>>>>>>>
>>>>>>> To prevent users from removing someone else's files, the "t"
>>>>>>> (sticky) flag is set, as for /tmp: drwxrwxrwt 14 root root 4096
>>>>>>> 2009-11-10 18:35 /tmp/
>>>>>>>
>>>>>>> Nevertheless:
>>>>>>>
>>>>>>>> Here is /etc/exports on the server (named moe.fft):
>>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>>>> and /etc/fstab on the client:
>>>>>>>> moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14, 0 0
>>>>>>>> Actually, the /opt/sge/tmp directory is 777 across all
>>>>>>>> machines, thus all users should be able to create a directory
>>>>>>>> inside it.
>>>>>>>
>>>>>>> All access checks will be applied:
>>>>>>>
>>>>>>> - on the server: what is "ls -d /opt/sge/tmp" showing?
>>>>>>> - the one from the export (this seems to be fine)
>>>>>>> - the one on the node (i.e., how it's mounted: cat /etc/fstab)
>>>>>>>
>>>>>>>> The issue seems somehow related to the session directory
>>>>>>>> created inside /opt/sge/tmp, let's say /opt/sge/tmp/29.1.smp8.q
>>>>>>>> for example, for job 29 on queue smp8.q. This subdirectory of
>>>>>>>> /opt/sge/tmp is created with nobody:nogroup drwxr-xr-x
>>>>>>>> permissions... which in turn forbids
>>>>>>>
>>>>>>> Did you try to run some simple jobs before the parallel ones -
>>>>>>> are these working? Were the daemons (qmaster and execd) started
>>>>>>> as root?
>>>>>>>
>>>>>>> Is the user known on the file server, i.e. the machine hosting
>>>>>>> /opt/sge?
>>>>>>>
>>>>>>>> OpenMPI to create its subtree inside (as OpenMPI won't use
>>>>>>>> nobody:nogroup credentials).
>>>>>>>
>>>>>>> In SGE the master process (the one running the job script) will
>>>>>>> create /opt/sge/tmp/29.1.smp8.q, as will each qrsh started
>>>>>>> inside SGE - all with the same name. What is the definition of
>>>>>>> the PE you use in SGE?
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>
>>>>>>>> As Ralph suggested, I checked the SGE configuration, but I
>>>>>>>> haven't found anything related to nobody:nogroup so far.
>>>>>>>>
>>>>>>>> Eloi
>>>>>>>>
>>>>>>>>
>>>>>>>> Reuti wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 10.11.2009, at 17:55, Eloi Gaudry wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for your help Ralph, I'll double check that.
>>>>>>>>>>
>>>>>>>>>> As for the error message received, there might be some
>>>>>>>>>> inconsistency:
>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0"
>>>>>>>>>> is the
>>>>>>>>>
>>>>>>>>> Often /opt/sge is shared across the nodes, while /tmp
>>>>>>>>> (sometimes implemented as /scratch on a partition of its own)
>>>>>>>>> should be local on each node.
>>>>>>>>>
>>>>>>>>> What is the setting of "tmpdir" in your queue definition?
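>>>>>>>>>
>>>>>>>>> You can check it with, e.g. (queue name taken from your error
>>>>>>>>> output):
>>>>>>>>>
>>>>>>>>> qconf -sq smp8.q | grep tmpdir
>>>>>>>>>
>>>>>>>>> and change it to a node-local directory like /tmp with
>>>>>>>>> "qconf -mq smp8.q".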
>>>>>>>>>
>>>>>>>>> If you want to share /opt/sge/tmp, everyone must be able to
>>>>>>>>> write to this location. As it's working fine for me (with a
>>>>>>>>> local /tmp), I assume the nobody/nogroup comes from a
>>>>>>>>> squash setting in the /etc/exports of your master node.
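>>>>>>>>>
>>>>>>>>> For comparison: root_squash is the default and maps root to
>>>>>>>>> nobody/nogroup, which matches what you see. An export like this
>>>>>>>>> (just a sketch - mind the security implications) would disable
>>>>>>>>> it:
>>>>>>>>>
>>>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check,no_root_squash)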
>>>>>>>>>
>>>>>>>>> -- Reuti
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> parent directory and
>>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-
>>>>>>>>>> eg_at_charlie_0/53199/0/0"
>>>>>>>>>> is the subdirectory... not the other way around.
>>>>>>>>>>
>>>>>>>>>> Eloi
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>>> Creating a directory with such credentials sounds like a bug
>>>>>>>>>>> in SGE to me...perhaps an SGE config issue?
>>>>>>>>>>>
>>>>>>>>>>> The only thing you could do is tell OMPI to use some other
>>>>>>>>>>> directory as the root of its session dir tree - check
>>>>>>>>>>> "mpirun -h" or ompi_info for the required option.
>>>>>>>>>>>
>>>>>>>>>>> But I would first check your SGE config as that just doesn't
>>>>>>>>>>> sound right.
>>>>>>>>>>>
>>>>>>>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi there,
>>>>>>>>>>>>
>>>>>>>>>>>> I'm experiencing some issues using GE6.2U4 and
>>>>>>>>>>>> OpenMPI-1.3.3 (with the gridengine component).
>>>>>>>>>>>>
>>>>>>>>>>>> During any job submission, SGE creates a session directory
>>>>>>>>>>>> in $TMPDIR, named after the job id and the queue name. This
>>>>>>>>>>>> session directory is created using nobody/nogroup
>>>>>>>>>>>> credentials.
>>>>>>>>>>>>
>>>>>>>>>>>> When using OpenMPI with tight-integration, opal creates
>>>>>>>>>>>> different subdirectories in this session directory. The
>>>>>>>>>>>> issue I'm facing now is that OpenMPI fails to create these
>>>>>>>>>>>> subdirectories:
>>>>>>>>>>>>
>>>>>>>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to
>>>>>>>>>>>> create the sub-directory
>>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of
>>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>> ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/
>>>>>>>>>>>> ess_hnp_module.c
>>>>>>>>>>>> at line 273
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>>> parallel process is
>>>>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>>>>> process can
>>>>>>>>>>>> fail during orte_init; some of which are due to
>>>>>>>>>>>> configuration or
>>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>>> internal failure;
>>>>>>>>>>>> here's some additional information (which may only be
>>>>>>>>>>>> relevant to an
>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>
>>>>>>>>>>>> orte_session_dir failed
>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>> ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>>> parallel process is
>>>>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>>>>> process can
>>>>>>>>>>>> fail during orte_init; some of which are due to
>>>>>>>>>>>> configuration or
>>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>>> internal failure;
>>>>>>>>>>>> here's some additional information (which may only be
>>>>>>>>>>>> relevant to an
>>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>>
>>>>>>>>>>>> orte_ess_set_name failed
>>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>>> ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at
>>>>>>>>>>>> line 473
>>>>>>>>>>>>
>>>>>>>>>>>> This seems very likely related to the permissions set on
>>>>>>>>>>>> $TMPDIR.
>>>>>>>>>>>>
>>>>>>>>>>>> I'd like to know if someone might have experienced the same
>>>>>>>>>>>> or a similar issue and if any solution was found.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your help,
>>>>>>>>>>>> Eloi
>>>>>>>>>>>>
>>>>>>>>>>>>