Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Reuti (reuti_at_[hidden])
Date: 2009-11-10 17:56:31


On 10.11.2009, at 23:51, Reuti wrote:

> Hi Eloi,
>
> On 10.11.2009, at 23:42, Eloi Gaudry wrote:
>
>> I followed your advice and switched to a local "tmpdir" instead of
>> a shared one. This solved the session directory issue, thanks for
>> your help!
>
> what user/group is now listed for the generated temporary
> directories (i.e. $TMPDIR)?
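A quick way to check this from within a job (a minimal sketch; SGE's
qsub reads the script from stdin here):

echo 'ls -ld $TMPDIR' | qsub -cwd -j y -o tmpdir_check.out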

> -- Reuti
>
>> However, I cannot understand how the issue disappeared. Any input
>> would be welcome, as I'd really like to understand how SGE/OpenMPI
>> could fail with such a configuration (i.e. with a shared
>> "tmpdir").
>>
>> Eloi
>>
>>
>> On 10/11/2009 19:17, Eloi Gaudry wrote:
>>> Reuti,
>>>
>>> The ACLs here were just added when I tried to force the
>>> /opt/sge/tmp subdirectories to be 777 (which I did when I first
>>> encountered the subdirectory-creation error within OpenMPI).
>>> I don't think the info I'll provide will be meaningful here:
>>>
>>> moe:~# getfacl /opt/sge/tmp
>>> getfacl: Removing leading '/' from absolute path names
>>> # file: opt/sge/tmp
>>> # owner: sgeadmin
>>> # group: fft
>>> user::rwx
>>> group::rwx
>>> mask::rwx
>>> other::rwx
>>> default:user::rwx
>>> default:group::rwx
>>> default:group:fft:rwx
>>> default:mask::rwx
>>> default:other::rwx
>>>
>>> I'll try to use a local directory instead of a shared one for
>>> "tmpdir". But as this issue seems somehow related to permissions,
>>> I don't know if this would eventually be the right solution.
>>>
>>> Thanks for your help,
>>> Eloi
>>>
>>> Reuti wrote:
>>>> Hi,
>>>>
>>>> On 10.11.2009, at 19:01, Eloi Gaudry wrote:
>>>>
>>>>> Reuti,
>>>>>
>>>>> I'm using "tmpdir" as a shared directory that contains the
>>>>> session directories created during job submission, not for
>>>>> computing or local storage. Doesn't the session directory (i.e.
>>>>> job_id.queue_name) need to be shared among all computing nodes
>>>>> (at least the ones that would be used by orted during the
>>>>> parallel computation)?
>>>>
>>>> no. orted runs happily with a local $TMPDIR on each and every
>>>> node. The $TMPDIRs are intended to be used by users for any
>>>> temporary data of their jobs, as they are created and removed by
>>>> SGE automatically for every job for their convenience.
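>>>>
>>>> For illustration, a job script along these lines (just a sketch;
>>>> "my_app" is a made-up command) keeps all scratch I/O node-local:
>>>>
>>>> #!/bin/sh
>>>> #$ -S /bin/sh
>>>> cd $TMPDIR                      # per-job directory created by SGE on this node
>>>> my_app > result.out             # hypothetical application; scratch files stay local
>>>> cp result.out $SGE_O_WORKDIR/   # copy results back before SGE removes $TMPDIR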
>>>>
>>>>
>>>>> All sequential jobs run fine, as no write operation is performed
>>>>> in "tmpdir/session_directory".
>>>>>
>>>>> All users are known on the computing nodes and the master node
>>>>> (we use LDAP authentication on all nodes).
>>>>>
>>>>> As for the access checks:
>>>>> moe:~# ls -alrtd /opt/sge/tmp
>>>>> drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
>>>>
>>>> Aha, the + tells that there are some ACLs set:
>>>>
>>>> getfacl /opt/sge/tmp
>>>>
>>>>
>>>>> And for the parallel environment configuration:
>>>>> moe:~# qconf -sp round_robin
>>>>> pe_name round_robin
>>>>> slots 32
>>>>> user_lists NONE
>>>>> xuser_lists NONE
>>>>> start_proc_args /bin/true
>>>>> stop_proc_args /bin/true
>>>>> allocation_rule $round_robin
>>>>> control_slaves TRUE
>>>>> job_is_first_task FALSE
>>>>> urgency_slots min
>>>>> accounting_summary FALSE
>>>>
>>>> Okay, fine.
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> Thanks for your help,
>>>>> Eloi
>>>>>
>>>>> Reuti wrote:
>>>>>> On 10.11.2009, at 18:20, Eloi Gaudry wrote:
>>>>>>
>>>>>>> Thanks for your help Reuti,
>>>>>>>
>>>>>>> I'm using a nfs-shared directory (/opt/sge/tmp), exported
>>>>>>> from the master node to all others computing nodes.
>>>>>>
>>>>>> It's highly advisable to have the "tmpdir" local on each node.
>>>>>> When you use "cd $TMPDIR" in your job script, everything is done
>>>>>> locally on the node (when your application just creates its
>>>>>> scratch files in the current working directory), which will
>>>>>> speed up the computation and decrease the network traffic.
>>>>>> Computing in a shared /opt/sge/tmp is like computing in each
>>>>>> user's home directory.
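>>>>>>
>>>>>> Switching the queue to a node-local tmpdir is one attribute
>>>>>> change (a sketch, run as the SGE admin, using the queue name
>>>>>> from your log):
>>>>>>
>>>>>> qconf -mattr queue tmpdir /tmp smp8.q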
>>>>>>
>>>>>> To prevent any user from removing someone else's files, the
>>>>>> "t" flag is set, as for /tmp:
>>>>>>
>>>>>> drwxrwxrwt 14 root root 4096 2009-11-10 18:35 /tmp/
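>>>>>>
>>>>>> For a shared tmpdir the same flag could be set on the file
>>>>>> server (a sketch, to be run as root):
>>>>>>
>>>>>> chmod 1777 /opt/sge/tmp   # world-writable plus the sticky "t" bit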
>>>>>>
>>>>>> Nevertheless:
>>>>>>
>>>>>>> with /etc/exports on the server (named moe.fft):
>>>>>>>
>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>>>
>>>>>>> and /etc/fstab on the client:
>>>>>>>
>>>>>>> moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14 0 0
>>>>>>>
>>>>>>> Actually, the /opt/sge/tmp directory is 777 across all
>>>>>>> machines, so all users should be able to create a directory
>>>>>>> inside.
>>>>>>
>>>>>> All access checks will be applied:
>>>>>>
>>>>>> - on the server: what is "ls -ld /opt/sge/tmp" showing?
>>>>>> - the one from the export (this seems to be fine)
>>>>>> - the one on the node (i.e., how it's mounted: cat /etc/fstab)
>>>>>>
>>>>>>> The issue seems somehow related to the session directory
>>>>>>> created inside /opt/sge/tmp, let's say /opt/sge/tmp/29.1.smp8.q
>>>>>>> for example, for job 29 on queue smp8.q. This subdirectory of
>>>>>>> /opt/sge/tmp is created with nobody:nogroup and drwxr-xr-x
>>>>>>> permissions... which in turn forbids
>>>>>>
>>>>>> Did you try to run some simple jobs before the parallel ones -
>>>>>> are these working? Were the daemons (qmaster and execd)
>>>>>> started as root?
>>>>>>
>>>>>> Is the user known on the file server, i.e. the machine
>>>>>> hosting /opt/sge?
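>>>>>>
>>>>>> E.g. (a sketch; the username "eg" is taken from the session
>>>>>> directory name in your log):
>>>>>>
>>>>>> getent passwd eg   # should resolve via LDAP, as on the nodes
>>>>>> id eg              # uid/gid as the file server sees them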
>>>>>>
>>>>>>> OpenMPI to create its subtree inside (as OpenMPI won't use
>>>>>>> nobody:nogroup credentials).
>>>>>>
>>>>>> In SGE, the master process (the one running the job script)
>>>>>> will create /opt/sge/tmp/29.1.smp8.q, and so will each qrsh
>>>>>> started inside SGE - all with the same name. What is the
>>>>>> definition of the PE in SGE which you use?
>>>>>>
>>>>>> -- Reuti
>>>>>>
>>>>>>
>>>>>>> As Ralph suggested, I checked the SGE configuration, but I
>>>>>>> haven't found anything related to a nobody:nogroup
>>>>>>> configuration so far.
>>>>>>>
>>>>>>> Eloi
>>>>>>>
>>>>>>>
>>>>>>> Reuti wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On 10.11.2009, at 17:55, Eloi Gaudry wrote:
>>>>>>>>
>>>>>>>>> Thanks for your help Ralph, I'll double check that.
>>>>>>>>>
>>>>>>>>> As for the error message received, there might be some
>>>>>>>>> inconsistency:
>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0"
>>>>>>>>> is the
>>>>>>>>
>>>>>>>> often /opt/sge is shared across the nodes, while the /tmp
>>>>>>>> (sometimes implemented as /scratch in a partition on its
>>>>>>>> own) should be local on each node.
>>>>>>>>
>>>>>>>> What is the setting of "tmpdir" in your queue definition?
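>>>>>>>>
>>>>>>>> E.g. (queue name assumed from your log):
>>>>>>>>
>>>>>>>> qconf -sq smp8.q | grep tmpdir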
>>>>>>>>
>>>>>>>> If you want to share /opt/sge/tmp, everyone must be able to
>>>>>>>> write to this location. As it's working fine for me (with
>>>>>>>> the local /tmp), I assume the nobody/nogroup comes from some
>>>>>>>> squash setting in the /etc/exports of your master node.
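>>>>>>>>
>>>>>>>> E.g. root_squash (the NFS default) maps root to
>>>>>>>> nobody/nogroup, so a directory created on the share by a
>>>>>>>> root-started execd would get exactly these credentials; a
>>>>>>>> sketch of the two variants in /etc/exports:
>>>>>>>>
>>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,root_squash,no_subtree_check)
>>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_root_squash,no_subtree_check)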
>>>>>>>>
>>>>>>>> -- Reuti
>>>>>>>>
>>>>>>>>
>>>>>>>>> parent directory and
>>>>>>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0/53199/0/0"
>>>>>>>>> is the subdirectory... not the other way around.
>>>>>>>>>
>>>>>>>>> Eloi
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ralph Castain wrote:
>>>>>>>>>> Creating a directory with such credentials sounds like a
>>>>>>>>>> bug in SGE to me...perhaps an SGE config issue?
>>>>>>>>>>
>>>>>>>>>> The only thing you could do is tell OMPI to use some other
>>>>>>>>>> directory as the root of its session dir tree - check
>>>>>>>>>> "mpirun -h" or ompi_info for the required option.
>>>>>>>>>>
>>>>>>>>>> But I would first check your SGE config as that just
>>>>>>>>>> doesn't sound right.
>>>>>>>>>>
>>>>>>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi there,
>>>>>>>>>>>
>>>>>>>>>>> I'm experiencing some issues using GE6.2U4 and
>>>>>>>>>>> OpenMPI-1.3.3 (with the gridengine component).
>>>>>>>>>>>
>>>>>>>>>>> During any job submission, SGE creates a session
>>>>>>>>>>> directory in $TMPDIR, named after the job id and the
>>>>>>>>>>> queue name. This session directory is created using
>>>>>>>>>>> nobody/nogroup credentials.
>>>>>>>>>>>
>>>>>>>>>>> When using OpenMPI with tight-integration, opal creates
>>>>>>>>>>> different subdirectories in this session directory. The
>>>>>>>>>>> issue I'm facing now is that OpenMPI fails to create
>>>>>>>>>>> these subdirectories:
>>>>>>>>>>>
>>>>>>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to
>>>>>>>>>>> create the sub-directory
>>>>>>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0)
>>>>>>>>>>> of (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0)
>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in
>>>>>>>>>>> file ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in
>>>>>>>>>>> file ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>> ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c
>>>>>>>>>>> at line 273
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>> parallel process is
>>>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>>>> process can
>>>>>>>>>>> fail during orte_init; some of which are due to
>>>>>>>>>>> configuration or
>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>> internal failure;
>>>>>>>>>>> here's some additional information (which may only be
>>>>>>>>>>> relevant to an
>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>
>>>>>>>>>>> orte_session_dir failed
>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>> ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> It looks like orte_init failed for some reason; your
>>>>>>>>>>> parallel process is
>>>>>>>>>>> likely to abort. There are many reasons that a parallel
>>>>>>>>>>> process can
>>>>>>>>>>> fail during orte_init; some of which are due to
>>>>>>>>>>> configuration or
>>>>>>>>>>> environment problems. This failure appears to be an
>>>>>>>>>>> internal failure;
>>>>>>>>>>> here's some additional information (which may only be
>>>>>>>>>>> relevant to an
>>>>>>>>>>> Open MPI developer):
>>>>>>>>>>>
>>>>>>>>>>> orte_ess_set_name failed
>>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>>>>>>> ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c
>>>>>>>>>>> at line 473
>>>>>>>>>>>
>>>>>>>>>>> This seems very likely related to the permissions set on
>>>>>>>>>>> $TMPDIR.
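>>>>>>>>>>>
>>>>>>>>>>> For reference, a job that only inspects the session
>>>>>>>>>>> directory (a minimal sketch using the PE from this setup)
>>>>>>>>>>> would show the permissions directly:
>>>>>>>>>>>
>>>>>>>>>>> #!/bin/sh
>>>>>>>>>>> #$ -pe round_robin 2
>>>>>>>>>>> ls -ld $TMPDIR        # owner/group and mode of the session directory
>>>>>>>>>>> touch $TMPDIR/probe   # does a plain write succeed?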
>>>>>>>>>>>
>>>>>>>>>>> I'd like to know if someone might have experienced the
>>>>>>>>>>> same or a similar issue and if any solution was found.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your help,
>>>>>>>>>>> Eloi
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users