Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Reuti (reuti_at_[hidden])
Date: 2009-11-10 17:51:37


Hi Eloi,

On 10.11.2009 at 23:42, Eloi Gaudry wrote:

> I followed your advice and switched to a local "tmpdir" instead of
> a shared one. This solved the session directory issue, thanks for
> your help!

what user/group is now listed for the generated temporary directories
(i.e. $TMPDIR)?

-- Reuti

> However, I cannot understand how the issue disappeared. Any input
> would be welcome, as I'd really like to understand how SGE/OpenMPI
> could fail when using such a configuration (i.e. with a shared
> "tmpdir").
>
> Eloi
>
>
> On 10/11/2009 19:17, Eloi Gaudry wrote:
>> Reuti,
>>
>> The ACLs here were just added when I tried to force the /opt/sge/tmp
>> subdirectories to be 777 (which I did when I first encountered the
>> subdirectory creation error within OpenMPI). I don't think the info
>> I'll provide will be meaningful here:
>>
>> moe:~# getfacl /opt/sge/tmp
>> getfacl: Removing leading '/' from absolute path names
>> # file: opt/sge/tmp
>> # owner: sgeadmin
>> # group: fft
>> user::rwx
>> group::rwx
>> mask::rwx
>> other::rwx
>> default:user::rwx
>> default:group::rwx
>> default:group:fft:rwx
>> default:mask::rwx
>> default:other::rwx
>>
>> I'll try to use a local directory instead of a shared one for
>> "tmpdir". But as this issue seems somehow related to permissions,
>> I don't know if this would eventually be the right solution.
>>
>> Thanks for your help,
>> Eloi
>>
>> Reuti wrote:
>>> Hi,
>>>
>>> On 10.11.2009 at 19:01, Eloi Gaudry wrote:
>>>
>>>> Reuti,
>>>>
>>>> I'm using "tmpdir" as a shared directory that contains the
>>>> session directories created during job submission, not for
>>>> computing or local storage. Doesn't the session directory (i.e.
>>>> job_id.queue_name) need to be shared among all computing nodes
>>>> (at least the ones that would be used with orted during the
>>>> parallel computation)?
>>>
>>> no. orted runs happily with a local $TMPDIR on each and every node.
>>> The $TMPDIRs are intended to be used by the user for any temporary
>>> data of a job; they are created and removed automatically by SGE
>>> for every job, for convenience.
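>>>
>>> E.g., a minimal jobscript sketch (the application and file names
>>> are just placeholders) that keeps all scratch data on the local
>>> disk of the node:
>>>
>>> #!/bin/sh
>>> # $TMPDIR is created by SGE on the node and removed again
>>> # automatically when the job finishes
>>> cd $TMPDIR
>>> cp $SGE_O_WORKDIR/input.dat .        # stage the input in
>>> $SGE_O_WORKDIR/my_app input.dat      # scratch files stay local
>>> cp result.dat $SGE_O_WORKDIR/        # copy the result back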
>>>
>>>
>>>> All sequential jobs run fine, as no write operation is performed
>>>> in "tmpdir/session_directory".
>>>>
>>>> All users are known on the computing nodes and the master node
>>>> (we use LDAP authentication on all nodes).
>>>>
>>>> As for the access checks:
>>>> moe:~# ls -alrtd /opt/sge/tmp
>>>> drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
>>>
>>> Aha, the + tells that there are some ACLs set:
>>>
>>> getfacl /opt/sge/tmp
>>>
>>>
>>>> And for the parallel environment configuration:
>>>> moe:~# qconf -sp round_robin
>>>> pe_name round_robin
>>>> slots 32
>>>> user_lists NONE
>>>> xuser_lists NONE
>>>> start_proc_args /bin/true
>>>> stop_proc_args /bin/true
>>>> allocation_rule $round_robin
>>>> control_slaves TRUE
>>>> job_is_first_task FALSE
>>>> urgency_slots min
>>>> accounting_summary FALSE
>>>
>>> Okay, fine.
>>>
>>> -- Reuti
>>>
>>>
>>>> Thanks for your help,
>>>> Eloi
>>>>
>>>> Reuti wrote:
>>>>> On 10.11.2009 at 18:20, Eloi Gaudry wrote:
>>>>>
>>>>>> Thanks for your help Reuti,
>>>>>>
>>>>>> I'm using an NFS-shared directory (/opt/sge/tmp), exported from
>>>>>> the master node to all other computing nodes.
>>>>>
>>>>> It's highly advisable to have the "tmpdir" local on each node.
>>>>> When you use "cd $TMPDIR" in your jobscript, everything is done
>>>>> locally on the node (provided your application just creates its
>>>>> scratch files in the current working directory), which will speed
>>>>> up the computation and decrease the network traffic. Computing in
>>>>> a shared /opt/sge/tmp is like computing in each user's home
>>>>> directory.
>>>>>
>>>>> To prevent any user from removing someone else's files, the "t"
>>>>> (sticky) flag is set, as for /tmp:
>>>>> drwxrwxrwt 14 root root 4096 2009-11-10 18:35 /tmp/
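>>>>>
>>>>> E.g., switching a queue back to a node-local scratch directory
>>>>> should be possible with something along the lines of (queue name
>>>>> taken from your error message):
>>>>>
>>>>> qconf -mattr queue tmpdir /tmp smp8.q
>>>>>
>>>>> or interactively via "qconf -mq smp8.q" and editing the "tmpdir"
>>>>> line.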
>>>>>
>>>>> Nevertheless:
>>>>>
>>>>>> with /etc/exports on the server (named moe.fft):
>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>>>>>> and /etc/fstab on the client:
>>>>>> moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14, 0 0
>>>>>> Actually, the /opt/sge/tmp directory is 777 across all machines,
>>>>>> thus all users should be able to create a directory inside.
>>>>>
>>>>> All access checks will be applied:
>>>>>
>>>>> - on the server: what is "ls -d /opt/sge/tmp" showing?
>>>>> - the one from the export (this seems to be fine)
>>>>> - the one on the node (i.e., how it's actually mounted: cat
>>>>>   /etc/fstab; see also the commands below)
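>>>>>
>>>>> E.g., on one of the nodes something like:
>>>>>
>>>>> mount | grep /opt/sge
>>>>> ls -ld /opt/sge/tmp
>>>>>
>>>>> should show the effective mount options and the ownership and
>>>>> permissions as seen from the node.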
>>>>>
>>>>>> The issue seems somehow related to the session directory created
>>>>>> inside /opt/sge/tmp, let's say /opt/sge/tmp/29.1.smp8.q for
>>>>>> example, for job 29 on queue smp8.q. This subdirectory of
>>>>>> /opt/sge/tmp is created with nobody:nogroup drwxr-xr-x
>>>>>> permissions... which in turn forbids
>>>>>
>>>>> Did you try to run some simple jobs before the parallel ones -
>>>>> are these working? Were the daemons (qmaster and execd) started
>>>>> as root?
>>>>>
>>>>> Is the user known on the file server, i.e. the machine hosting
>>>>> /opt/sge?
>>>>>
>>>>>> OpenMPI to create its subtree inside (as OpenMPI won't use
>>>>>> nobody:nogroup credentials).
>>>>>
>>>>> In SGE, the master process (the one running the job script) will
>>>>> create /opt/sge/tmp/29.1.smp8.q, and so will each qrsh started
>>>>> inside SGE - all with the same name. What is the definition of
>>>>> the PE you use in SGE?
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>
>>>>>> As Ralph suggested, I checked the SGE configuration, but I
>>>>>> haven't found anything related to a nobody:nogroup configuration
>>>>>> so far.
>>>>>>
>>>>>> Eloi
>>>>>>
>>>>>>
>>>>>> Reuti wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 10.11.2009 at 17:55, Eloi Gaudry wrote:
>>>>>>>
>>>>>>>> Thanks for your help Ralph, I'll double check that.
>>>>>>>>
>>>>>>>> As for the error message received, there might be some
>>>>>>>> inconsistency: "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-
>>>>>>>> eg_at_charlie_0" is the
>>>>>>>
>>>>>>> often /opt/sge is shared across the nodes, while /tmp
>>>>>>> (sometimes implemented as /scratch on a partition of its own)
>>>>>>> should be local on each node.
>>>>>>>
>>>>>>> What is the setting of "tmpdir" in your queue definition?
>>>>>>>
>>>>>>> If you want to share /opt/sge/tmp, everyone must be able to
>>>>>>> write into this location. As it's working fine for me (with a
>>>>>>> local /tmp), I assume the nobody/nogroup comes from a squash
>>>>>>> setting in the /etc/exports of your master node.
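>>>>>>>
>>>>>>> E.g., with the default root_squash on a Linux NFS export, files
>>>>>>> and directories created by root on a client end up owned by
>>>>>>> nobody/nogroup on the server. A sketch of an export line without
>>>>>>> squashing (this weakens security, so handle with care):
>>>>>>>
>>>>>>> /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check,no_root_squash)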
>>>>>>>
>>>>>>> -- Reuti
>>>>>>>
>>>>>>>
>>>>>>>> parent directory and "/opt/sge/tmp/25.1.smp8.q/openmpi-
>>>>>>>> sessions-eg_at_charlie_0/53199/0/0" is the subdirectory... not
>>>>>>>> the other way around.
>>>>>>>>
>>>>>>>> Eloi
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Ralph Castain wrote:
>>>>>>>>> Creating a directory with such credentials sounds like a
>>>>>>>>> bug in SGE to me...perhaps an SGE config issue?
>>>>>>>>>
>>>>>>>>> The only thing you could do is tell OMPI to use some other
>>>>>>>>> directory as the root for its session dir tree - check
>>>>>>>>> "mpirun -h" or ompi_info for the required option.
>>>>>>>>>
>>>>>>>>> But I would first check your SGE config as that just
>>>>>>>>> doesn't sound right.
>>>>>>>>>
>>>>>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>>>>>
>>>>>>>>>> Hi there,
>>>>>>>>>>
>>>>>>>>>> I'm experiencing some issues using GE6.2U4 and
>>>>>>>>>> OpenMPI-1.3.3 (with the gridengine component).
>>>>>>>>>>
>>>>>>>>>> During any job submission, SGE creates a session directory
>>>>>>>>>> in $TMPDIR, named after the job id and the computing node
>>>>>>>>>> name. This session directory is created using nobody/
>>>>>>>>>> nogroup credentials.
>>>>>>>>>>
>>>>>>>>>> When using OpenMPI with tight-integration, opal creates
>>>>>>>>>> different subdirectories in this session directory. The
>>>>>>>>>> issue I'm facing now is that OpenMPI fails to create these
>>>>>>>>>> subdirectories:
>>>>>>>>>>
>>>>>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to create the sub-directory (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c at line 273
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>>>>> Open MPI developer):
>>>>>>>>>>
>>>>>>>>>> orte_session_dir failed
>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> It looks like orte_init failed for some reason; your parallel process is
>>>>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>>>>> Open MPI developer):
>>>>>>>>>>
>>>>>>>>>> orte_ess_set_name failed
>>>>>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line 473
>>>>>>>>>>
>>>>>>>>>> This seems very likely related to the permissions set on
>>>>>>>>>> $TMPDIR.
>>>>>>>>>>
>>>>>>>>>> I'd like to know if someone might have experienced the
>>>>>>>>>> same or a similar issue and if any solution was found.
>>>>>>>>>>
>>>>>>>>>> Thanks for your help,
>>>>>>>>>> Eloi
>>>>>>>>>>
>>>>>>>>>>