Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Eloi Gaudry (eg_at_[hidden])
Date: 2009-11-10 13:01:15


Reuti,

I'm using "tmpdir" as a shared directory that contains the session
directories created during job submission, not for computing or local
storage. Doesn't the session directory (i.e. job_id.queue_name) need to
be shared among all computing nodes (at least the ones that would be
used with orted during the parallel computation) ?
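
For reference, the "tmpdir" setting Reuti asks about below can be read from
the queue configuration (smp8.q is the queue of the failing jobs); on our
setup the output should look something like this:

moe:~# qconf -sq smp8.q | grep tmpdir
tmpdir                /opt/sge/tmp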

All sequential jobs run fine, as no write operation is performed in
"tmpdir/session_directory".

All users are known on the computing nodes and the master node (we use
LDAP authentication on all nodes).
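
A quick way to double-check this (following Reuti's question about the file
server) is to compare the output of "id eg" on the master node, on a
computing node and on the file server; the uid/gid pairs should be
identical, e.g.:

moe:~# id eg
charlie:~# id eg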

As for the access checks:
moe:~# ls -alrtd /opt/sge/tmp
drwxrwxrwx+ 2 sgeadmin fft 4096 2009-11-10 18:28 /opt/sge/tmp
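
If the nobody:nogroup ownership of the session directories really comes from
a squash setting, as Reuti suspects below (the directory is presumably
created by a daemon running as root, which NFS maps to nobody by default),
then I guess the export on moe would need no_root_squash; a sketch (untested)
of what /etc/exports would then contain:

/opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check,no_root_squash)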

And for the parallel environment configuration:
moe:~# qconf -sp round_robin
pe_name round_robin
slots 32
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $round_robin
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
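
In the meantime, I will probably also try Ralph's suggestion and relocate
the Open MPI session-directory root to a node-local path; if I read
ompi_info correctly, the MCA parameter is orte_tmpdir_base, so something
along these lines (the application name and process count are just
placeholders):

mpirun --mca orte_tmpdir_base /tmp -np 8 ./my_parallel_app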

Thanks for your help,
Eloi

Reuti wrote:
> On 10.11.2009 at 18:20, Eloi Gaudry wrote:
>
>> Thanks for your help Reuti,
>>
>> I'm using an NFS-shared directory (/opt/sge/tmp), exported from the
>> master node to all the other computing nodes.
>
> It's highly advisable to have the "tmpdir" local on each node. When you
> use "cd $TMPDIR" in your job script, everything is done locally on the
> node (provided your application just creates its scratch files in the
> current working directory), which speeds up the computation and reduces
> network traffic. Computing in a shared /opt/sge/tmp is like computing in
> each user's home directory.
>
> To prevent any user from removing someone else's files, the sticky bit
> ("t") is set, as for /tmp: drwxrwxrwt 14 root root 4096 2009-11-10 18:35 /tmp/
>
> Nevertheless:
>
>> with /etc/exports on the server (named moe.fft):
>>   /opt/sge 192.168.0.0/255.255.255.0(rw,sync,no_subtree_check)
>> and /etc/fstab on the client:
>>   moe.fft:/opt/sge /opt/sge nfs rw,bg,soft,timeo=14, 0 0
>> Actually, the /opt/sge/tmp directory is mode 777 on all machines, so
>> all users should be able to create a directory inside it.
>
> All access checks will be applied:
>
> - on the server: what is "ls -d /opt/sge/tmp" showing?
> - the one from the export (this seems to be fine)
> - the one on the node (i.e., how it's mounted: cat /etc/fstab)
>
>> The issue seems somehow related to the session directory created
>> inside /opt/sge/tmp, let's say /opt/sge/tmp/29.1.smp8.q for job 29 on
>> queue smp8.q. This subdirectory of /opt/sge/tmp is created with
>> nobody:nogroup ownership and drwxr-xr-x permissions... which in turn
>> forbids
>
> Did you try to run some simple serial jobs before the parallel ones - are
> they working? Were the daemons (qmaster and execd) started as root?
>
> Is the user known on the file server, i.e. the machine hosting /opt/sge?
>
>> OpenMPI to create its subtree inside (as OpenMPI won't use
>> nobody:nogroup credentials).
>
> In SGE, the master process (the one running the job script) will create
> /opt/sge/tmp/29.1.smp8.q, and so will each qrsh started inside SGE -
> all with the same name. What is the definition of the PE you use in SGE?
>
> -- Reuti
>
>
>> As Ralph suggested, I checked the SGE configuration, but I haven't
>> found anything related to nobody:nogroup so far.
>>
>> Eloi
>>
>>
>> Reuti wrote:
>>> Hi,
>>>
>>> On 10.11.2009 at 17:55, Eloi Gaudry wrote:
>>>
>>>> Thanks for your help Ralph, I'll double check that.
>>>>
>>>> As for the error message received, there might be some
>>>> inconsistency:
>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0" is the
>>>
>>> Often /opt/sge is shared across the nodes, while /tmp (sometimes
>>> implemented as /scratch on a partition of its own) should be local
>>> on each node.
>>>
>>> What is the setting of "tmpdir" in your queue definition?
>>>
>>> If you want to share /opt/sge/tmp, everyone must be able to write
>>> into this location. As it's working fine for me (with a local /tmp),
>>> I assume the nobody/nogroup comes from a squash setting in the
>>> /etc/exports of your master node.
>>>
>>> -- Reuti
>>>
>>>
>>>> parent directory and
>>>> "/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0/53199/0/0"
>>>> is the subdirectory... not the other way around.
>>>>
>>>> Eloi
>>>>
>>>>
>>>>
>>>> Ralph Castain wrote:
>>>>> Creating a directory with such credentials sounds like a bug in
>>>>> SGE to me...perhaps an SGE config issue?
>>>>>
>>>>> The only thing you could do is tell OMPI to use some other directory
>>>>> as the root of its session dir tree - check "mpirun -h" or ompi_info
>>>>> for the required option.
>>>>>
>>>>> But I would first check your SGE config as that just doesn't sound
>>>>> right.
>>>>>
>>>>> On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:
>>>>>
>>>>>> Hi there,
>>>>>>
>>>>>> I'm experiencing some issues using GE 6.2u4 and Open MPI 1.3.3
>>>>>> (with the gridengine component).
>>>>>>
>>>>>> During job submission, SGE creates a session directory in $TMPDIR,
>>>>>> named after the job id, the task id and the queue name. This
>>>>>> session directory is created with nobody/nogroup credentials.
>>>>>>
>>>>>> When using Open MPI with tight integration, OPAL creates several
>>>>>> subdirectories in this session directory. The issue I'm facing
>>>>>> now is that Open MPI fails to create these subdirectories:
>>>>>>
>>>>>> [charlie:03882] opal_os_dirpath_create: Error: Unable to create
>>>>>> the sub-directory
>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0) of
>>>>>> (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-eg_at_charlie_0
>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 101
>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>> ../../openmpi-1.3.3/orte/util/session_dir.c at line 425
>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>> ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c at
>>>>>> line 273
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> It looks like orte_init failed for some reason; your parallel
>>>>>> process is
>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>> environment problems. This failure appears to be an internal
>>>>>> failure;
>>>>>> here's some additional information (which may only be relevant to an
>>>>>> Open MPI developer):
>>>>>>
>>>>>> orte_session_dir failed
>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>> ../../openmpi-1.3.3/orte/runtime/orte_init.c at line 132
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> It looks like orte_init failed for some reason; your parallel
>>>>>> process is
>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>> fail during orte_init; some of which are due to configuration or
>>>>>> environment problems. This failure appears to be an internal
>>>>>> failure;
>>>>>> here's some additional information (which may only be relevant to an
>>>>>> Open MPI developer):
>>>>>>
>>>>>> orte_ess_set_name failed
>>>>>> --> Returned value Error (-1) instead of ORTE_SUCCESS
>>>>>> --------------------------------------------------------------------------
>>>>>>
>>>>>> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file
>>>>>> ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line 473
>>>>>>
>>>>>> This seems very likely related to the permissions set on $TMPDIR.
>>>>>>
>>>>>> I'd like to know if someone might have experienced the same or a
>>>>>> similar issue and if any solution was found.
>>>>>>
>>>>>> Thanks for your help,
>>>>>> Eloi
>>>>>>
>>>>>>