Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] [sge] tight-integration openmpi and sge: opal_os_dirpath_create failure
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-11-10 11:47:14


Creating the directory with nobody/nogroup credentials sounds like a
bug in SGE to me, or perhaps an SGE config issue?

The only thing you could do on the OMPI side is tell it to use some
other directory as the root for its session dir tree. Check
"mpirun -h" or ompi_info for the required option.
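
If the ompi_info output is too long to read through, a small helper along these lines could narrow it down. This is only a sketch: the search term "tmpdir" and the function names are assumptions made for illustration, and it assumes that "ompi_info --all" lists the MCA parameters on your build.

```python
import shutil
import subprocess


def lines_mentioning(text, needle="tmpdir"):
    """Return the lines of `text` that contain `needle` (case-insensitive)."""
    needle = needle.lower()
    return [line for line in text.splitlines() if needle in line.lower()]


def search_ompi_info(needle="tmpdir"):
    """Run `ompi_info --all` if it is on PATH and filter its output.

    Returns an empty list when ompi_info cannot be found.
    """
    exe = shutil.which("ompi_info")
    if exe is None:
        return []  # Open MPI tools are not installed or not on PATH
    result = subprocess.run(
        [exe, "--all"], capture_output=True, text=True, check=False
    )
    return lines_mentioning(result.stdout, needle)


if __name__ == "__main__":
    # "tmpdir" is a guess at what the relevant option name contains.
    for line in search_ompi_info("tmpdir"):
        print(line)
```

Whatever parameter turns up should still be checked against "mpirun -h" for your version before relying on it.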

But I would first check your SGE config as that just doesn't sound
right.
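
To confirm what the job actually sees, something like the following could be run from inside a job script as the submitting user. It is a minimal sketch, not part of Open MPI or SGE: the function names are hypothetical, and it simply reports the owner, group, and mode of $TMPDIR and tries to create (then remove) a probe subdirectory, which is roughly what the session dir setup needs to be able to do.

```python
import grp
import os
import pwd
import stat
import tempfile


def can_create_subdir(path):
    """Try to create and remove a probe subdirectory under `path`."""
    try:
        probe = tempfile.mkdtemp(prefix="probe-", dir=path)
    except OSError:
        return False  # missing directory, permission denied, etc.
    os.rmdir(probe)
    return True


def describe_dir(path):
    """Report ownership, mode, and whether a subdirectory can be created."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid has no passwd entry
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no group entry
    return {
        "path": path,
        "owner": owner,
        "group": group,
        "mode": stat.filemode(st.st_mode),
        "can_create_subdir": can_create_subdir(path),
    }


if __name__ == "__main__":
    # Inside an SGE job, TMPDIR should point at the per-job directory.
    target = os.environ.get("TMPDIR", tempfile.gettempdir())
    for key, value in describe_dir(target).items():
        print(f"{key}: {value}")
```

If this reports nobody/nogroup as the owner and cannot create the probe directory, the problem is on the SGE side rather than in OMPI.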

On Nov 10, 2009, at 9:40 AM, Eloi Gaudry wrote:

> Hi there,
>
> I'm experiencing some issues using GE6.2U4 and OpenMPI-1.3.3 (with
> gridengine component).
>
> During any job submission, SGE creates a session directory in
> $TMPDIR, named after the job id and the computing node name. This
> session directory is created using nobody/nogroup credentials.
>
> When using OpenMPI with tight-integration, opal creates different
> subdirectories in this session directory. The issue I'm facing now
> is that OpenMPI fails to create these subdirectories:
>
> [charlie:03882] opal_os_dirpath_create: Error: Unable to create the
> sub-directory (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-
> eg_at_charlie_0) of (/opt/sge/tmp/25.1.smp8.q/openmpi-sessions-
> eg_at_charlie_0
> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../
> openmpi-1.3.3/orte/util/session_dir.c at line 101
> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../
> openmpi-1.3.3/orte/util/session_dir.c at line 425
> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in
> file ../../../../../openmpi-1.3.3/orte/mca/ess/hnp/ess_hnp_module.c
> at line 273
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel
> process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_session_dir failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in file ../../
> openmpi-1.3.3/orte/runtime/orte_init.c at line 132
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel
> process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_ess_set_name failed
> --> Returned value Error (-1) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
> [charlie:03882] [[53199,0],0] ORTE_ERROR_LOG: Error in
> file ../../../../openmpi-1.3.3/orte/tools/orterun/orterun.c at line
> 473
>
> This seems very likely related to the permissions set on $TMPDIR.
>
> I'd like to know if someone might have experienced the same or a
> similar issue and if any solution was found.
>
> Thanks for your help,
> Eloi
>
>
>
>
> --
>
>
> Eloi Gaudry
>
> Free Field Technologies
> Axis Park Louvain-la-Neuve
> Rue Emile Francqui, 1
> B-1435 Mont-Saint Guibert
> BELGIUM
>
> Company Phone: +32 10 487 959
> Company Fax: +32 10 454 626