Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7.4rc: yet another launch failure
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2014-01-22 22:54:21


Not lacking getpwuid():

[phh1_at_biou2 BLD]$ grep HAVE_GETPWUID */include/*_config.h
opal/include/opal_config.h:#define HAVE_GETPWUID 1

I also can't see why the quoted code could fail.
The following is working fine:

[phh1_at_biou2 BLD]$ cat q.c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <pwd.h>
int main(void) {
   uid_t uid = getuid();
   printf("uid = %d\n", (int)uid);
   struct passwd *p = getpwuid(uid);
   if (p) printf("name = %s\n", p->pw_name);
   return 0;
}

[phh1_at_biou2 BLD]$ gcc -std=c99 q.c && ./a.out
uid = 44154
name = phh1

HOWEVER, when built for an ILP32 target (as in the reported failure), the
program compiles and runs but getpwuid() returns NULL, so no name is printed:

[phh1_at_biou2 BLD]$ gcc -m32 -std=c99 q.c && ./a.out
uid = 44154
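
The q.c test above prints nothing when getpwuid() returns NULL, so it cannot
tell "no entry found" apart from "the lookup itself failed". A minimal sketch
of a more talkative variant follows, assuming the POSIX convention of zeroing
errno before the call. The names lookup_result, classify_lookup and
report_lookup are hypothetical and are not part of Open MPI. Note that some
C libraries set errno even for a plain "not found", so the distinction is a
hint rather than proof.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration; none of this is Open MPI code. */
enum lookup_result { LOOKUP_FOUND, LOOKUP_NO_ENTRY, LOOKUP_ERROR };

/* Interpret a getpwuid() result together with the errno value observed
 * right after the call (errno must have been set to 0 before the call). */
enum lookup_result classify_lookup(const struct passwd *p, int err)
{
    if (p != NULL) {
        return LOOKUP_FOUND;
    }
    return (err == 0) ? LOOKUP_NO_ENTRY : LOOKUP_ERROR;
}

/* Look up uid, print what happened, and return the classification. */
enum lookup_result report_lookup(uid_t uid)
{
    errno = 0;                          /* clear errno before the lookup */
    struct passwd *p = getpwuid(uid);
    int err = errno;                    /* save it before printf can change it */
    enum lookup_result r = classify_lookup(p, err);

    printf("uid = %d\n", (int)uid);
    switch (r) {
    case LOOKUP_FOUND:
        printf("name = %s\n", p->pw_name);
        break;
    case LOOKUP_NO_ENTRY:
        printf("getpwuid: no matching entry (errno unchanged)\n");
        break;
    case LOOKUP_ERROR:
        printf("getpwuid failed: %s\n", strerror(err));
        break;
    }
    return r;
}
```

Calling report_lookup(getuid()) from main() in both the default and the -m32
build would show which of the three cases each one hits.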

So, I am going to guess that this *is* a system misconfiguration (maybe
missing the 32-bit foo.so for the appropriate nsswitch resolver?) just as
the error message said.
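
Since the lookup can fail on an otherwise healthy login, the alternative
Ralph raises in the quoted message below (using the uid itself when no name
is available) might look roughly like this. This is only a sketch under that
assumption, not the actual session_dir.c code; user_component,
current_user_component and dup_string are hypothetical helper names.

```c
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: heap-allocated copy of s, or NULL if malloc fails. */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, s, len);
    }
    return copy;
}

/* Hypothetical helper, not the real Open MPI code: choose the string used
 * for the "user" part of a path. Prefer the passwd name and fall back to
 * the numeric uid when no entry is available. Caller frees the result. */
char *user_component(const struct passwd *pwdent, uid_t uid)
{
    if (pwdent != NULL && pwdent->pw_name != NULL) {
        return dup_string(pwdent->pw_name);
    }
    char buf[32];
    snprintf(buf, sizeof buf, "%lu", (unsigned long)uid);
    return dup_string(buf);
}

/* Convenience wrapper for the calling process's own uid. */
char *current_user_component(void)
{
    uid_t uid = getuid();
    return user_component(getpwuid(uid), uid);
}
```

With this shape, a NULL from getpwuid() no longer has to be fatal; only an
allocation failure would leave the caller without a usable string.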

Sorry for the false alarm,
-Paul

On Wed, Jan 22, 2014 at 7:36 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Here is the offending code:
>
> /* get the name of the user */
> uid = getuid();
> #ifdef HAVE_GETPWUID
> pwdent = getpwuid(uid);
> #else
> pwdent = NULL;
> #endif
> if (NULL != pwdent) {
> user = strdup(pwdent->pw_name);
> } else {
> orte_show_help("help-orte-runtime.txt",
> "orte:session:dir:nopwname", true);
> return ORTE_ERR_OUT_OF_RESOURCE;
> }
>
> Is it possible on this platform that you don't have getpwuid? I'm
> surprised at the code as we could just use the uid instead - not sure why
> this more stringent test was applied
>
>
>
> On Jan 22, 2014, at 7:02 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>
> On yet another test platform I see the following:
>
> $ mpirun -mca btl sm,self -np 1 examples/ring_c
> --------------------------------------------------------------------------
> Open MPI was unable to obtain the username in order to create a path
> for its required temporary directories. This type of error is usually
> caused by a transient failure of network-based authentication services
> (e.g., LDAP or NIS failure due to network congestion), but can also be
> an indication of system misconfiguration.
>
> Please consult your system administrator about these issues and try
> again.
> --------------------------------------------------------------------------
> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in
> file
> /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/util/session_dir.c
> at line 380
> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in
> file
> /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/mca/ess/hnp/ess_hnp_module.c
> at line 599
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_session_dir failed
> --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
>
>
> An "-np 2" run fails in the same manner.
> This is a production system and there is no problem with "whoami" or "id",
> leaving me doubting the explanation provided by the error message.
>
> [phh1_at_biou2 ~]$ whoami
> phh1
> [phh1_at_biou2 ~]$ id
> uid=44154(phh1) gid=2016(hpc)
> groups=2016(hpc),3803(hpcusers),3805(sshgw),3808(biou)
>
> The "ompi_info --all" output is attached.
> Please let me know what additional info is needed.
>
> -Paul
>
> --
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> <biou2_info.txt.bz2>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900