Interesting - still, I see no reason for OMPI to fail just because of that. We can run just fine with the uid, so I'll make things a little more flexible.

Thanks for tracking it down!

On Jan 22, 2014, at 7:54 PM, Paul Hargrove <phhargrove@lbl.gov> wrote:

Not lacking getpwuid():

[phh1@biou2 BLD]$ grep HAVE_GETPWUID */include/*_config.h
opal/include/opal_config.h:#define HAVE_GETPWUID 1

I also can't see why the quoted code could fail.
The following is working fine:

[phh1@biou2 BLD]$ cat q.c
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <pwd.h>
int main(void) {
   uid_t uid = getuid();
   printf("uid = %d\n", (int)uid);
   struct passwd *p = getpwuid(uid); 
   if (p) printf("name = %s\n", p->pw_name);
   return 0;
}

[phh1@biou2 BLD]$ gcc -std=c99 q.c && ./a.out
uid = 44154
name = phh1

HOWEVER, when built for an ILP32 target (as in the reported failure), the lookup fails at run time:

[phh1@biou2 BLD]$ gcc -m32 -std=c99 q.c && ./a.out
uid = 44154

So, I am going to guess that this *is* a system misconfiguration (maybe missing the 32-bit foo.so for the appropriate nsswitch resolver?) just as the error message said.
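
One way to narrow this down further is to clear errno before the call and inspect it afterwards: POSIX says a lookup that simply finds no matching entry leaves errno unchanged, while a real failure sets it. The following is a rough variant of q.c along those lines, not something that was run on biou2. describe_pw_lookup is a hypothetical helper name, and which errno value (if any) a missing 32-bit NSS module produces is an assumption to check on the affected machine.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of OMPI): look up uid and write a one-line
 * diagnosis to out.
 * Returns 0 if a name was found, 1 if no entry matched (errno untouched),
 * and 2 if getpwuid() reported an error through errno. */
static int describe_pw_lookup(uid_t uid, FILE *out)
{
    errno = 0;                      /* must be cleared before the call */
    struct passwd *p = getpwuid(uid);
    int err = errno;                /* save before any other library call */

    if (p != NULL) {
        fprintf(out, "uid = %lu, name = %s\n", (unsigned long)uid, p->pw_name);
        return 0;
    }
    if (err == 0) {
        fprintf(out, "uid = %lu: no matching passwd entry\n",
                (unsigned long)uid);
        return 1;
    }
    fprintf(out, "uid = %lu: getpwuid failed: %s\n",
            (unsigned long)uid, strerror(err));
    return 2;
}
```

Calling describe_pw_lookup(getuid(), stdout) from main() and building once with and once without -m32 would show whether the 32-bit case is a plain "not found" or an error from the resolver.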

Sorry for the false alarm,
-Paul


On Wed, Jan 22, 2014 at 7:36 PM, Ralph Castain <rhc@open-mpi.org> wrote:
Here is the offending code:

    /* get the name of the user */
    uid = getuid();
#ifdef HAVE_GETPWUID
    pwdent = getpwuid(uid);
#else
    pwdent = NULL;
#endif
    if (NULL != pwdent) {
        user = strdup(pwdent->pw_name);
    } else {
        orte_show_help("help-orte-runtime.txt",
                       "orte:session:dir:nopwname", true);
        return ORTE_ERR_OUT_OF_RESOURCE;
    }

Is it possible that this platform doesn't have getpwuid? I'm surprised by the code, since we could just use the uid instead. I'm not sure why this more stringent test was applied.
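
For illustration, falling back to the uid could look roughly like the sketch below. This is not the actual session_dir.c change: session_user_string is a hypothetical helper name, the HAVE_GETPWUID guard is left out for brevity, and the decimal-uid format is an assumption.

```c
#include <pwd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not OMPI code): return a malloc'd string that
 * identifies the user for a session-directory path. Prefers the login
 * name from getpwuid() and falls back to the decimal uid when the lookup
 * fails. Returns NULL only if malloc fails; the caller frees the result. */
static char *session_user_string(uid_t uid)
{
    struct passwd *pwdent = getpwuid(uid);
    char uidbuf[32];
    const char *src;

    if (NULL != pwdent && NULL != pwdent->pw_name) {
        src = pwdent->pw_name;
    } else {
        /* Lookup failed (e.g. resolver misconfiguration): use the uid. */
        snprintf(uidbuf, sizeof(uidbuf), "%lu", (unsigned long)uid);
        src = uidbuf;
    }

    size_t len = strlen(src) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);
    if (NULL != copy) {
        memcpy(copy, src, len);
    }
    return copy;
}
```

With something like this, a failed name lookup would no longer be fatal: the only remaining error path is an allocation failure, for which an out-of-resource return code would actually be accurate.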



On Jan 22, 2014, at 7:02 PM, Paul Hargrove <phhargrove@lbl.gov> wrote:

On yet another test platform I see the following:

$ mpirun -mca btl sm,self -np 1 examples/ring_c
--------------------------------------------------------------------------
Open MPI was unable to obtain the username in order to create a path
for its required temporary directories.  This type of error is usually
caused by a transient failure of network-based authentication services
(e.g., LDAP or NIS failure due to network congestion), but can also be
an indication of system misconfiguration.

Please consult your system administrator about these issues and try
again.
--------------------------------------------------------------------------
[biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/util/session_dir.c at line 380
[biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/mca/ess/hnp/ess_hnp_module.c at line 599
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_session_dir failed
  --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
--------------------------------------------------------------------------


An "-np 2" run fails in the same manner.
This is a production system, and "whoami" and "id" both work without problems, which leaves me doubting the explanation given by the error message.

[phh1@biou2 ~]$ whoami
phh1
[phh1@biou2 ~]$ id
uid=44154(phh1) gid=2016(hpc) groups=2016(hpc),3803(hpcusers),3805(sshgw),3808(biou)

The "ompi_info --all" output is attached.
Please let me know what additional info is needed.

-Paul

--
Paul H. Hargrove                          PHHargrove@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
<biou2_info.txt.bz2>
_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel

