Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] 1.7.4rc: yet another launch failure
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-01-22 22:36:23


Here is the offending code:

    /* get the name of the user */
    uid = getuid();
#ifdef HAVE_GETPWUID
    pwdent = getpwuid(uid);
#else
    pwdent = NULL;
#endif
    if (NULL != pwdent) {
        user = strdup(pwdent->pw_name);
    } else {
        orte_show_help("help-orte-runtime.txt",
                       "orte:session:dir:nopwname", true);
        return ORTE_ERR_OUT_OF_RESOURCE;
    }

Is it possible that getpwuid isn't available on this platform (i.e., HAVE_GETPWUID is undefined, so pwdent is always NULL)? I'm surprised by this code, since we could just fall back to the uid instead. I'm not sure why this more stringent test was applied.

On Jan 22, 2014, at 7:02 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:

> On yet another test platform I see the following:
>
> $ mpirun -mca btl sm,self -np 1 examples/ring_c
> --------------------------------------------------------------------------
> Open MPI was unable to obtain the username in order to create a path
> for its required temporary directories. This type of error is usually
> caused by a transient failure of network-based authentication services
> (e.g., LDAP or NIS failure due to network congestion), but can also be
> an indication of system misconfiguration.
>
> Please consult your system administrator about these issues and try
> again.
> --------------------------------------------------------------------------
> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/util/session_dir.c at line 380
> [biou2.rice.edu:30021] [[40214,0],0] ORTE_ERROR_LOG: Out of resource in file /home/phh1/SCRATCH/OMPI/openmpi-1.7-latest-linux-ppc32-xlc-11.1/openmpi-1.7.4rc2r30361/orte/mca/ess/hnp/ess_hnp_module.c at line 599
> --------------------------------------------------------------------------
> It looks like orte_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during orte_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> orte_session_dir failed
> --> Returned value Out of resource (-2) instead of ORTE_SUCCESS
> --------------------------------------------------------------------------
>
>
> An "-np 2" run fails in the same manner.
> This is a production system, and "whoami" and "id" both work without problems, which makes me doubt the explanation given in the error message.
>
> [phh1@biou2 ~]$ whoami
> phh1
> [phh1@biou2 ~]$ id
> uid=44154(phh1) gid=2016(hpc) groups=2016(hpc),3803(hpcusers),3805(sshgw),3808(biou)
>
> The "ompi_info --all" output is attached.
> Please let me know what additional info is needed.
>
> -Paul
>
> --
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> <biou2_info.txt.bz2>