
Open MPI Development Mailing List Archives



Subject: Re: [OMPI devel] Open MPI v1.3.4rc4 is out
From: David Gunter (dog_at_[hidden])
Date: 2009-11-05 16:39:53


I, too, have tried various builds of the rc4 release. It's dying
during orterun.

Specifically, here's the call chain where things fall apart:

orterun -> orte_init -> opal_init -> opal_carto_base_select ->
mca_base_select

54     for (item = opal_list_get_first(components_available);
55          item != opal_list_get_end(components_available);
56          item = opal_list_get_next(item) ) {
57         cli = (mca_base_component_list_item_t *) item;
58         component = (mca_base_component_t *) cli->cli_component;

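The loop condition on line #55 fails on the very first pass, i.e.
opal_list_get_first() must be returning the list's end marker. That
means components_available is empty and the loop body never runs. The
code then jumps to line #107, where *best_component is still NULL, so
the NULL test there passes: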

107     if (NULL == *best_component) {
108         opal_output_verbose(5, output_id,
109                             "mca:base:select:(%5s) No component selected!",
110                             type_name);
111         /*
112          * Still close the non-selected components
113          */
114         mca_base_components_close(0, /* Pass 0 to keep this from closing the output handle */
115                                   components_available,
116                                   NULL);
117         return OPAL_ERR_NOT_FOUND;
118     }

-david

--
David Gunter
HPC-3: Infrastructure Team
Los Alamos National Laboratory
Sam Gutierrez wrote:
 > Hi All,
 > I just built OMPI 1.3.4rc4 on one of our Roadrunner machines. When I
 > try to launch a simple MPI job, I get the following:
 > [rra011a.rr.lanl.gov:31601] mca: base: components_open: Looking for carto components
 > [rra011a.rr.lanl.gov:31601] mca: base: components_open: opening carto components
 > [rra011a.rr.lanl.gov:31601] mca:base:select: Auto-selecting carto components
 > [rra011a.rr.lanl.gov:31601] mca:base:select:(carto) No component selected!
 >
 > --------------------------------------------------------------------------
 > It looks like opal_init failed for some reason; your parallel process is
 > likely to abort. There are many reasons that a parallel process can
 > fail during opal_init; some of which are due to configuration or
 > environment problems. This failure appears to be an internal failure;
 > here's some additional information (which may only be relevant to an
 > Open MPI developer):
 >    opal_carto_base_select failed
 >    --> Returned value -13 instead of OPAL_SUCCESS
 >
 > --------------------------------------------------------------------------
 > [rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 77
 > [rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orterun.c at line 541
 > This may be an issue on our end regarding a runtime parameter that
 > isn't set correctly. See attached. Please let me know if you need
 > any more info.
 > Thanks!
 > --
 > Samuel K. Gutierrez
 > Los Alamos National Laboratory
On Nov 4, 2009, at 3:00 PM, Jeff Squyres wrote:
 > The latest-n-greatest is available here:
 >
 > http://www.open-mpi.org/software/ompi/v1.3/
 >
 > Please beat it up and look for problems!
 >
 > --
 > Jeff Squyres
 > jsquyres_at_[hidden]
 >
 > _______________________________________________
 > devel mailing list
 > devel_at_[hidden]
 > http://www.open-mpi.org/mailman/listinfo.cgi/devel