Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Open MPI v1.3.4rc4 is out
From: David Gunter (dog_at_[hidden])
Date: 2009-11-05 16:39:53


I, too, have tried various builds of the rc4 release. It's dying
during orterun.

Specifically, here's the call chain where things fall apart:

orterun -> orte_init -> opal_init -> opal_carto_base_select ->
mca_base_select

54     for (item = opal_list_get_first(components_available);
55          item != opal_list_get_end(components_available);
56          item = opal_list_get_next(item) ) {
57         cli = (mca_base_component_list_item_t *) item;
58         component = (mca_base_component_t *) cli->cli_component;

The loop exits immediately at line #55: item already equals the end of
the list on the first pass, which means components_available is empty.
Control then jumps to line #107, where the NULL test succeeds:

107     if (NULL == *best_component) {
108         opal_output_verbose(5, output_id,
109                             "mca:base:select:(%5s) No component selected!",
110                             type_name);
111         /*
112          * Still close the non-selected components
113          */
114         mca_base_components_close(0, /* Pass 0 to keep this from closing the output handle */
115                                   components_available,
116                                   NULL);
117         return OPAL_ERR_NOT_FOUND;
118     }
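
To make the failure path concrete, here is a minimal standalone sketch of the same pattern: a selection loop over a sentinel-terminated list that never runs its body when the list is empty, so the "best" pointer stays NULL and the function returns a not-found code. Everything named sketch_* or SKETCH_* is hypothetical and is not the real OPAL API; the only value taken from this thread is -13, which the error output reports for the "Not found" case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT the real OPAL types. */
#define SKETCH_SUCCESS        0
#define SKETCH_ERR_NOT_FOUND  (-13)  /* value reported in the error output */

typedef struct sketch_item {
    struct sketch_item *next;
    int priority;
    const char *name;
} sketch_item_t;

/* The list owns a sentinel node that doubles as the "end" marker.
 * An empty list is one whose first item is the sentinel itself. */
typedef struct {
    sketch_item_t sentinel;
} sketch_list_t;

static void sketch_list_init(sketch_list_t *list)
{
    list->sentinel.next = &list->sentinel;
    list->sentinel.priority = 0;
    list->sentinel.name = NULL;
}

static void sketch_list_prepend(sketch_list_t *list, sketch_item_t *item)
{
    item->next = list->sentinel.next;
    list->sentinel.next = item;
}

static sketch_item_t *sketch_list_first(sketch_list_t *list)
{
    return list->sentinel.next;
}

static sketch_item_t *sketch_list_end(sketch_list_t *list)
{
    return &list->sentinel;
}

/* Pick the highest-priority item. With an empty list the loop condition
 * is false on its first evaluation, *best stays NULL, and the function
 * returns SKETCH_ERR_NOT_FOUND. */
static int sketch_select(sketch_list_t *list, const sketch_item_t **best)
{
    sketch_item_t *item;

    assert(list != NULL && best != NULL);
    *best = NULL;

    for (item = sketch_list_first(list);
         item != sketch_list_end(list);
         item = item->next) {
        if (NULL == *best || item->priority > (*best)->priority) {
            *best = item;
        }
    }

    if (NULL == *best) {
        return SKETCH_ERR_NOT_FOUND;
    }
    return SKETCH_SUCCESS;
}
```

Calling sketch_select() on a freshly initialized list returns SKETCH_ERR_NOT_FOUND without ever entering the loop body. If the real code behaves the same way, the question becomes why no carto components ended up in components_available, rather than anything in the selection logic itself.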

-david

--
David Gunter
HPC-3: Infrastructure Team
Los Alamos National Laboratory

Sam Gutierrez wrote:
 > Hi All,
 >
 > I just built OMPI 1.3.4rc4 on one of our Roadrunner machines. When I
 > try to launch a simple MPI job, I get the following:
 >
 > [rra011a.rr.lanl.gov:31601] mca: base: components_open: Looking for carto components
 > [rra011a.rr.lanl.gov:31601] mca: base: components_open: opening carto components
 > [rra011a.rr.lanl.gov:31601] mca:base:select: Auto-selecting carto components
 > [rra011a.rr.lanl.gov:31601] mca:base:select:(carto) No component selected!
 > --------------------------------------------------------------------------
 > It looks like opal_init failed for some reason; your parallel process is
 > likely to abort. There are many reasons that a parallel process can
 > fail during opal_init; some of which are due to configuration or
 > environment problems. This failure appears to be an internal failure;
 > here's some additional information (which may only be relevant to an
 > Open MPI developer):
 >     opal_carto_base_select failed
 >     --> Returned value -13 instead of OPAL_SUCCESS
 > --------------------------------------------------------------------------
 > [rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file runtime/orte_init.c at line 77
 > [rra011a.rr.lanl.gov:31601] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file orterun.c at line 541
 >
 > This may be an issue on our end regarding a runtime parameter that
 > isn't set correctly. See attached. Please let me know if you need
 > any more info.
 >
 > Thanks!
 >
 > --
 > Samuel K. Gutierrez
 > Los Alamos National Laboratory

On Nov 4, 2009, at 3:00 PM, Jeff Squyres wrote:
 > The latest-n-greatest is available here:
 >
 > http://www.open-mpi.org/software/ompi/v1.3/
 >
 > Please beat it up and look for problems!
 >
 > --
 > Jeff Squyres
 > jsquyres_at_[hidden]
 >
 > _______________________________________________
 > devel mailing list
 > devel_at_[hidden]
 > http://www.open-mpi.org/mailman/listinfo.cgi/devel