Open MPI User's Mailing List Archives

Subject: [OMPI users] OpenMPI portability problems: debug info isn't helpful
From: Aleksej Saushev (asau_at_[hidden])
Date: 2008-09-29 13:28:50


  Hello!

I'm trying to build OpenMPI on NetBSD 4.99.72.
I get the following message whether or not I build
in debug mode:

[asau.local:27880] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_rml_base_select failed
  --> Returned value -13 instead of ORTE_SUCCESS

--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  ompi_mpi_init: orte_init_stage1 failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (goodbye)
[asau.local:27880] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!

I tried running ktrace on a test application (I use benchmarks/skampi as found
in pkgsrc as the test: first, because it is readily available; second, because
it is assumed to work, and it does work with MPICH2). I couldn't find any
obvious reason for the failure.

I built the same OpenMPI with debug information (CLFLAGS+=-g) and tried stepping
through "orte_rml_base_select", "orte_init_stage1" and the code around them.
This didn't enlighten me either. Is there any design documentation?

I tried building the same OpenMPI 1.2.7 on a test FreeBSD 6.3-STABLE
system (snapshot date unknown). Apart from having to pass NM=/usr/bin/nm
explicitly (configure doesn't detect it; why? It should find the _BSD_ nm
there), nothing had to be changed. The test application starts fine there.

I tried changing various verbosity parameters in ~/.openmpi/mca-params.conf,
but in vain: I didn't get any additional messages that could clarify the problem.
Have I overlooked anything?
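
For concreteness, here is a sketch of the kind of settings meant above. It
is not a record of what was actually tried. The file takes one
"name = value" pair per line. The parameter names are assumptions: they
follow the usual <framework>_base_verbose pattern and may not all be
registered in 1.2.7. The output of "ompi_info --param all all" shows which
names a given build really accepts.

```
# ~/.openmpi/mca-params.conf
# Illustrative only: parameter names are assumed, check them against
# the output of "ompi_info --param all all" before relying on them.

# General MCA verbosity
mca_verbose = stderr

# Report components that fail to load
mca_component_show_load_errors = 1

# Framework-level verbosity for the frameworks named in the error
rml_base_verbose = 10
oob_base_verbose = 10
```

The same pairs could presumably also be given on the command line as
"--mca name value", which rules out the file simply not being read.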

What have I missed in my diagnosis?

What can I do to make it work on NetBSD 4.99?

Given that it works on FreeBSD 6, it should work on NetBSD too,
but it apparently depends on some subtle difference between the two.
I'd be very grateful for further directions.

-- 
HE CE3OH...