Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI portability problems: debug info isn't helpful
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-10-11 08:39:34


On Oct 11, 2008, at 6:48 AM, Aleksej Saushev wrote:

> The actual message states:
>
> [asau.local:25752] [NO-NAME] ORTE_ERROR_LOG: Not found in file
> runtime/orte_init_stage1.c at line 182
> --------------------------------------------------------------------------

Hmm. Even with all your output, I still don't see what could be
causing this -- the oob rml plugin was compiled and installed just
fine. Do you see an oob rml line in the output of ompi_info?
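
A quick way to check is to filter the ompi_info output down to the rml lines. This is only a sketch: the helper name show_rml_components is made up for illustration, and it assumes ompi_info labels those lines "MCA rml", so the pattern is deliberately loose.

```shell
# Hypothetical helper: reads ompi_info output on stdin and keeps only the
# lines that mention the rml framework (case-insensitive match).
show_rml_components() {
    grep -i 'mca rml'
}

# Typical use, on the head node and again on a back-end node:
#   ompi_info | show_rml_components
```

If the oob component is missing from that list on any node, the plugin is not being found there at run time.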

Is there a chance that some library that oob_rml depends on is
available on your head/build node but not on your back-end nodes?
(That would be pretty odd, though.)
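
One way to test that theory is to run ldd on the installed plugin on each node and look for unresolved libraries. This is a rough sketch: missing_deps is a made-up helper name, the plugin path in the comment is an assumption about a typical install layout, and it assumes your ldd reports unresolved libraries with the words "not found".

```shell
# Hypothetical helper: print the dependencies of a shared object that ldd
# cannot resolve; prints nothing when every dependency resolves.
missing_deps() {
    ldd "$1" 2>/dev/null | grep 'not found'
}

# Typical use on a back-end node (path is an assumption; adjust the prefix
# and file name to match your installation):
#   missing_deps /your/prefix/lib/openmpi/mca_rml_oob.so
```

Any line printed on a back-end node but not on the head node would point to the missing library.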

Bummer -- it looks like we have a bug in the debugging output for when
rml plugins are selected -- so I can't just give you an mpirun command
line that will output some additional diagnostic information. Do you
mind getting your hands dirty in a little code? If not, edit this
file: orte/mca/rml/base/rml_base_select.c and change all instances of

    opal_output_verbose(xxx, orte_rml_base.rml_output, ...)
to
    opal_output(orte_rml_base.rml_output, ...)
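
If you would rather not make that edit by hand, a sed filter along these lines should do it. Treat it as a sketch: drop_verbose_level is a made-up name, and it assumes each call keeps its function name and first argument on one line, so check the result with diff before installing anything.

```shell
# Hypothetical helper: rewrite opal_output_verbose(<level>, <stream>, ...)
# as opal_output(<stream>, ...) by dropping the first argument.
# Reads C source on stdin and writes the modified source to stdout.
drop_verbose_level() {
    sed -E 's/opal_output_verbose[[:space:]]*\([^,]*,[[:space:]]*/opal_output(/g'
}

# Typical use, from the top of the source tree:
#   f=orte/mca/rml/base/rml_base_select.c
#   drop_verbose_level < "$f" > "$f.new" && diff -u "$f" "$f.new"
#   mv "$f.new" "$f"    # only once the diff looks right
```

Calls whose first argument is split across lines would be missed and would need fixing by hand.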

Then compile and install the change with the commands below. This is a
shortcut; a top-level "make install" would also work, but it's
overkill for what we need here:

    cd orte/mca/rml
    make
    cd ../..
    make install-am

Then run with:

    mpirun --mca rml_base_debug 100 ...

And see what the output tells you. When I do this with a successful
run, my output looks like this:

----
[5:38] svbu-mpi:~/mpi % mpirun -np 1 --mca rml_base_debug 100 hello
[svbu-mpi.cisco.com:02087] orte_rml_base_select: initializing rml component oob
[svbu-mpi030:10587] orte_rml_base_select: initializing rml component oob
stdout: Hello, world!  I am 0 of 1 (svbu-mpi030)
stderr: Hello, world!  I am 0 of 1 (svbu-mpi030)
[5:39] svbu-mpi:~/mpi %
-----
(My "hello" program simply prints the hello world message on both
stdout and stderr.)

> Additional information.
>
> pkgsrc framework does work correctly here, it even catches or
> overrides some incompatibilities, when building OpenMPI from the
> same tarball without pkgsrc framework, I get this:
>
> libtool: compile:  gcc -DHAVE_CONFIG_H -I. -I../../../../opal/include
>   -I../../../../orte/include -I../../../../ompi/include -I../../../..
>   -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -pthread
>   -MT backtrace_none_component.lo -MD -MP
>   -MF .deps/backtrace_none_component.Tpo -c backtrace_none_component.c
>   -fPIC -DPIC -o .libs/backtrace_none_component.o
> backtrace_none_component.c:41: error: expected expression before ',' token
> backtrace_none_component.c:51: warning: braces around scalar initializer
> backtrace_none_component.c:51: warning: (near initialization for
>   'mca_backtrace_none_component.backtracec_version.mca_component_release_version')

That's also odd. I don't see any problems in the source code in this
particular area. What does this part of the file look like after
preprocessing (i.e., compiled with -E)? The expanded source should
show some obvious problem.
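
Something like the following would produce that output. It is a sketch, not a tested recipe: preprocess is a made-up helper name, and the -D/-I flags in the usage comment are copied from your libtool line on the assumption that you run it from the directory containing backtrace_none_component.c.

```shell
# Hypothetical helper: run only the preprocessor on one source file.
# $1 = source file; remaining arguments = -D/-I flags from the compile line.
# Uses $CC if set, otherwise gcc.
preprocess() {
    src=$1
    shift
    "${CC:-gcc}" -E "$@" "$src"
}

# Typical use:
#   preprocess backtrace_none_component.c -DHAVE_CONFIG_H -I. \
#       -I../../../../opal/include -I../../../../orte/include \
#       -I../../../../ompi/include -I../../../.. > backtrace_none_component.i
#   grep -n -A 20 'mca_backtrace_none_component' backtrace_none_component.i
```

Line numbers in the preprocessed file will not match 41 and 51, so searching for the component struct name is the easier way to find the initializer in question.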
-- 
Jeff Squyres
Cisco Systems