Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] More OpenMPI errors: how to debug?
From: Jim Kusznir (jkusznir_at_[hidden])
Date: 2008-05-23 16:09:05


Well, it turns out that the path Open MPI searches seems to be at
least partially hard-coded. I've got some "weird pathing" here on my
Rocks cluster:

/opt is local;
/share/apps is exported from the headnode and available on all nodes.
On the head node, /opt is symlinked to /share/apps

I set my environment modules such that openmpi-1.2.6 is located in
/share/apps/openmpi-pgi/1.2.6. However, when I ran a job on a compute
node, it hit the error quoted below. When I installed the runtime
directly on the compute node (placing it in /opt), but left the
module/pathing the same, it worked. I am thinking about making /opt a
symlink to /share/apps across the whole cluster, but I'm not sure
about all the implications of that...
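
For anyone who wants to run the same check on each node, here is a rough
Python sketch. It is not part of Open MPI; the function name
check_openmpi_prefix is made up for illustration, and it assumes
$pkglibdir is the default $prefix/lib/openmpi. It resolves symlinks in
the prefix (such as /opt -> /share/apps) and looks for the two RML
component files named in the quoted reply below.

```python
import glob
import os


def check_openmpi_prefix(prefix):
    """Report whether an Open MPI install prefix looks usable on this node.

    Hypothetical helper: assumes components live in <prefix>/lib/openmpi.
    """
    # Follow symlinks so a path like /opt/... that really points at
    # /share/apps/... is reported as the directory actually being used.
    real_prefix = os.path.realpath(prefix)
    pkglibdir = os.path.join(real_prefix, "lib", "openmpi")

    report = {
        "prefix": prefix,
        "real_prefix": real_prefix,
        "prefix_exists": os.path.isdir(real_prefix),
    }

    # The two file patterns from the quoted reply; an empty list means
    # nothing matching that pattern was found under pkglibdir.
    for pattern in ("mca_rml_oob.la*", "mca_rml_oob.so*"):
        matches = glob.glob(os.path.join(pkglibdir, pattern))
        report[pattern] = sorted(matches)

    return report
```

Calling check_openmpi_prefix("/share/apps/openmpi-pgi/1.2.6") on the head
node and on a compute node, then comparing the two results, should show
whether the prefix exists on both and whether the component files are
visible from each.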

--Jim

On Fri, May 23, 2008 at 12:07 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On May 22, 2008, at 12:52 PM, Jim Kusznir wrote:
>
>> I installed openmpi 1.2.6 on my system, but now my users are
>> complaining about even more errors. I'm getting this:
>>
>> [compute-0-23.local:26164] [NO-NAME] ORTE_ERROR_LOG: Not found in file
>> runtime/orte_init_stage1.c at line 182
>> --------------------------------------------------------------------------
>> Sorry! You were supposed to get help about:
>> orte_init:startup:internal-failure
>> from the file:
>> help-orte-runtime
>> But I couldn't find any file matching that name. Sorry!
>> --------------------------------------------------------------------------
>
> Everything below this message is a consequence of the first message
> (above).
>
> There are two problems here:
>
> 1. Where are the help files -- why can't OMPI find them? That's
> really weird; it suggests a broken Open MPI install. You have a few
> pending e-mails to me about RPM builds that I need to go read (I'm
> sorry; I'm way backed up :-( ); I wonder if this is somehow related...?
>
> 2. The specific error that is occurring is that the ORTE layer in OMPI
> is unable to initialize its out-of-band messaging system (we call it
> the "RML") which is *really* weird. The only reason that I can think
> that that would occur is a broken OMPI install.
>
> Is there any chance that there are some files missing from your OMPI
> installs? For example, do you see these two files under
> $prefix/lib/openmpi (or wherever $pkglibdir was set to):
>
> mca_rml_oob.la*
> mca_rml_oob.so*
>
> --
> Jeff Squyres
> Cisco Systems