Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] init failing
From: Dominik Táborský (bremby_at_[hidden])
Date: 2009-08-08 07:37:22


Jeff Squyres wrote on Fri, 07 Aug 2009 at 15:24 +0200:
> I'm way behind on my mail; apologies for the delay in replying.

Totally OK. At least you replied at all! :-)

> Did you figure this out?
>
Unfortunately not. I've been busy lately and haven't worked on it.
Now I have some free time again :-)

> As a pure guess, it sounds like you have a heterogeneous setup --
> nodes have different distros and/or versions. As such, your glibc's
> may be different, etc. In such situations, it is definitely
> recommended to have a separate installation of Open MPI *on each node*
> (i.e., compiled/built for that distro/version/platform).

The problem is, there are no distros :)
Every system is built from source, plus Ubuntu's Glibc. As I described
before, I use Ubuntu's Glibc binaries to keep things simple and avoid
compiling it myself (that would mean creating a new toolchain, and since
this approach worked for me on another project, I see no obvious reason
to compile Glibc). Everything else is compiled from source (e.g. kernel,
bash, utilities, runit; everything but the Glibc libraries).

The systems are identical except for hostname and SSH keys. They boot
over the network.
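
One way to double-check that the nodes really are identical is to have
each node print its C library version and architecture, then compare the
results. The following is only a minimal sketch, not anything Open MPI
provides: the function names and the hostnames node1..node3 are
hypothetical, and it assumes Python is available on the nodes, which may
not hold on a minimal system.

```python
import platform
from collections import Counter


def local_report():
    """Return a 'lib version arch' string for the machine this runs on."""
    lib, version = platform.libc_ver()
    return f"{lib or 'unknown'} {version or 'unknown'} {platform.machine()}"


def find_mismatched_nodes(reports):
    """Return the nodes whose report differs from the most common one.

    `reports` maps a hostname to the string local_report() printed there.
    """
    if not reports:
        return {}
    majority, _ = Counter(reports.values()).most_common(1)[0]
    return {host: rep for host, rep in reports.items() if rep != majority}


if __name__ == "__main__":
    # Run this on every node (e.g. over SSH) and collect the output.
    print(local_report())
```

Collecting the printed lines into a dict such as
{"node1": "...", "node2": "...", "node3": "..."} and passing it to
find_mismatched_nodes() would list any node that deviates from the rest.
An empty result only says the reported versions match; it does not prove
the Open MPI libraries copied from Ubuntu are compatible with them.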

> If you're copying the files from system A to system B and A and B are
> different distros/versions, it could be a good reason why it fails to
> work.

So you recommend compiling Open MPI. OK, I'll get on it; hopefully it
won't be as much of a headache as Glibc was :-)

>
> Hope that helps.

In any case, I appreciate your time and patience!

Dominik - bremby

>
> On Jul 28, 2009, at 4:07 AM, Dominik Táborský wrote:
>
> > Hi everyone,
> >
> > I am trying to build my own system for my nodes - minimalistic. I tried
> > to make things easy so I didn't even recompile openMPI for it, I just
> > copied everything from my Ubuntu installation (I know, it's very dirty,
> > but I stick to KISS :) ). Before, things just worked perfectly with the
> > libraries. I only recompile executable binaries, not Glibc (not openMPI,
> > I also didn't succeed compiling openMPI but that's a different story).
> >
> > So, as I keep trying to run Hello world! program, I keep getting the
> > same error message every time. Everything in the system is fine from my
> > point of view. The error message is this:
> >
> > [user:24307] mca: base: components_open: component timer / linux open function failed
> > --------------------------------------------------------------------------
> > It looks like opal_init failed for some reason; your parallel process is
> > likely to abort. There are many reasons that a parallel process can
> > fail during opal_init; some of which are due to configuration or
> > environment problems. This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> >
> > opal_carto_base_select failed
> > --> Returned value -13 instead of OPAL_SUCCESS
> > --------------------------------------------------------------------------
> > [user:24307] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../../orte/runtime/orte_init.c at line 77
> > [user:24307] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ../../../orte/orted/orted_main.c at line 315
> >
> > I tried googling and searching the archives, nothing gave me a hint.
> > What might be missing? Should I really try to recompile openMPI? What
> > needs to be on/off in the kernel? Any ideas?
> >
> > Thanks in advance,
> >
> > bremby
> >