Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problems configuring OpenMPI 1.3.1 with numa, torque, and openib
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-04-10 13:17:43


On Apr 9, 2009, at 6:16 PM, Gus Correa wrote:

> The configure scripts seem to have changed, and work differently
> than before, particularly w.r.t. additional libraries like numa,
> torque, and openib.
> The new behavior was a bit unexpected and puzzled me,
> although eventually I could build 1.3.1.
>

Yes, we put in this new functionality due to user requests. See below.

> Here are my observations.
>
> 1) I used to configure OpenMPI 1.2.8 and 1.3.0 with:
>
> --with-libnuma=/usr/lib64 \
> --with-tm=/usr/lib64 \
> --with-openib=/usr/lib64 \
>
> This worked fine for me on the same computer I am using for 1.3.1.
> However, with 1.3.1 the same options fail.
> Configure now tries to find the corresponding include
> files in /usr/lib64/include, a directory that doesn't even exist.
> The include files are actually in /usr/include
> (as the old configure knew well).
>

What happened in the 1.2.x configure was that OMPI was adding
-L/usr/lib64/lib and trying to find the relevant libraries. But since
/usr/lib64 was already in your linker's default search path, the
relevant libraries were found without any additional flags from OMPI.
OMPI was also adding -I/usr/lib64/include to the compile path, but
the relevant header files were found because they were in your
compiler's default search path (likely /usr/include). So both the
added -I and -L flags were meaningless, albeit harmless.
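To make the 1.2.x behavior concrete, here is a minimal Python sketch of how a directory given to --with-<foo>=<dir> turns into -I and -L flags, assuming configure simply appends "include" and "lib" to whatever it is given. The function name derive_flags is made up for illustration; it is not OMPI's actual m4 logic.

```python
import posixpath


def derive_flags(prefix):
    """Hypothetical model: map a --with-<foo>=<prefix> value to -I/-L flags.

    Assumes configure appends "include" and "lib" to whatever it is given,
    which is why passing a library directory yields nonsense paths.
    """
    cppflag = "-I" + posixpath.join(prefix, "include")
    ldflag = "-L" + posixpath.join(prefix, "lib")
    return cppflag, ldflag


if __name__ == "__main__":
    # A library directory as the prefix: both derived paths are bogus.
    print(derive_flags("/usr/lib64"))  # ('-I/usr/lib64/include', '-L/usr/lib64/lib')
    # The installation prefix: both derived paths are the real ones.
    print(derive_flags("/usr"))  # ('-I/usr/include', '-L/usr/lib')
```

Because the compiler and linker already search /usr/include and /usr/lib64 by default, the bogus flags from the first call did no harm in 1.2.x.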

> 2) Therefore, I tried to configure with:
>
> --with-libnuma \
> --with-tm \
> --with-openib \
>
> Note that no directory is being pointed to.
> My hope was that configure would find the libraries and includes in
> standard places (and hopefully the correct libs, 64-bit, not 32-bit).
>

This *should* be fine.

> This way configure completes with no apparent error.
> However, I get this funny error during the make phase:
>
> /bin/sh ../../../../libtool --tag=CC --mode=link gcc -DNDEBUG
> -march=amdfam10 -O3 -finline-functions -funroll-loops -mfpmath=sse
> -fno-strict-aliasing -pthread -fvisibility=hidden -module -avoid-version
> -Lyes/lib -export-dynamic -o libmca_maffinity_libnuma.la
> maffinity_libnuma_component.lo maffinity_libnuma_module.lo -lnuma -lnsl
> -lutil -lm
> ../../../../libtool: line 4998: cd: yes/lib: No such file or directory
> libtool: link: cannot determine absolute directory name of `yes/lib'
> make[2]: *** [libmca_maffinity_libnuma.la] Error 1
>

Huh. That's odd.

> Note the "yes/lib" path.
>
> A little grep on config.log showed the cause of the error:
>
> %grep yes config.log
>
> ...
>
> OMPI_WRAPPER_EXTRA_LDFLAGS=' -Lyes/lib '
> OPAL_WRAPPER_EXTRA_LDFLAGS='-Lyes/lib '
> ORTE_WRAPPER_EXTRA_LDFLAGS=' -Lyes/lib '
> WRAPPER_EXTRA_LDFLAGS=' -Lyes/lib '
> maffinity_libnuma_CPPFLAGS=' -Iyes/include'
> maffinity_libnuma_LDFLAGS=' -Lyes/lib'
> #define WRAPPER_EXTRA_LDFLAGS " -Lyes/lib "
>
> Is this an internal "yes" answer to configure that
> is being inadvertently caught/interpreted as a directory name?
>

Ah, crud. Probably so, yes.

(/me double checks libnuma's m4 setup... crud; I can replicate the
problem. I'll try to commit a fix this afternoon so that it can be
included in 1.3.2)

> Since configure seems to have found the libraries and include files,
> and completed without error,
> shouldn't it also have reported the correct paths to config.log
> and written them correctly to the Makefiles?
>
> 3) Finally I tried this:
>
> --with-libnuma=/usr \
> --with-tm=/usr \
> --with-openib=/usr \
>
> This approach was suggested by Prentice Bisbal a few days ago,
> when Francesco Pietra reported on this list
> having a similar problem with libnuma.
>
> This seems to work fine, and OpenMPI 1.3.1 builds.
>

Good. FWIW, you probably don't need to specify any of these. More
below.

Generally, unless you specify --without-<foo>, OMPI will look for
feature <foo> in the default paths. If the feature is found, then
OMPI uses it. If the feature is not found, OMPI just skips it.
Specifying --with-<foo> is supposed to indicate to OMPI's configure
"yes, I definitely want this feature" (regardless of whether you
specified a directory or not), meaning that if OMPI can't find that
feature, configure will abort on the rationale that you specifically
asked for something but we can't deliver it. So abort and let a human
figure it out.

> However, I have more questions:
>

Here's the general scheme that OMPI's configure uses:

- if --without-<foo> is specified, OMPI's configure doesn't look for
feature <foo> and just skips it
- if neither --with-<foo> nor --without-<foo> are specified, OMPI
looks for feature <foo>. If the feature is found, use it. If not,
skip it.
- if --with-<foo> is specified (with or without a directory), OMPI
looks for feature <foo>. If the feature is not found, abort configure
on the rationale that you specifically asked for a feature that
configure can't deliver, so abort and let a human figure it out.
- if --with-<foo> is specified (without a directory), OMPI should look
for the feature in the default compiler/linker paths
- if --with-<foo>=directory is specified, OMPI should look for the
feature in the specified compiler/linker paths, and abort if it can't
find those paths
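The scheme above can be sketched as a small Python decision function. This is only a model under stated assumptions: option is None when neither flag is given, "no" for --without-<foo>, "yes" for a bare --with-<foo>, and otherwise the directory string; the actual header/library lookups are reduced to booleans, and resolve_feature and ConfigureAbort are hypothetical names.

```python
class ConfigureAbort(Exception):
    """Hypothetical: stands in for configure stopping with an error."""


def resolve_feature(option, found_in_default, dir_exists=False, found_in_dir=False):
    """Decide whether feature <foo> is built, following the scheme above.

    option: None (no flag), "no" (--without-<foo>), "yes" (bare
    --with-<foo>), or a directory string (--with-<foo>=<dir>).
    Returns True to use the feature, False to skip it.
    """
    if option == "no":
        return False  # --without-<foo>: don't even look
    if option is None:
        return found_in_default  # not requested: use it if found, else skip
    if option == "yes":
        # Bare --with-<foo>: search the default compiler/linker paths only.
        if not found_in_default:
            raise ConfigureAbort("requested feature not found in default paths")
        return True
    # --with-<foo>=<dir>: the directory itself must exist...
    if not dir_exists:
        raise ConfigureAbort(f"{option}: directory not found")
    # ...and the feature must be found there.
    if not found_in_dir:
        raise ConfigureAbort(f"feature not found under {option}")
    return True
```

Note that a nonexistent directory aborts even when a system-default copy of <foo> exists, which is exactly the case the next paragraph describes.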

The last part ("abort if it can't find those paths") was added in v1.3
because some users were specifying --with-<foo>=/some/nonexistent/path
and configure still succeeded by accidentally using some system-default
version of <foo> rather than the specific version of <foo> that was
installed in a non-default location. This caused no end of confusion
until they realized that they had a typo in the directory name given
to --with-<foo>=<dir>. Then OMPI got blamed. :-) So we added sanity
checks to ensure that the specified directories, and the directories
derived from them that we search (such as <dir>/include and
<dir>/lib), actually exist.
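A minimal Python sketch of that kind of directory sanity check, assuming it verifies <dir>/include plus at least one of <dir>/lib and <dir>/lib64. The name check_with_dir is hypothetical, and the exact set of directories OMPI's m4 tests may differ.

```python
import os


def check_with_dir(prefix):
    """Hypothetical sanity check for --with-<foo>=<prefix>.

    Returns (include_dir, lib_dirs) or raises ValueError when the prefix
    or the directories derived from it are missing.
    """
    if not os.path.isdir(prefix):
        raise ValueError(f"{prefix}: directory not found")
    include_dir = os.path.join(prefix, "include")
    if not os.path.isdir(include_dir):
        raise ValueError(f"{include_dir}: directory not found")
    # Both <prefix>/lib and <prefix>/lib64 are candidates; keep the ones present.
    lib_dirs = [
        d
        for d in (os.path.join(prefix, "lib"), os.path.join(prefix, "lib64"))
        if os.path.isdir(d)
    ]
    if not lib_dirs:
        raise ValueError(f"{prefix}: neither lib nor lib64 found")
    return include_dir, lib_dirs
```

Fed /usr/lib64, a check like this stops at /usr/lib64/include, which matches the failure in observation 1) above; fed /usr, it passes.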

Does that help?

> A) Is the directory name mandatory or optional in the options above?
> I.e., is "--with-libnuma" alone OK, or do I have to use
> "--with-libnuma=/some/directory"?
>

It should be optional.

> The results in 2) above suggest that configure finds the libraries and
> includes correctly, but that it writes wrong Makefiles,
> and doesn't report any error either.
>

There's likely a bug in our --with-libnuma handling that is taking the
default value from configure ("yes") and treating it as a directory
instead of just an indicator that you want libnuma support. I'll fix
it.
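Roughly, the bug and the intended fix look like this in a Python sketch. Both function names are hypothetical and the real fix lives in OMPI's m4 code, not shown here; the point is only that the bare --with-libnuma value "yes" must never be glued onto /lib.

```python
def libnuma_ldflags_buggy(with_value):
    # Models the reported bug: the option value is always treated as a
    # prefix, so a bare --with-libnuma (value "yes") becomes -Lyes/lib.
    return f"-L{with_value}/lib"


def libnuma_ldflags_fixed(with_value):
    # "yes" (or an empty value) only says "I want libnuma"; add no -L flag
    # and let the default linker search path find the library.
    # (Simplified: a real check would also try <dir>/lib64.)
    if with_value in ("", "yes"):
        return ""
    return f"-L{with_value}/lib"


if __name__ == "__main__":
    print(repr(libnuma_ldflags_buggy("yes")))   # '-Lyes/lib'
    print(repr(libnuma_ldflags_fixed("yes")))   # ''
    print(repr(libnuma_ldflags_fixed("/usr")))  # '-L/usr/lib'
```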

> B) Is the syntax in 3) above the only correct possibility?
>

You should be able to leave off all those --with options, but then
OMPI's configure will happily trundle through if it *doesn't* find
those 3 features, silently leaving you with an unexpectedly
feature-poor OMPI installation. So option 3) is definitely safest,
because OMPI's configure will abort if it doesn't find them.

> C) If it is, can I rest assured that configure and make
> will find the right 64-bit libraries, not 32-bit libraries
> of similar name?
>

OMPI will only successfully link against the Right libraries for
whatever flavor you're building. If you have told your compiler to
build 64-bit versions of OMPI (or your compiler simply defaults to
64-bit), then the linker will only allow OMPI to link successfully
against the 64-bit libraries (on Linux, at least; other OSes, such
as OS X, may behave differently).
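If you want to see which flavor a given library is, the word size is recorded in the ELF header: byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. A minimal Python sketch that reads it follows; elf_bits is a hypothetical helper, and the standard `file` command reports the same information.

```python
import sys

# The ELF header starts with 0x7f 'E' 'L' 'F'; byte 4 (EI_CLASS) then
# records the word size: 1 = ELFCLASS32, 2 = ELFCLASS64.
ELF_MAGIC = b"\x7fELF"
ELF_CLASS_BITS = {1: 32, 2: 64}


def elf_bits(path):
    """Return 32 or 64 for an ELF file, or None if it is not ELF."""
    with open(path, "rb") as f:
        header = f.read(5)
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return None
    return ELF_CLASS_BITS.get(header[4])


if __name__ == "__main__":
    # e.g. /usr/lib/libnuma.so.1 /usr/lib64/libnuma.so.1
    for path in sys.argv[1:]:
        print(path, elf_bits(path))
```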

> I ask because I have /usr/lib/libnuma.so.1 (32-bit ELF),
> and /usr/lib64/libnuma.so.1 (64-bit ELF), and both are in the
> same /usr directory that I gave to configure (--with-libnuma=/usr).
> (Well, maybe this is left to the compiler to decide, depending on
> whether it is a 64- or 32-bit compiler, as somehow it seemed to work.)
>

Yep, we try both <dir>/lib and <dir>/lib64 when directories are
specified. Also, both /usr/lib and /usr/lib64 are likely in your
linker's default search path. So even if you hadn't specified
--with-libnuma=/usr, the default linker search path would have found
/usr/lib64/libnuma.so.1.
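The lookup described above can be sketched in Python as follows. It assumes /usr/lib64 and /usr/lib as the default linker directories, which is typical for 64-bit Linux but toolchain-dependent (GNU ld prints its real list with `ld --verbose`); library_search_dirs and find_library_files are hypothetical names.

```python
import os

# Assumed default linker search directories on a 64-bit Linux host;
# the real list depends on the toolchain.
DEFAULT_LIB_DIRS = ("/usr/lib64", "/usr/lib")


def library_search_dirs(prefix=None):
    """Directories to try: <prefix>/lib and <prefix>/lib64, else the defaults."""
    if prefix is None:
        return list(DEFAULT_LIB_DIRS)
    return [os.path.join(prefix, "lib"), os.path.join(prefix, "lib64")]


def find_library_files(name, prefix=None):
    """Return every existing copy of `name` in the search directories."""
    return [
        os.path.join(d, name)
        for d in library_search_dirs(prefix)
        if os.path.isfile(os.path.join(d, name))
    ]
```

With --with-libnuma=/usr a search like this returns both copies of libnuma.so.1; it is the linker, not the search, that rejects the one with the wrong word size.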

-- 
Jeff Squyres
Cisco Systems