Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problems configuring OpenMPI 1.3.1 with numa, torque, and openib
From: Gus Correa (gus_at_[hidden])
Date: 2009-04-10 18:41:44

Hi Jeff

Thank you very much for the thorough explanation.
The OpenMPI configure script rationale and design,
as you described them, are wise and clear.
They avoid tricking the user or making decisions
that he/she may not want, but make the right decisions
when the user defers them to OpenMPI.

I would suggest that you cut and paste the part of your message
where you explain the way the OpenMPI configure script works,
and make it either a FAQ or part of the README file,
for better visibility, if this material is not yet there.
Your explanation is very helpful indeed,
and should benefit other users besides me.

And yes/lib :), I could build 1.3.1 right,
with numa, torque, and openib!

Many thanks,
Gus Correa
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA

Jeff Squyres wrote:
> On Apr 9, 2009, at 6:16 PM, Gus Correa wrote:
>> The configure scripts seem to have changed, and work differently
>> than before, particularly w.r.t. additional libraries like numa,
>> torque, and openib.
>> The new behavior can be a bit unexpected and puzzled me,
>> although eventually I could build 1.3.1.
> Yes, we put in this new functionality due to user requests. See below.
>> Here are my observations.
>> 1) I used to configure OpenMPI 1.2.8 and 1.3.0 with:
>> --with-libnuma=/usr/lib64 \
>> --with-tm=/usr/lib64 \
>> --with-openib=/usr/lib64 \
>> This worked fine for me on the same computer I am using for 1.3.1.
>> However, with 1.3.1 the same options fail.
>> Configure now tries to find the corresponding include
>> files on /usr/lib64/include, a directory that doesn't even exist.
>> The include files are actually in /usr/include
>> (as the old configure knew well).
> What happened in the 1.2.x configure was that OMPI was adding
> -L/usr/lib64/lib and trying to find the relevant libraries. But since
> /usr/lib64 was already in your linker's default search path, the
> relevant libraries were found without any additional flags from OMPI.
> Additionally, OMPI was also adding -I/usr/lib64/include to the compile
> path, but the relevant header files were found because they were in your
> compiler's default search path (likely /usr/include). So both the added
> -I and -L flags were meaningless -- albeit harmless.
>> 2) Therefore, I tried to configure with:
>> --with-libnuma \
>> --with-tm \
>> --with-openib \
>> Note that no directory is being pointed to.
>> My hope was that configure would find the libraries and includes in
>> standard places (and hopefully the correct libs, 64-bit, not 32-bit).
> This *should* be fine.
>> This way configure completes with no apparent error.
>> However, I get this funny error on the make phase:
>> /bin/sh ../../../../libtool --tag=CC --mode=link gcc -DNDEBUG
>> -march=amdfam10 -O3 -finline-functions -funroll-loops -mfpmath=sse
>> -fno-strict-aliasing -pthread -fvisibility=hidden -module -avoid-version
>> -Lyes/lib -export-dynamic -o
>> maffinity_libnuma_component.lo maffinity_libnuma_module.lo -lnuma -lnsl
>> -lutil -lm
>> ../../../../libtool: line 4998: cd: yes/lib: No such file or directory
>> libtool: link: cannot determine absolute directory name of `yes/lib'
>> make[2]: *** [] Error 1
> Huh. That's odd.
>> Note the "yes/lib" path.
>> A little grep on config.log showed why the error:
>> %grep yes config.log
>> ...
>> maffinity_libnuma_CPPFLAGS=' -Iyes/include'
>> maffinity_libnuma_LDFLAGS=' -Lyes/lib'
>> #define WRAPPER_EXTRA_LDFLAGS " -Lyes/lib "
>> Is this an internal "yes" answer to configure that
>> is being inadvertently caught/interpreted as a directory name?
> Ah, crud. Probably so, yes.
> (/me double checks libnuma's m4 setup... crud; I can replicate the
> problem. I'll try to commit a fix this afternoon so that it can be
> included in 1.3.2)
>> Since configure seems to have found the libraries and include files,
>> and completed without error,
>> shouldn't it also have reported the correct paths to config.log
>> and written them correctly to the Makefiles?
>> 3) Finally I tried this:
>> --with-libnuma=/usr \
>> --with-tm=/usr \
>> --with-openib=/usr \
>> This approach was suggested by Prentice Bisbal a few days ago,
>> when Francesco Pietra reported on this list
>> having a similar problem with libnuma.
>> This seems to work fine, and OpenMPI 1.3.1 builds.
> Good. FWIW, you probably don't need to specify any of these. More below.
> Generally, unless you specify --without-<foo>, OMPI will look for
> feature <foo> in the default paths. If the feature is found, then OMPI
> uses it. If the feature is not found, OMPI just skips it. Specifying
> --with-<foo> is supposed to indicate to OMPI's configure "yes, I
> definitely want this feature" (regardless of whether you specified a
> directory or not), meaning that if OMPI can't find that feature,
> configure will abort on the rationale that you specifically asked for
> something but we can't deliver it. So abort and let a human figure it out.
>> However, I have more questions:
> Here's the general scheme that OMPI's configure uses:
> - if --without-<foo> is specified, OMPI's configure doesn't look for
> feature <foo> and just skips it
> - if neither --with-<foo> nor --without-<foo> is specified, OMPI looks
> for feature <foo>. If the feature is found, use it. If not, skip it.
> - if --with-<foo> is specified (with or without a directory), OMPI looks
> for feature <foo>. If the feature is not found, abort configure on the
> rationale that you specifically asked for a feature that configure can't
> deliver, so abort and let a human figure it out.
> - if --with-<foo> is specified (without a directory), OMPI should look
> for the feature in the default compiler/linker paths
> - if --with-<foo>=directory is specified, OMPI should look for the
> feature in the specified compiler/linker paths, and abort if it can't
> find those paths
> The last part ("abort if it can't find those paths") was added in v1.3
> because some users were specifying --with-<foo>=/some/nonexistent/path
> and still having configure succeed by accidentally using some
> system-default version of <foo> rather than a specific version of <foo>
> that was installed in a non-default location. This caused no end of
> confusion until they realized that they had a typo in the directory name
> specified to --with-<foo>=<dir>. Then OMPI got blamed. :-) So we
> added sanity checks to ensure that the specified directories, and the
> directories we derive from them, actually exist.
> Does that help?
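The scheme described above can be sketched as a small shell function. This is only an illustration under stated assumptions, not Open MPI's actual m4 code: decide_feature is a hypothetical name, its second argument stands in for whatever header/library probe configure really runs, and the first argument follows the usual autoconf convention that --with-<foo> without a directory is stored as "yes" and --without-<foo> as "no".

```shell
# Hypothetical helper, not Open MPI's real configure logic.
# $1 = value of the --with-<foo> option:
#      ""    option not given
#      "no"  --without-<foo>
#      "yes" --with-<foo> (no directory)
#      other --with-<foo>=<dir>
# $2 = "found" or "missing": stand-in for the result of probing for <foo>
# Prints one of: skip, use, abort
decide_feature() {
    with_val=$1
    probe=$2
    case $with_val in
        no)
            # --without-<foo>: do not even look for the feature
            echo skip
            ;;
        "")
            # Nothing requested: use the feature if present, else skip quietly
            if [ "$probe" = found ]; then echo use; else echo skip; fi
            ;;
        yes)
            # --with-<foo>: search default paths; a missing feature is fatal
            if [ "$probe" = found ]; then echo use; else echo abort; fi
            ;;
        *)
            # --with-<foo>=<dir>: the directory itself must exist (the v1.3
            # sanity check), and the feature must then be found
            if [ ! -d "$with_val" ]; then
                echo abort
            elif [ "$probe" = found ]; then
                echo use
            else
                echo abort
            fi
            ;;
    esac
}
```

For example, decide_feature yes missing prints abort, while decide_feature "" missing prints skip, which is the difference between asking for a feature explicitly and leaving the choice to configure.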
>> A) Is the directory name mandatory or optional in the options above?
>> I.e. is "--with-libnuma" only OK, or do I have to use
>> "--with-libnuma=/some/directory"?
> It should be optional.
>> The results in 2) above suggest that configure finds the libraries and
>> includes correctly, but that it writes wrong Makefiles,
>> and doesn't report any error either.
> There's likely a bug in our --with-libnuma handling that is taking the
> default value from configure ("yes") and treating it as a directory
> instead of just an indicator that you want libnuma support. I'll fix it.
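The kind of bug described here can be illustrated with a short shell sketch. It is a guess at the shape of the problem, not the actual Open MPI m4 macros or the fix that was committed: flags_buggy and flags_fixed are hypothetical names, and the sketch only derives <dir>/lib, although the real configure also tries <dir>/lib64.

```shell
# Hypothetical illustration; not the actual Open MPI configure code.
# $1 = value of --with-libnuma as autoconf stores it
#      ("yes" when the option is given without a directory)

# Buggy variant: every value is treated as an installation prefix,
# so the bare "yes" ends up inside the -I and -L flags.
flags_buggy() {
    echo "-I$1/include -L$1/lib"
}

# Fixed variant: "yes" (or an empty value) only means "the feature is
# wanted", so no extra flags are added and the compiler and linker
# default search paths are used instead.
flags_fixed() {
    case $1 in
        yes|"") echo "" ;;
        *)      echo "-I$1/include -L$1/lib" ;;
    esac
}
```

Calling flags_buggy yes prints -Iyes/include -Lyes/lib, the same strings that showed up in config.log, whereas flags_fixed yes prints an empty line and flags_fixed /usr prints -I/usr/include -L/usr/lib.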
>> B) Is the syntax in 3) above the only correct possibility?
> You should be able to leave off all those --with options, but then
> OMPI's configure will happily trundle through if it *doesn't* find those
> 3 features. So option 3) is definitely safest because OMPI's configure
> will abort if it doesn't find them, rather than leaving you with an
> unexpectedly feature-poor OMPI installation.
>> C) If it is, can I rest assured that configure and make
>> will find the right 64-bit libraries, not 32-bit libraries
>> of similar name?
> OMPI will only successfully link against the Right libraries for
> whatever flavor you're building. If you have told your compiler to
> build 64 bit versions of OMPI (or your compiler simply defaults to 64
> bit), then the linker will only allow OMPI to link successfully against
> the 64 bit libraries (in Linux; in other OS's, it may be different --
> such as OS X).
>> I ask because I have /usr/lib/ (32-bit ELF),
>> and /usr/lib64/ (64-bit ELF), and both are in the
>> same /usr directory that I gave to configure (--with-libnuma=/usr).
>> (Well, maybe this is deferred to the compiler to decide,
>> whether it is a 64- or 32-bit compiler, as somehow it seemed to work.)
> Yep, we try both <dir>/lib and <dir>/lib64 when directories are
> specified. Also, both /usr/lib and /usr/lib64 are likely in your
> linker's default search path. So even if you hadn't specified
> --with-libnuma=/usr, then the default linker search path would have
> found /usr/lib64/