
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] autoconf warnings: openib BTL
From: Kenneth A. Lloyd (kenneth.lloyd_at_[hidden])
Date: 2014-03-24 10:42:39


Vasily,

The problem you've identified of differing kernel versions is exacerbated by
also computing across hybrid, heterogeneous hardware architectures (i.e.,
SMP & NUMA, different streaming processor architectures, or different shared
memory architectures).

==========================
Kenneth A. Lloyd, Jr.
CEO - Director, Systems Science
Watt Systems Technologies Inc.
Albuquerque, NM USA
www.wattsys.com
kenneth.lloyd_at_[hidden]


-----Original Message-----
From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Vasily Filipov
Sent: Monday, March 24, 2014 7:44 AM
To: Open MPI Developers
Subject: Re: [OMPI devel] autoconf warnings: openib BTL

Actually, I think that if you build your job with one kernel version and run
it on nodes that have another version, rdmacm will be the least of your
problems. Anyway, here is the revision that fixes the issue.

------------------------------------------------------------------------
r31194 | vasily | 2014-03-24 15:36:04 +0200 (Mon, 24 Mar 2014) | 3 lines

BTL/OPENIB: remove AC_RUN_IFELSE from configure and check AF_IB support by
lib rdmacm during component_init.

------------------------------------------------------------------------

Thank you,
Vasily.
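
[Editor's sketch] To illustrate the shape of the change described in r31194, here is a minimal sketch of deferring the "does AF_IB actually work?" question from configure time to component initialization. It is not the code from the commit: every name here (sketch_component_t, sketch_component_init, the stub probes, the SKETCH_* codes) is hypothetical, and the probe is passed in as a function pointer so the example compiles without librdmacm. In a real build the probe would presumably exercise librdmacm on the node where the job runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; Open MPI uses its own OMPI_* constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_NOT_AVAILABLE = -1 };

/* A probe returns 0 if AF_IB is usable on the node where it is called. */
typedef int (*af_ib_probe_fn)(void);

/* Hypothetical component state: records the result of the run-time check. */
typedef struct {
    int af_ib_usable;   /* 1 if the probe succeeded on this node, else 0 */
} sketch_component_t;

/*
 * Run the probe once during initialization, on the machine where the job
 * runs, instead of running a test program inside configure on the build
 * machine (which is what AC_RUN_IFELSE does).
 */
static int sketch_component_init(sketch_component_t *c, af_ib_probe_fn probe)
{
    assert(c != NULL);
    c->af_ib_usable = (probe != NULL && probe() == 0);
    return c->af_ib_usable ? SKETCH_SUCCESS : SKETCH_ERR_NOT_AVAILABLE;
}

/* Stand-in probes so the sketch can be exercised without any IB hardware. */
static int stub_probe_ok(void)   { return 0; }
static int stub_probe_fail(void) { return -1; }
```

A caller would treat SKETCH_ERR_NOT_AVAILABLE as "this node cannot use AF_IB" and act on it at run time, so a front-end build node without an IB NIC no longer influences the outcome.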

On 13-Mar-14 15:44, Ralph Castain wrote:
> I think the critical point is this one:
>
>> To be clear: whether AF_IB works or not is a determination to make on the
>> machines on which you *run* -- NOT on the machine on which you *build*.
> Many of our users compile on the frontend node of their cluster, which
> doesn't even have an IB NIC installed (they only have the libraries present
> so it can compile). You need to test this at run time to ensure you are on a
> machine where someone actually is able to run rdmacm.
>
>
> On Mar 13, 2014, at 5:53 AM, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
>
>> On Mar 13, 2014, at 4:59 AM, Mike Dubman <miked_at_[hidden]> wrote:
>>
>>>>>> Right? If so, I don't see why you need the AC_TRY_RUN -- if RDMACM
>>>>>> is easily detectable as to which way it is compiled (because it has, for
>>>>>> example, different fields), then AC_CHECK_DECLS should be enough, right?
>>> RDMACM API has different implementation requirements for its providers:
>>> tcp, af_ib (different structs/fields should be used/passed. different
>>> APIs/hooks should be called for bring-up).
>> Yes, this was said before. Which is why I don't understand why
>> AC_CHECK_DECLS isn't enough -- it's a compile-time check, right?
>>
>> Let me get this straight:
>>
>> 1. AF_IB may or may not be present.
>> 2. If AF_IB is present, it may or may not work (i.e., support for AF_IB
>>    may or may not work in the kernel).
>> 3. If AF_IB is present, you can only compile with the AF_IB fields and
>>    methods.
>> 4. If AF_IB is not present, you can only compile with the non-AF_IB
>>    fields and methods.
>>
>> I think #2 is not relevant for configure -- only #1, #3, and #4 are
>> relevant. So you should have code something like this:
>>
>> #if HAVE_DECL_AF_IB
>>     ret = do_the_stuff_with_af_ib(...);
>>     if (OMPI_SUCCESS != ret) {
>>         opal_show_help(...AF_IB doesn't work...);
>>         return ret;
>>     }
>> #else
>>     ret = do_the_stuff_without_af_ib(...);
>>     if (OMPI_SUCCESS != ret) {
>>         opal_show_help(...non-AF_IB doesn't work...);
>>         return ret;
>>     }
>> #endif
>>
>> To be clear: whether AF_IB works or not is a determination to make on the
>> machines on which you *run* -- NOT on the machine on which you *build*.
>>
>> This is one of the key reasons that OMPI prefers run-time detection for
>> run-time characteristics over configure-time detection for run-time
>> characteristics (because you may run OMPI on different machines than where
>> you built OMPI).
>>
>>> Currently, the RDMACM provider can be selected at compile time only and
>>> mpirun becomes incompatible to other RDMACM providers.
>> What does mpirun have to do with this? We're talking about the openib
>> BTL, right?
>>
>>> AC_TRY_RUN is a protection that selected provider will be able to
>>> run, otherwise no fallback to other provider will be available for user at
>>> runtime.
>> I can't parse this statement...?
>>
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/03/14342.php
>
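
[Editor's sketch] The split argued for in the quoted thread (configure decides only which code path can be compiled; whether that path works is decided on the node where the job runs) can be sketched as follows. One detail worth stating: AC_CHECK_DECLS always defines HAVE_DECL_<symbol>, to 1 if the declaration was found and to 0 otherwise, so the guard has to be `#if`, not `#ifdef`. Everything else in the block is hypothetical: cm_path_t, compiled_path, bring_up, and the stub attempt functions are illustration names, and the fallback `#define` exists only so the sketch compiles without a configure-generated header.

```c
#include <assert.h>
#include <stddef.h>

/* Normally set by configure via AC_CHECK_DECLS; defaulted here (assumption)
 * so the sketch compiles standalone. */
#ifndef HAVE_DECL_AF_IB
#define HAVE_DECL_AF_IB 0
#endif

/* Hypothetical tag for the code path chosen at compile time. */
typedef enum { PATH_NON_AF_IB = 0, PATH_AF_IB = 1 } cm_path_t;

/* Compile-time decision: which fields and methods the code was built against. */
static cm_path_t compiled_path(void)
{
#if HAVE_DECL_AF_IB
    return PATH_AF_IB;
#else
    return PATH_NON_AF_IB;
#endif
}

/*
 * Run-time decision: try the compiled-in path on this node and report the
 * result to the caller. 'attempt' stands in for the real bring-up routine
 * and returns 0 on success.
 */
static int bring_up(int (*attempt)(cm_path_t))
{
    assert(attempt != NULL);
    return attempt(compiled_path());
}

/* Stand-in bring-up routines for exercising the sketch. */
static int stub_attempt_ok(cm_path_t path)   { (void)path; return 0; }
static int stub_attempt_fail(cm_path_t path) { (void)path; return -1; }
```

With this shape, a failed bring-up surfaces as an error on the compute node (where a help message can explain it) rather than as a configure failure on a build host that may not resemble the compute nodes at all.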

