Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [devel-core] OMPI MCA components - track external libs versions
From: Mike Dubman (miked_at_[hidden])
Date: 2014-04-14 14:57:49


Sure, let's discuss it on the next telecon in a week (Mellanox IL is out of the office for
holidays and Josh is on vacation).

I think it is a very good feature from an OMPI usability point of view.

Think of it as a programmable version of the release notes. For example:

- In the release notes, vendors often specify that OpenMPI-SHMEM with PMI2
requires mxm 2.1, slurm 2.6.2+, libibverbs 2.2+, etc.
- The user/site/sysadmin can compile the OpenMPI-SHMEM package with libibverbs
2.1, mxm 1.5 and slurm 2.6.1. That build is perfectly valid and will work without
any issues, but it is not certified by the vendor because of some known issues
with this mix.

- A vendor can provide a script (or a site admin can write one based on local
site certification) that uses ompi_info/oshmem_info to check which versions the
current installation was compiled against, issue a warning, and save the hassle
of running into well-known issues.

I think (and know) that many production environments and OMPI users will be
happy to have it.
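The underlying proposal, in the first message quoted at the bottom of this thread, is an optional function pointer in the component descriptor that returns the version of the external library a component wraps. The sketch below is illustrative only: `struct component_desc`, `ext_lib_version_fn_t`, `example_mxm_ext_lib_version` and `format_component` are hypothetical names, the descriptor is a simplified stand-in rather than the real `mca_base_component_2_0_0_t`, and the output format is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: writes the version of the external library the
 * component was built against into buf; returns 0 on success, -1 if buf
 * is too small. */
typedef int (*ext_lib_version_fn_t)(char *buf, size_t len);

/* Simplified stand-in for an MCA component descriptor. This is NOT the real
 * mca_base_component_2_0_0_t layout; it only shows the proposed extra field. */
struct component_desc {
    const char *framework;                /* e.g. "mtl" */
    const char *name;                     /* e.g. "mxm" */
    ext_lib_version_fn_t ext_lib_version; /* NULL: no external library */
};

/* Example callback for an mxm-like component. The version string is a
 * placeholder; a real component would typically derive it from the
 * library's own version macros at compile time. */
static int example_mxm_ext_lib_version(char *buf, size_t len)
{
    static const char version[] = "libmxm 2.1";
    if (len < sizeof version) {
        return -1;
    }
    memcpy(buf, version, sizeof version); /* copies the terminating NUL too */
    return 0;
}

/* Format one line of info-tool output; the external library version is
 * appended only when the component provides the callback. Returns the
 * snprintf result. */
static int format_component(const struct component_desc *c, char *out,
                            size_t len)
{
    char ver[64];
    assert(c != NULL && out != NULL);
    if (c->ext_lib_version != NULL &&
        c->ext_lib_version(ver, sizeof ver) == 0) {
        return snprintf(out, len, "MCA %s: %s (built with %s)",
                        c->framework, c->name, ver);
    }
    return snprintf(out, len, "MCA %s: %s", c->framework, c->name);
}
```

Components that do not wrap an external library would leave the pointer NULL, so their output is unchanged. As noted in the replies below, ompi_info must load a component before it can call into it, so this only reports the version the component was built against; it does not by itself detect a mismatch with the library found at run time.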

On Mon, Apr 14, 2014 at 6:07 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Perhaps this is something best discussed on the weekly telecon? I think
> you are misunderstanding what I'm saying. I'm not heavily against it, but I
> still don't see the value, and dislike making disruptive changes that span
> the code base without first ensuring there is no other viable alternative.
>
> FWIW: Most libraries remain ABI compliant across major releases for
> exactly the reasons you cite. We don't actually support building against
> one library version and running against another for these very reasons - if
> users do that, it is at their own risk. Your change won't resolve that
> problem as ompi_info is just as likely to barf when confronted by that
> situation - remember, in order to register the component, ompi_info has to
> *load* it first. So any library incompatibility may well have already
> caused a problem.
>
>
> On Apr 14, 2014, at 7:59 AM, Mike Dubman <miked_at_[hidden]> wrote:
>
> There is no correlation between built_with and running_with versions of
> external libraries supported by OMPI.
>
> The next release of an external library does not mean we should remove code
> in OMPI for all previously supported releases of the same library.
>
> A vendor/site can certify slurm version 2.6.1 while the latest is 2.6.6.
> SLURM is not ABI compliant between releases, so a site would like to know
> the active version vs. the certified one in order to issue an early warning.
>
> Why are you so against it? I don't see any issue with printing the external
> library version number in the MCA description; it is something that can
> improve the sysadmin/user experience.
>
>
>
>
> On Mon, Apr 14, 2014 at 5:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>>
>> On Apr 14, 2014, at 7:34 AM, Mike Dubman <miked_at_[hidden]>
>> wrote:
>>
>> It is unrelated:
>>
>> 1. OMPI can support and be built with many different (or all) versions
>> of an external library (for example: libmxm or libslurm).
>>
>>
>> Not true - we do indeed check the library version in all cases where it
>> matters. For example, the case you cite as your true story could easily
>> have been prevented by using OMPI_CHECK_PACKAGE to verify that the libmxm
>> had the required function in it.
>>
>> 2. The OMPI utility ompi_info can expose the currently available version
>> of libmxm/libslurm.
>>
>>
>> Yes - but what good does that do? Bottom line is that you shouldn't have
>> built if that library version isn't supported.
>>
>>
>> 3. The vendor or end user wants to certify a specific version of libmxm or
>> libslurm to be used in the customer environment.
>>
>> 4. The current way is to put a note into the libmxm/libslurm Release Notes,
>> which does not guarantee that the site user/sysadmin will pay attention in a
>> production environment.
>>
>>
>> Again, that's the whole purpose of the configure logic. You are supposed
>> to check the library to ensure it is compatible, not just blindly build and
>> then make the user figure it out.
>>
>> 5. The suggestion is to use #2 so that a user or vendor can write a script
>> that matches the currently available versions against the supported/certified
>> ones and lets the admin/user know that there is a mismatch between the
>> running and supported versions.
>>
>>
>> Like I said, that's the developer's responsibility to get the configure
>> logic correct - not the user's responsibility to figure it out
>> after-the-fact.
>>
>>
>> P.S. based on the true story :)
>>
>>
>>
>> On Mon, Apr 14, 2014 at 5:19 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> <let's be consistent and shift this to the devel list>
>>>
>>> I'm still confused - how is that helpful? How was the build allowed to
>>> complete if the external library version isn't supported?? You should
>>> either quietly not-build the affected component, or error out if the user
>>> specifically requested that component be built.
>>>
>>> This sounds to me like you have a weakness in your configure logic, and
>>> are trying to find a bandaid. Perhaps a better solution (that wouldn't
>>> cause us to change every component in the code base) would be to just add
>>> appropriate tests to your configure logic so you don't incorrectly build
>>> against an unsupported library?
>>>
>>>
>>> On Apr 14, 2014, at 7:12 AM, Mike Dubman <miked_at_[hidden]>
>>> wrote:
>>>
>>> The use case I'm interested in is exposing, through ompi_info/oshmem_info,
>>> the compiled-in versions of external libraries.
>>> A user/vendor can write a small script on top of ompi_info/oshmem_info to
>>> check whether the running versions are on par with the support matrix.
>>>
>>>
>>> On Mon, Apr 14, 2014 at 5:06 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> Guess I'm a little confused and trying to understand the issue, so
>>>> let's consider a couple of cases:
>>>>
>>>> * If we are building against an unsupported version of an external
>>>> library, that is supposed to be detected by the configure logic, yes? So
>>>> you would output a nice error message at that time, and stop the build
>>>> process.
>>>>
>>>> * If we were built against one version of an external library, and
>>>> someone attempts to run us against a different version, you'd have to
>>>> detect that somehow at runtime. I'm not sure how you could reliably do that
>>>> as the problem is likely to manifest itself as an unresolved function
>>>> (i.e., we use something that doesn't exist in the version being used) or a
>>>> difference in a function signature. Either way, you'll either fail to load
>>>> the library or crash.
>>>>
>>>> So I'm not sure what the added function pointer actually accomplishes.
>>>> I suppose you could use it during ompi_info to display something about what
>>>> version you linked against, but that won't help solve either of the above
>>>> problems.
>>>>
>>>> Could you help explain a little further?
>>>>
>>>> Thanks
>>>> Ralph
>>>>
>>>>
>>>> On Apr 14, 2014, at 5:57 AM, Mike Dubman <miked_at_[hidden]>
>>>> wrote:
>>>>
>>>> +devel mailing list (for web mail archive)
>>>>
>>>>
>>>> On Sat, Apr 12, 2014 at 9:04 PM, Mike Dubman <miked_at_[hidden]> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> Could you please advise whether the following is addressed in the MCA
>>>>> architecture, or whether it is something we should add:
>>>>>
>>>>> Current MCA API:
>>>>> A new MCA component should provide a mca_base_component_2_0_0_t
>>>>> descriptor, which specifies how to init/open/close/query/enable the
>>>>> component.
>>>>> The descriptor is also used to specify the version of the MCA framework
>>>>> and the version of the MCA component.
>>>>>
>>>>> What is missing:
>>>>> Some MCA components are wrappers on top of external libraries, e.g.:
>>>>>
>>>>> hwloc (libhwloc.so)
>>>>> usnic (libusnic.so)
>>>>> fca (libfca.so)
>>>>> mxm (libmxm.so)
>>>>> slurm (libslurm.so)
>>>>> pbs
>>>>> pmi
>>>>> openib (libibverbs)
>>>>> vader (knem, ...)
>>>>> ...
>>>>>
>>>>> So it would be very useful if the MCA descriptor had another function
>>>>> pointer that returns the version of the external library the component
>>>>> depends on (if applicable).
>>>>> ompi_info and oshmem_info would print it if present, which would allow a
>>>>> sysadmin to track vendor-specific dependencies for OMPI (like: mxm compiled
>>>>> with libmxm 2.1, usnic with libusnic v1.0, ...) and warn users if the
>>>>> compiled versions do not match the vendor-recommended ones.
>>>>>
>>>>> Please advise.
>>>>>
>>>>> Thanks
>>>>>
>>>> _______________________________________________
>>>> devel-core mailing list
>>>> devel-core_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel-core
>>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>> Link to this post:
>>> http://www.open-mpi.org/community/lists/devel/2014/04/14507.php
>>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/04/14508.php
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/04/14509.php
>>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14510.php
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2014/04/14511.php
>