Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [devel-core] OMPI MCA components - track external libs versions
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2014-04-14 15:05:20


+1. This is very helpful info to have.

Best,
Pavel (Pasha) Shamis

On Apr 14, 2014, at 2:57 PM, Mike Dubman <miked_at_[hidden]> wrote:

Sure, let's discuss it on the next telecon in one week (Mellanox IL is out of office for the holidays and Josh is on vacation).

I think it is a very good feature from the point of view of OMPI usability.

Think of it as a programmable version of the release notes. For example:

- In release notes, vendors often specify that OpenMPI-SHMEM with PMI2 requires mxm 2.1, slurm 2.6.2+, libibverbs 2.2+, etc.
- The user/site/sysadmin can compile the OpenMPI-SHMEM package with libibverbs 2.1, mxm 1.5 and slurm 2.6.1, which is perfectly valid and will work, but is not certified by the vendor because of some known issues with this mix.

- The vendor can provide a script (or the site admin can write one based on site-local certification) that uses ompi_info/oshmem_info to check which library versions OMPI was compiled with, prints a warning on a mismatch, and saves the hassle of running into well-known issues.
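
A minimal sketch of such a certification check, in Python. It assumes the compiled-in versions have already been collected into a dictionary; how ompi_info/oshmem_info would report them is left open here. The names CERTIFIED_MINIMUMS, parse_version, is_older and find_uncertified are hypothetical, and treating the release-notes versions as minimums is an assumption (a real matrix might pin exact versions instead).

```python
import re

# Certified minimums, taken from the release-notes example above.
# Treating them as "at least this version" is an assumption.
CERTIFIED_MINIMUMS = {"mxm": "2.1", "slurm": "2.6.2", "libibverbs": "2.2"}


def parse_version(text):
    """Turn '2.6.2' into (2, 6, 2) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def is_older(built, minimum):
    """True if version string `built` is lower than `minimum`."""
    a, b = parse_version(built), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.1' and '2.1.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


def find_uncertified(compiled, minimums=CERTIFIED_MINIMUMS):
    """Return one warning per library built below its certified minimum."""
    warnings = []
    for lib, minimum in sorted(minimums.items()):
        built = compiled.get(lib)
        if built is None:
            continue  # library not used in this build; nothing to check
        if is_older(built, minimum):
            warnings.append(f"{lib}: built with {built}, certified minimum is {minimum}")
    return warnings


if __name__ == "__main__":
    # The site build from the example: it works, but is not certified.
    site_build = {"libibverbs": "2.1", "mxm": "1.5", "slurm": "2.6.1"}
    for warning in find_uncertified(site_build):
        print("WARNING:", warning)
```

With the site build from the example, all three libraries fall below the certified minimums, so the script would print three warnings without stopping the user from running.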

I think (+know) that many production environments and OMPI users will be happy to have it.

On Mon, Apr 14, 2014 at 6:07 PM, Ralph Castain <rhc_at_[hidden]> wrote:
Perhaps this is something best discussed on the weekly telecon? I think you are misunderstanding what I'm saying. I'm not heavily against it, but I still don't see the value, and dislike making disruptive changes that span the code base without first ensuring there is no other viable alternative.

FWIW: Most libraries remain ABI compliant across major releases for exactly the reasons you cite. We don't actually support building against one library version and running against another for these very reasons - if users do that, it is at their own risk. Your change won't resolve that problem as ompi_info is just as likely to barf when confronted by that situation - remember, in order to register the component, ompi_info has to *load* it first. So any library incompatibility may well have already caused a problem.

On Apr 14, 2014, at 7:59 AM, Mike Dubman <miked_at_[hidden]> wrote:

There is no correlation between built_with and running_with versions of external libraries supported by OMPI.

A new release of an external library does not mean we should remove the code in OMPI that supports the previous releases of that library.

A vendor/site can certify slurm version 2.6.1 while the latest is 2.6.6.
SLURM is not ABI compliant between releases, so a site would like to compare the active version against the certified one and issue an early warning.

Why are you so against it? I don't see any issue with printing the external library version number in the MCA description; it is something that can improve the sysadmin/user experience.

On Mon, Apr 14, 2014 at 5:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:

On Apr 14, 2014, at 7:34 AM, Mike Dubman <miked_at_[hidden]> wrote:

it is unrelated:

1. OMPI can support and be built with many different (or all) versions of an external library (for example, libmxm or libslurm).

Not true - we do indeed check the library version in all cases where it matters. For example, the case you cite as your true story could easily have been prevented by using OMPI_CHECK_PACKAGE to verify that libmxm had the required function in it.

2. The OMPI utility ompi_info can expose the currently available version of libmxm/libslurm.

Yes - but what good does that do? Bottom line is that you shouldn't have built if that library version isn't supported.

3. The vendor or end-user wants to certify a specific version of libmxm or libslurm to be used in the customer environment.
4. The current way is to put a note into the libmxm/libslurm Release Notes, which does not guarantee that the site user/sysadmin will pay attention to it in a production environment.

Again, that's the whole purpose of the configure logic. You are supposed to check the library to ensure it is compatible, not just blindly build and then make the user figure it out.

5. The suggestion is to use #2 so that a user or vendor can write a script that matches the currently available versions against the supported/certified ones and lets the admin/user know when there is a mismatch between the running and supported versions.

Like I said, that's the developer's responsibility to get the configure logic correct - not the user's responsibility to figure it out after-the-fact.

P.S. Based on a true story :)

On Mon, Apr 14, 2014 at 5:19 PM, Ralph Castain <rhc_at_[hidden]> wrote:
<let's be consistent and shift this to the devel list>

I'm still confused - how is that helpful? How was the build allowed to complete if the external library version isn't supported?? You should either quietly not-build the affected component, or error out if the user specifically requested that component be built.

This sounds to me like you have a weakness in your configure logic, and are trying to find a bandaid. Perhaps a better solution (that wouldn't cause us to change every component in the code base) would be to just add appropriate tests to your configure logic so you don't incorrectly build against an unsupported library?

On Apr 14, 2014, at 7:12 AM, Mike Dubman <miked_at_[hidden]> wrote:

The use case I'm interested in is exposing the compiled-in versions of external libraries through ompi_info/oshmem_info.
A user/vendor can write a small script on top of ompi_info/oshmem_info to check whether the running versions are on par with the supported matrix.
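
The first half of such a script, collecting the reported versions, might look like the Python sketch below. The report format (one "component: library version" line per component) is purely hypothetical: ompi_info does not print external library versions at the time of this thread, and its real output is laid out differently. SAMPLE_REPORT and parse_report are made-up names, and the version numbers are illustrative only.

```python
# Hypothetical report: one "component: library version" line per component.
# This is NOT the real ompi_info output format.
SAMPLE_REPORT = """\
mxm: libmxm 2.1
openib: libibverbs 1.1.7
slurm: libslurm 2.6.1
"""


def parse_report(text):
    """Map each external library name to the version string reported for it."""
    versions = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or unrelated lines
        _component, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) == 2:  # expect exactly "<library> <version>"
            library, version = fields
            versions[library] = version
    return versions


if __name__ == "__main__":
    for library, version in sorted(parse_report(SAMPLE_REPORT).items()):
        print(library, version)
```

The resulting dictionary is what would then be compared against the supported matrix.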

On Mon, Apr 14, 2014 at 5:06 PM, Ralph Castain <rhc_at_[hidden]> wrote:
Guess I'm a little confused and trying to understand the issue, so let's consider a couple of cases:

* If we are building against an unsupported version of an external library, that is supposed to be detected by the configure logic, yes? So you would output a nice error message at that time, and stop the build process.

* If we were built against one version of an external library, and someone attempts to run us against a different version, you'd have to detect that somehow at runtime. I'm not sure how you could reliably do that as the problem is likely to manifest itself as an unresolved function (i.e., we use something that doesn't exist in the version being used) or a difference in a function signature. Either way, you'll either fail to load the library or crash.

So I'm not sure what the added function pointer actually accomplishes. I suppose you could use it during ompi_info to display something about what version you linked against, but that won't help solve either of the above problems.

Could you help explain a little further?

Thanks
Ralph

On Apr 14, 2014, at 5:57 AM, Mike Dubman <miked_at_[hidden]> wrote:

+devel mailing list (for web mail archive)

On Sat, Apr 12, 2014 at 9:04 PM, Mike Dubman <miked_at_[hidden]> wrote:

Hi,

Could you please advise whether the following is already addressed in the MCA architecture, or whether it is something we should add:

Current MCA API:
A new MCA component should provide a descriptor, mca_base_component_2_0_0_t, which specifies how to init/open/close/query/enable the component.
The descriptor is also used to specify the version of the MCA framework and the version of the MCA component.

What is missing:
Some MCA components are wrappers on top of external libraries, for example:

hwloc (libhwloc.so)
usnic (libusnic.so)
fca (libfca.so)
mxm (libmxm.so)
slurm (libslurm.so)
pbs
pmi
openib (libibverbs)
vader (knem, ...)
...

So, it would be very useful if the MCA descriptor had another function pointer that returns the version of the external library the component depends on (if applicable).
ompi_info and oshmem_info would print it if present. This would allow a sysadmin to track vendor-specific dependencies for OMPI (for example: mxm compiled with libmxm 2.1, usnic with libusnic v1.0, ...) and warn users if the compiled-in versions do not match the vendor-recommended ones.
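
To make the proposal concrete, here is a toy model of it in Python. It is only a sketch of the idea: the real descriptor is a C struct, and ComponentDescriptor, ext_lib_version and describe are hypothetical names that do not exist in Open MPI. The component and library versions are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ComponentDescriptor:
    """Toy stand-in for an MCA component descriptor (not the real C struct)."""
    framework: str
    name: str
    component_version: str
    # Hypothetical new field: a callable returning the external library
    # version, or None for components that wrap no external library.
    ext_lib_version: Optional[Callable[[], str]] = None


def describe(component):
    """Format one line the way an info tool might print it."""
    line = f"{component.framework}:{component.name} v{component.component_version}"
    if component.ext_lib_version is not None:
        line += f" (external library: {component.ext_lib_version()})"
    return line


if __name__ == "__main__":
    mxm = ComponentDescriptor("mtl", "mxm", "1.8", lambda: "libmxm 2.1")
    tcp = ComponentDescriptor("btl", "tcp", "1.8")  # no external library
    print(describe(mxm))
    print(describe(tcp))
```

Making the field optional means components without an external dependency need no change, and the info tool simply omits the extra text for them.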

Please advise.

Thanks

_______________________________________________
devel-core mailing list
devel-core_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/devel-core

_______________________________________________
devel mailing list
devel_at_[hidden]
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/04/14507.php
