Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-04-28 07:50:15


I'd be fascinated to understand how this works. MPI_Init in 1.2, for
example, makes multiple function calls that simply don't exist in
1.3.x. There are references to fields that are no longer present in
structures, even though the structures themselves do still exist. Etc.

I frankly am stunned that the whole thing doesn't abort due to
unresolved references. I tried it here, just for grins, and it aborted
immediately.

I wonder if you truly are running what you think you are - or whether
the application is picking up the 1.3.x libraries without your
realizing it.
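One quick way to check which library the process really resolves - a
minimal sketch, not specific to Open MPI - is to ask the dynamic
linker where an MPI symbol came from:

    /* which_mpi.c -- print the shared object that provides MPI_Init.
       Build with something like: mpicc which_mpi.c -o which_mpi -ldl
       (-ldl is needed on older glibc). */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        Dl_info info;
        MPI_Init(&argc, &argv);
        /* dladdr() reports which loaded object contains the address
           of MPI_Init, i.e., the libmpi you are actually running with */
        if (dladdr((void *) MPI_Init, &info) && info.dli_fname)
            printf("MPI_Init resolved from: %s\n", info.dli_fname);
        MPI_Finalize();
        return 0;
    }

Plain ldd on the application would show much the same thing without
recompiling anything.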

Otherwise, I can only assume you have a magic system that can somehow
translate calls to "orte_gpr.xxx" into their equivalent
"orte_grpcomm.yyy", with appropriate changes in parameters....and can
figure out that it should just skip a call to "orte_init_stage2" since
that function no longer exists....etc. :-)
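For what it's worth, you can also probe an installed libopen-rte for
one of those symbols directly - a sketch, with the soname being an
assumption for a typical install:

    /* probe_orte.c -- check whether a symbol exists in libopen-rte.
       Build: cc probe_orte.c -o probe_orte -ldl */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        /* point this at the install you want to test */
        void *h = dlopen("libopen-rte.so.0", RTLD_LAZY);
        if (!h) { fprintf(stderr, "dlopen: %s\n", dlerror()); return 1; }
        /* orte_init_stage2 is one of the calls that exists in the
           1.2 series but is gone in 1.3 */
        printf("orte_init_stage2: %s\n",
               dlsym(h, "orte_init_stage2") ? "present" : "absent");
        dlclose(h);
        return 0;
    }

A 1.2-linked application that reaches a call like that under 1.3 has
nothing to resolve it against, which is exactly why I would expect an
immediate abort.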

On Apr 28, 2009, at 5:39 AM, Serge wrote:

> Ralph, Brian, and Jeff,
>
> Thank you for your answers.
>
> I want to confirm Brian's words that I am "compiling the application
> against one version of Open MPI, linking dynamically, then running
> against another version of Open MPI".
>
> The fact that the ABI has stabilized with the release of version
> 1.3.2, and it's supposed to be steady throughout v1.3 and 1.4, is
> great news.
>
> What I will try to do is recompile more applications with v1.3.x and
> then try to run them against v1.2.x. After all, this is just to allow
> a quick, smooth transition, so that we do not have to take an outage
> for a system-wide upgrade. It's worth trying.
>
> = Serge
>
>
> Ralph Castain wrote:
>> Remember also that the RTE APIs changed between 1.2 and 1.3 - so I'm
>> not sure what will happen in that case. It could be that the ones
>> touching the MPI layer remained stable (I don't honestly recall),
>> though I believe there are RTE calls in 1.3 that don't exist in 1.2.
>> I would think you would have a problem if you hit one of those
>> (e.g., when doing a comm_spawn).
>> On Mon, Apr 27, 2009 at 12:36 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> I'd actually be surprised if it works. The back-end sizes of Open MPI
>> structures definitely changed between 1.2 and 1.3. We used to think
>> that this didn't matter, but then we found out that we were wrong. :-)
>> Hence, I'd think that the same exact issues you have with taking a
>> 1.2-compiled MPI application and running it with 1.3 would also occur
>> if you took a 1.3-compiled application and ran it with 1.2. If it
>> works at all, I'm guessing that you're getting lucky.
>> We only finally put in some ABI fixes in 1.3.2. So the ABI *should*
>> be steady throughout the rest of the 1.3 and 1.4 series.
>> On Apr 27, 2009, at 2:30 PM, Brian W. Barrett wrote:
>> I think Serge is talking about compiling the application against one
>> version of Open MPI, linking dynamically, then running against
>> another version of Open MPI. Since it's dynamically linked, the
>> ORTE/OMPI interactions are covered (the version of mpirun,
>> libopen-rte, and libmpi all match). The question of application
>> binary compatibility can generally be traced to a couple of issues:
>> - function signatures of all MPI functions
>> - constants in mpi.h changing
>> - size of structures, due to the bss optimization for globals
>> I can't remember when we last changed function signatures, but it
>> probably has happened. They may be minor enough not to matter, and
>> definitely wouldn't be in the usual set of functions people use
>> (send, recv, wait, etc.).
>> The constants in mpi.h have been pretty steady since day 1, although
>> I haven't checked when they last changed.
>> The final one actually should be ok for going from later versions of
>> Open MPI to earlier versions, as the structures in question usually
>> grow and rarely shrink in size (see the sketch below).
>> In other words, it'll probably work, but no one in the group is going
>> to say anything stronger than that.
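>>
>> To make the third item concrete, here is a minimal sketch of the
>> copy-relocation mechanism with a made-up library (not Open MPI
>> itself; the names and build lines are only illustrative):
>>
>>     /* libfoo.c -- build: cc -shared -fPIC -o libfoo.so libfoo.c */
>>     struct comm { int a; /* a newer libfoo adds fields here */ };
>>     struct comm world_comm = { 1 };
>>
>>     /* app.c -- build: cc -no-pie app.c -o app -L. -lfoo
>>        run:   LD_LIBRARY_PATH=. ./app
>>        (-no-pie matters only on modern compilers that default to
>>        PIE; it makes the linker reserve sizeof(world_comm) bytes in
>>        the executable and emit a copy relocation -- the same
>>        mechanism behind ompi_mpi_comm_world in an MPI program.) */
>>     struct comm { int a; };
>>     extern struct comm world_comm;
>>     int main(void) { return world_comm.a; }
>>
>> Rebuild only libfoo.so with extra fields in struct comm and rerun
>> the app: the dynamic loader prints the very warning Serge quotes
>> below ("Symbol `world_comm' has different size in shared object,
>> consider re-linking"). If the executable reserved more space than
>> the new library's object needs, nothing is clobbered, which is why
>> newer-built-running-on-older is the safer direction.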
>> Brian
>> On Mon, 27 Apr 2009, Ralph Castain wrote:
>> > It's hard for me to believe that would work, as there are
>> > fundamental differences in the MPI-to-RTE interactions between
>> > those releases. If it does, it could be a fluke - I personally
>> > would not trust it.
>> >
>> > Ralph
>> >
>> > On Mon, Apr 27, 2009 at 12:04 PM, Serge <skhan_at_[hidden]> wrote:
>> > Hi Jeff,
>> >
>> > > That being said, we have fixed this issue and expect to support
>> > > binary compatibility between Open MPI releases starting with
>> > > v1.3.2 (v1.3.1
>> >
>> > As far as I can tell from reading the release notes for v1.3.2,
>> > binary compatibility has not been announced yet. It was rather a
>> > bug-fix release. Is that correct? Does it mean that the
>> > compatibility feature is pushed to later releases, v1.3.3, 1.3.4?
>> >
>> > In my original message (see below) I was looking for advice on a
>> > seamless transition from v1.2.x to v1.3.x in a shared multi-user
>> > environment.
>> >
>> > Interestingly enough, I recently noticed that although it's
>> > impossible to run an application compiled with v1.2.x under v1.3.x,
>> > the opposite does actually work. An application compiled with
>> > v1.3.x runs using Open MPI v1.2.x. Specifically, I tested an
>> > application compiled with v1.3.0 and v1.3.2, running under Open
>> > MPI v1.2.7.
>> >
>> > This gives me a perfect opportunity to recompile all the parallel
>> > applications with v1.3.x, transparently to users, and then switch
>> > the default Open MPI library from v1.2.7 to v1.3.x when all the
>> > apps have been rebuilt.
>> >
>> > The problem is that I am not 100% sure about this approach, even
>> > with some successful tests done.
>> >
>> > Is it safe to run an application built with 1.3.x under 1.2.x?
>> > Does it make sense to you?
>> >
>> > = Serge
>> >
>> >
>> > Jeff Squyres wrote:
>> >     Unfortunately, binary compatibility between Open MPI release
>> >     versions has never been guaranteed (even between subreleases).
>> >
>> >     That being said, we have fixed this issue and expect to support
>> >     binary compatibility between Open MPI releases starting with
>> >     v1.3.2 (v1.3.1 should be released soon; we're aiming for v1.3.2
>> >     towards the beginning of next month).
>> >
>> >     On Mar 10, 2009, at 11:59 AM, Serge wrote:
>> >
>> >         Hello,
>> >
>> >         We have a number of applications built with Open MPI 1.2
>> >         in a shared multi-user environment. The Open MPI library
>> >         upgrade has always been transparent and painless within
>> >         the v1.2 branch. Now we would like to switch to Open MPI
>> >         1.3 just as seamlessly. However, an application built with
>> >         ompi v1.2 will not run with the 1.3 library; the typical
>> >         error messages are given below. Apparently, the type
>> >         ompi_communicator_t has changed.
>> >
>> >         Symbol `ompi_mpi_comm_null' has different size in shared
>> >         object, consider re-linking
>> >         Symbol `ompi_mpi_comm_world' has different size in shared
>> >         object, consider re-linking
>> >
>> >         Do I have to rebuild all the applications with Open MPI 1.3?
>> >
>> >         Is there a better way to do a smooth upgrade?
>> >
>> >         Thank you.
>> >
>> >         = Serge
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users