Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-04-28 07:50:15


I'd be fascinated to understand how this works. There are multiple
function calls in MPI_Init, for example, that simply don't exist in
1.3.x. There are references to fields in structures that are no longer
present, though the structure itself does still exist. Etc.

I frankly am stunned that the whole thing doesn't abort due to
unresolved references. I tried it here, just for grins, and it aborted
immediately.

I wonder if you truly are running what you think you are running, or
whether the application is picking up the 1.3.x libraries without your
realizing it?
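The quickest way to settle that question is to ask the dynamic linker which shared objects a binary will actually resolve. A minimal sketch using `ldd`; `/bin/ls` is only a stand-in here, and `./my_app` in the comment is a hypothetical name for your MPI executable:

```shell
# Print every shared object the binary will load, with resolved paths.
# For an MPI application you would run something like:
#   ldd ./my_app | grep -E 'libmpi|libopen-rte'
# and check that the paths point at the Open MPI version you intend.
libs=$(ldd /bin/ls)
echo "$libs" | grep 'libc'    # shows the full path actually picked up
```

Checking `which mpirun` and `mpirun --version` on the same node closes the other half of the loop: the launcher and the application libraries must come from the same install.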

Otherwise, I can only assume you have a magic system that can somehow
translate calls to "orte_gpr.xxx" into their equivalent
"orte_grpcomm.yyy", with appropriate changes in parameters....and can
figure out that it should just skip a call to "orte_init_stage2" since
that function no longer exists....etc. :-)
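For what it's worth, "doesn't abort immediately" is possible with lazy binding: an executable linked against a library that has since lost a function only fails the first time the missing symbol is actually called. A toy sketch of that failure mode (a generic C toolchain on Linux, not Open MPI itself; `libtoy`, `common_call`, and `removed_call` are made-up names):

```shell
dir=$(mktemp -d); cd "$dir"
# v1 of the library provides both functions.
cat > libv1.c <<'EOF'
int common_call(void)  { return 1; }
int removed_call(void) { return 2; }   /* dropped in v2 */
EOF
# v2 keeps only common_call, like an RTE function removed between releases.
cat > libv2.c <<'EOF'
int common_call(void)  { return 1; }
EOF
cat > app.c <<'EOF'
int common_call(void);
int removed_call(void);
int main(int argc, char **argv) {
    int r = common_call();            /* always resolvable */
    if (argc > 1)
        r += removed_call();          /* only bound when this line is reached */
    return r == 1 ? 0 : 1;
}
EOF
cc -shared -fPIC libv1.c -o libtoy.so
cc app.c -L. -ltoy -Wl,-rpath,"$dir" -Wl,-z,lazy -o app
cc -shared -fPIC libv2.c -o libtoy.so           # swap in the shrunken library
if ./app; then ok=yes; else ok=no; fi           # fine: removed_call never reached
if ./app hit 2>/dev/null; then bad=yes; else bad=no; fi  # dies at the missing symbol
echo "ok=$ok bad=$bad"
```

`ldd -r ./app` would report the unresolved `removed_call` up front, without having to hit the code path at run time.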

On Apr 28, 2009, at 5:39 AM, Serge wrote:

> Ralph, Brian, and Jeff,
>
> Thank you for your answers.
>
> I want to confirm Brian's words that I am "compiling the application
> against one version of Open MPI, linking dynamically, then running
> against another version of Open MPI".
>
> The fact that the ABI has stabilized with the release of version
> 1.3.2, and it's supposed to be steady throughout v1.3 and 1.4, is
> great news.
>
> What I will try to do is recompile more applications with v1.3.x and
> then try to run them against v1.2.x. After all, this is just to allow
> a quick, smooth transition, so that we do not have to take an outage
> for a system-wide upgrade. It's worth trying.
>
> = Serge
>
>
> Ralph Castain wrote:
>> Remember also that the RTE API's changed between 1.2 and 1.3 - so
>> I'm not sure what will happen in that case. It could be that the
>> ones touching the MPI layer remained stable (don't honestly
>> recall), though I believe there are RTE calls in 1.3 that don't
>> exist in 1.2. I would think you would have a problem if you hit one
>> of those (e.g., when doing a comm_spawn).
>> On Mon, Apr 27, 2009 at 12:36 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> I'd actually be surprised if it works.
>>
>> The back-end sizes of Open MPI structures definitely changed between
>> 1.2 and 1.3. We used to think that this didn't matter, but then we
>> found out that we were wrong. :-) Hence, I'd think that the same exact
>> issues you have with taking a 1.2-compiled MPI application and running
>> with 1.3 would also occur if you took a 1.3-compiled application and
>> ran it with 1.2. If it works at all, I'm guessing that you're getting
>> lucky.
>>
>> We only finally put in some ABI fixes in 1.3.2. So the ABI *should*
>> be steady throughout the rest of the 1.3 and 1.4 series.
>> On Apr 27, 2009, at 2:30 PM, Brian W. Barrett wrote:
>>
>> I think Serge is talking about compiling the application against one
>> version of Open MPI, linking dynamically, then running against another
>> version of Open MPI. Since it's dynamically linked, the ORTE/OMPI
>> interactions are covered (the version of mpirun, libopen-rte, and
>> libmpi all match). The question of application binary compatibility
>> can generally be traced to a couple of issues:
>>
>> - function signatures of all MPI functions
>> - constants in mpi.h changing
>> - size of structures due to the bss optimization for globals
>>
>> I can't remember when we last changed function signatures, but it
>> probably has happened. They may be minor enough not to matter, and
>> definitely wouldn't be in the usual set of functions people use
>> (send, recv, wait, etc.).
>>
>> The constants in mpi.h have been pretty steady since day 1, although I
>> haven't checked when they last changed.
>>
>> The final one actually should be ok when going from later versions of
>> Open MPI to earlier versions, as the structures in question usually
>> grow and rarely shrink in size.
>>
>> In other words, it'll probably work, but no one in the group is going
>> to say anything stronger than that.
>>
>> Brian
>> On Mon, 27 Apr 2009, Ralph Castain wrote:
>>
>> > It's hard for me to believe that would work, as there are
>> > fundamental differences in the MPI-to-RTE interactions between those
>> > releases. If it does, it could be a fluke - I personally would not
>> > trust it.
>> >
>> > Ralph
>> >
>> > On Mon, Apr 27, 2009 at 12:04 PM, Serge <skhan_at_[hidden]> wrote:
>> > Hi Jeff,
>> >
>> > > That being said, we have fixed this issue and expect to support
>> > > binary compatibility between Open MPI releases starting with
>> > > v1.3.2 (v1.3.1
>> >
>> > As far as I can tell from reading the release notes for v1.3.2,
>> > binary compatibility has not been announced yet. It was rather a bug
>> > fix release. Is that correct? Does it mean that the compatibility
>> > feature is pushed to later releases, v1.3.3, 1.3.4?
>> >
>> > In my original message (see below) I was looking for advice on a
>> > seamless transition from v1.2.x to v1.3.x in a shared multi-user
>> > environment.
>> >
>> > Interestingly enough, I recently noticed that although it's
>> > impossible to run an application compiled with v1.2.x under v1.3.x,
>> > the opposite does actually work. An application compiled with v1.3.x
>> > runs using Open MPI v1.2.x. Specifically, I tested an application
>> > compiled with v1.3.0 and v1.3.2, running under Open MPI v1.2.7.
>> >
>> > This gives me a perfect opportunity to recompile all the parallel
>> > applications with v1.3.x, transparently to users, and then switch
>> > the default Open MPI library from v1.2.7 to v1.3.x when all the apps
>> > have been rebuilt.
>> >
>> > The problem is that I am not 100% sure about this approach, even
>> > with some successful tests done.
>> >
>> > Is it safe to run an application built with 1.3.x under 1.2.x? Does
>> > it make sense to you?
>> >
>> > = Serge
>> >
>> > Jeff Squyres wrote:
>> >
>> > > Unfortunately, binary compatibility between Open MPI release
>> > > versions has never been guaranteed (even between subreleases).
>> > >
>> > > That being said, we have fixed this issue and expect to support
>> > > binary compatibility between Open MPI releases starting with
>> > > v1.3.2 (v1.3.1 should be released soon; we're aiming for v1.3.2
>> > > towards the beginning of next month).
>> > >
>> > > On Mar 10, 2009, at 11:59 AM, Serge wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > We have a number of applications built with Open MPI 1.2 in a
>> > > > shared multi-user environment. The Open MPI library upgrade has
>> > > > always been transparent and painless within the v1.2 branch. Now
>> > > > we would like to switch to Open MPI 1.3 as seamlessly. However,
>> > > > an application built with ompi v1.2 will not run with the 1.3
>> > > > library; the typical error messages are given below. Apparently,
>> > > > the type ompi_communicator_t has changed.
>> > > >
>> > > > Symbol `ompi_mpi_comm_null' has different size in shared
>> > > > object, consider re-linking
>> > > > Symbol `ompi_mpi_comm_world' has different size in shared
>> > > > object, consider re-linking
>> > > >
>> > > > Do I have to rebuild all the applications with Open MPI 1.3?
>> > > >
>> > > > Is there a better way to do a smooth upgrade?
>> > > >
>> > > > Thank you.
>> > > >
>> > > > = Serge
>> >
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users