Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] dropping a pls module into an Open MPI build
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-01-28 14:47:49


One thing you might check if you suspect compiler alignment issues is
to run "ompi_info --all" and see what Apple used to configure/build
OMPI. We save the CFLAGS and whatnot; they may be helpful to you...?

I see on my MBP/Leopard 10.5.1, for example:

      C compiler absolute: /usr/bin/gcc
...
             Build CFLAGS: -O3 -DNDEBUG -arch i386 -finline-functions -fno-strict-aliasing
           Build CXXFLAGS: -O3 -DNDEBUG -arch i386 -finline-functions
             Build FFLAGS:
            Build FCFLAGS:
            Build LDFLAGS: -export-dynamic -Wl,-u,_munmap -Wl,-multiply_defined,suppress
               Build LIBS: -lutil
     Wrapper extra CFLAGS:
   Wrapper extra CXXFLAGS:
     Wrapper extra FFLAGS:
    Wrapper extra FCFLAGS:
    Wrapper extra LDFLAGS: -Wl,-u,_munmap -Wl,-multiply_defined,suppress
       Wrapper extra LIBS: -lutil

I'll *guess* that the -Wl options came from OMPI's normal configure
script. But the -arch and -f* flags might have come from Apple...?

That being said, I'm *not* sure how this information relates to
universal binaries... It *may* be that ompi_info reports different
options for the different architectures depending on which machine you
run it on...? I don't know enough about how universal binaries are
built or run to say.

On Jan 24, 2008, at 1:12 PM, Ralph H Castain wrote:

> Appreciate the clarification. I am unaware of anyone attempting that
> procedure in the past, but I'm not terribly surprised to hear it would
> encounter problems and/or fail. Given the myriad of configuration
> options in the code base, it would seem almost miraculous that you
> could either (a) hit the same config options used by Apple (whatever
> they were), or (b) manage to find a combination that matched enough to
> let you do this without problem.
>
> Frankly, I'm surprised even this small a fix would let you work around
> the problems... ;-)
>
> Unless you have some overriding reason to use the shipped binaries for
> everything other than this special component, you're probably going to
> have a lot more success just rebuilding from source.
>
> But that's just an opinion - either way, good luck with your efforts!
> Ralph
>
>
> On 1/24/08 10:54 AM, "Dean Dauger, Ph. D." <d_at_[hidden]>
> wrote:
>
>>> I'm sorry, but now I am totally confused. Are you saying that you
>>> are having problems with the default rsh component in the
>>> distributed 1.2.3 code??
>>
>> Yes ...
>>
>>> Or are you having a problem with your customized version?
>>
>> and yes. Each exhibited the same problem, a bus error.
>>
>>> What compiler are you using? If it's your customized version, did
>>> you make sure to change the names of the data structures and modules
>>> as I pointed out?
>>
>> gcc 4.0.1, the default on Leopard. Yes, in the customized version, I
>> did change the names of the data structures, subroutines, and support
>> file names, and everywhere it says "rsh", just like you said.
>>
>>> We regularly work on Macs, both PPC and Intel based (I develop and
>>> test on both every day), and I have -never- seen this problem in our
>>> code base. Hence my confusion.
>>
>> I'm sorry to confuse. I'm starting with the shipping Mac OS X 10.5.1
>> "Leopard", which contains its own build of Open MPI (v1.2.3 according
>> to "orterun -version"). So I assumed that the v1.2.3 branch from
>> svn.open-mpi.org was the same code Apple used to build the Open MPI
>> that ships in Leopard.
>>
>> My motivation was to build a new pls module based on the pls_rsh
>> module's source code, substituting my own name for "rsh" like you
>> said, but I encountered a bus error. So, to be sure I hadn't screwed
>> up somewhere in my custom module, I rebuilt the unmodified pls_rsh
>> module and discovered the same problem.
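>>
>> (For anyone curious, the renaming amounted to something like the
>> sketch below. The fields are made up -- this is not the real
>> orte_pls_rsh_component_t -- but it shows the pattern: every
>> occurrence of "rsh" in the type name, the exported symbol, the file
>> names, and the component name string becomes "dean".)
>>
>>   /* pls_dean.h -- illustrative only.  The MCA framework dlopen()s
>>    * the .so and looks up the exported mca_pls_<name>_component
>>    * symbol, so the symbol, the type, and the files all have to be
>>    * renamed consistently. */
>>   #include "orte/mca/pls/pls.h"  /* pls framework types; path may differ */
>>
>>   struct orte_pls_dean_component_t {
>>       orte_pls_base_component_t super;  /* base component, unchanged */
>>       int   debug;                      /* made-up example fields    */
>>       char *agent;
>>   };
>>   typedef struct orte_pls_dean_component_t orte_pls_dean_component_t;
>>
>>   /* the global symbol the framework resolves at load time */
>>   extern orte_pls_dean_component_t mca_pls_dean_component;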
>>
>> Then, after downloading the Open MPI source from opensource.apple.com
>> (suspecting it was different), I recompiled the pls_rsh module from
>> that source code, dropped just the resulting mca_pls_rsh.la and
>> mca_pls_rsh.so into Leopard's existing /usr/lib/openmpi, overwriting
>> Leopard's versions, and the bus error happened the same as before.
>>
>> That's where I was with my first post to this list.
>>
>> My last post regards the discovery that rearranging the elements of
>> orte_pls_rsh_component_t, without changing anything else about the
>> pls_rsh code, affects the bus error outcome. Then I padded out
>> orte_pls_rsh_component_t and my "orte_pls_dean_component_t" by hand
>> so that they would be "data alignment agnostic", if you will.
>> Consequently the bus error no longer occurs and both pls modules now
>> run as they should.
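>>
>> Concretely, "padding by hand" meant something along these lines (the
>> field names are invented for the example): order the members from
>> largest to smallest, use fixed-width types, and add explicit pad
>> members, so that every field already sits on its natural boundary and
>> there is nothing left for the compiler to insert on its own. A build
>> that packs to 4-byte boundaries and one that uses natural alignment
>> then produce identical offsets, so the .so and the library it plugs
>> into can't disagree.
>>
>>   /* Illustrative "alignment-agnostic" layout -- not the real
>>    * orte_pls_rsh_component_t, just the shape of the fix. */
>>   #include <stdint.h>
>>
>>   struct orte_pls_dean_component_t {
>>       orte_pls_base_component_t super; /* base component first       */
>>       char    *agent;                  /* largest members next       */
>>       int32_t  debug;                  /* then 4-byte members,       */
>>       int32_t  priority;               /* fixed-width so their sizes */
>>                                        /* can't drift between builds */
>>       int8_t   force_rsh;              /* 1-byte member last,        */
>>       int8_t   pad[3];                 /* padded explicitly to a     */
>>                                        /* 4-byte multiple            */
>>   };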
>>
>> My hypothesis: Apple's procedure for building Open MPI into Leopard
>> had a side effect that requires the data structures shared across
>> object-code boundaries to follow a different alignment than I get by
>> simply recompiling Open MPI straight from its source.
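>>
>> (If anyone wants to test that hypothesis, a snippet like the one
>> below, compiled once with Apple's reported flags from "ompi_info
>> --all" and once with the flags my own configure produced, would show
>> it directly: if the two binaries print different numbers, the plugin
>> and the core library disagree about where the fields live, which is
>> exactly the kind of mismatch that ends in a bus error. The field
>> name is just an example -- substitute any member of the real struct.)
>>
>>   #include <stdio.h>
>>   #include <stddef.h>
>>   #include "pls_rsh.h"  /* wherever orte_pls_rsh_component_t is declared */
>>
>>   int main(void)
>>   {
>>       printf("sizeof(orte_pls_rsh_component_t) = %lu\n",
>>              (unsigned long) sizeof(orte_pls_rsh_component_t));
>>       printf("offsetof(debug) = %lu\n",
>>              (unsigned long) offsetof(orte_pls_rsh_component_t, debug));
>>       return 0;
>>   }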
>>
>> I'm not saying anyone is to blame, but I recognize that those builds
>> have different timelines. I predict that if I overwrote all of
>> Leopard's Open MPI object code with my own build, it would all run
>> too.
>>
>> For my needs, I have a sufficient workaround: realign my data
>> structures to be "agnostic". I'm sharing this little discovery just
>> in case it might help somebody else out there; for all I know it
>> could happen on non-Macs too.
>>
>> Thanks,
>> Dean
>>

-- 
Jeff Squyres
Cisco Systems