Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: Remove heterogeneous support
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-04-25 05:47:42


On Apr 24, 2014, at 10:47 PM, Ralph Castain <rhc_at_[hidden]> wrote:

>> And, as George pointed out, I see a trend towards heterogeneity in
>> HPC, so I'd say this feature will be rather more important in the
>> future.
>
> We have been hearing about such "trends" for a long time, but have yet to see them actually happen. Not saying it couldn't some day - just saying it still hasn't happened in production.

+1

MPI was designed to support heterogeneity all the way back from MPI-1.0 (1994) on these same kinds of arguments. It hasn't really panned out for more than a handful of users.

Keep in mind that data size heterogeneity is an unsolved problem. What do you do if one process sends a 4-byte integer of value 0xff000000 to a peer with only 2-byte integers?

>> So, would repairing the code be significantly more complicated than a
>> clean extraction?
>
> So here's what I suggest: if someone is willing to take the lead in fixing hetero operations, and has the hardware upon which to verify it, then please step forward. Otherwise, I agree with Jeff that we should remove it and move on.

My guess is that the breakage is somewhere in the datatype and/or PML code. Keep in mind that my only testing of this feature is in *homogeneous* mode -- i.e., I compile with --enable-heterogeneous and then run tests on homogeneous machines. Meaning: it's not only broken for actual heterogeneity, it's also broken in the "unity"/homogeneous case.

So which is more complicated: fix or remove? I don't know; as George mentions, I suspect removal is likely to be a little tricky.

But ask that question a little differently: which is more complicated, long-term maintenance of a feature which no one really tests (or even has the hardware setup to test) or removal?

To me, the answer is a little more clear that way.

That being said, there are 3 disagreements with this RFC so far:

1. George: on principle
2. Andreas: (might) use heterogeneity if it worked
3. Siegmar: uses heterogeneity in older OMPI versions in his SPARC+Intel setups

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/