
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Collective component priorities and sm
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2010-06-09 08:31:40


On Jun 9, 2010, at 12:43 AM, Gus Correa wrote:

> btl_self_priority=0 (default value)
> btl_sm_priority=0 (default value)

These are OK. BTL selection is a combination of priority and reachability. The self BTL can *only* reach its own process, so process A will use the "self" BTL to talk to process A. The sm BTL can only reach *other* processes on the same host, so process A will use the sm BTL to talk to process B, provided A != B and both A and B are on the same host.

> coll_basic_priority=10 (default value)
> coll_hierarch_priority=0 (default value)
> coll_inter_priority=40 (default value)
> coll_self_priority=75 (default value)
> coll_sm_priority=0 (default value)
> coll_sync_priority=50 (default value)
> coll_tuned_priority=30 (default value)
>
> [Note that 'coll' priorities are *not* tied,
> 'self' is maximum (75), and 'sm' is minimum (0).]

Right. Coll selection, in essence, is the same as BTL selection, but the mechanics are a little different. Coll modules are selected on a per-communicator basis, and will only allow themselves to be selected if they can reach all members of a given communicator. For example, the self coll will only allow itself to be selected for MPI_COMM_SELF (and duplicates thereof). sm will only allow itself to be selected when all procs in the communicator are on the same host. And so on.

> coll:sm:comm_query (0/MPI_COMM_WORLD): priority too low; disqualifying
> myself
> coll:sm:comm_query (3/MPI_COMMUNICATOR 3): priority too low;
> disqualifying myself
>
> [Therefore, 'sm' seems to give up working in collectives ... :( ]

Correct.

I believe we simply set the default priority of the sm coll component too low; you might want to try raising it.

We are actually working on the shared memory collectives for future releases; the sm coll module that currently ships implements only 4 operations: barrier, bcast, reduce, and allreduce, and only for intracommunicators. :-(

> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component not available: self
> coll:base:comm_select: component not available: sm
> coll:base:comm_select: component available: sync, priority: 50
> coll:base:comm_select: component available: tuned, priority: 30
> coll:base:comm_select: Checking all available modules
> coll:base:comm_select: component available: basic, priority: 10
> coll:base:comm_select: component not available: hierarch
> coll:base:comm_select: component not available: inter
> coll:base:comm_select: component available: self, priority: 75
>
> [Eventually 'sm', 'inter', and 'hierarch' seem to go out of business,
> whereas 'basic', 'sync' and 'tuned' hang in there.
> It is awkward that 'self' claims both to
> be available and not available!]

This must be selection for 2 different communicators. Right before each "Checking all available modules" message there should be another message identifying which communicator that selection is for. The tags on the left of each message identify the process in which the selection is occurring, so, at least for MPI_THREAD_SINGLE applications, the ordering should be deterministic and easy to follow. The output from multiple processes may be interleaved, but the tags on the left let you tell who is who.

> 1) Are the "coll" priorities above (default values) the best choices
> when I run in a single node, or were they chosen for a general
> situation when the job runs across node boundaries?

They're generally good. We're probably too conservative with the sm coll priority because there was a time when that component was buggy. Those bugs should all be fixed now, though.

> 2) Why does "self" have the largest value (75)?

It only does for MPI_COMM_SELF (and its duplicates).

The coll priorities might be a bit confusing because they can adjust themselves during selection. It's also a bit more complicated because colls are chosen on a per-communicator basis, and a given coll's priority is not necessarily the same on every communicator.

So you should treat those values as the *maximum* priority at which a given coll will present itself. Self's maximum is 75, but on communicators where it doesn't allow itself to be selected, its priority is effectively 0.

> 3) Does it mean that all collectives will use the
> self/loopback mechanism for communication?

No.

> How about 'basic' and the rest of the gang with smaller priorities?

The priorities are assessed on a per-communicator basis, and the modules can adjust their priorities accordingly (to either 0 or their respective max value).

It's even *more* complicated because colls are allowed to mix and match on a single communicator. As I mentioned above, the sm coll only implements bcast, barrier, reduce, and allreduce. So the sm coll can "win" on communicator X, but only for those 4 operations. The next-highest coll fills in the others; if any operations are still missing after that, the coll after it is used, and so on, until every MPI collective operation has a plugin to use.

> 4) Is it a good idea to set the 'sm' priority to a value
> larger than 75 (to beat "self" and take over the collective functions)?

It'll always beat self because self won't allow itself to be selected for communicators containing more than 1 process.

> 5) In this case, will the collectives only use "sm"?

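If you set the sm priority larger than both basic and tuned, yes, for the 4 operations it implements. The remaining operations still fall back to the next-highest coll.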

> 6) Will this improve or degrade performance ?

Depends on your app. :-) The idea is that it will improve performance if you're using those 4 operations. The others will generally fall back to tuned.

> 7) Is there any literature where I can learn
> more about these OpenMPI collective priorities?

Unfortunately not... :-(

> (I couldn't find anything about it on the FAQs.
> Actually, a group of FAQ about collectives would be very helpful.)

Agreed. You wouldn't have a few cycles to write this stuff up, would you?

    https://svn.open-mpi.org/trac/ompi/wiki/OMPIFAQEntries

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/