
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Progress of the asynchronous messages
From: Nifty niftyompi Mitch (niftyompi_at_[hidden])
Date: 2008-11-10 16:50:18

On Thu, Nov 06, 2008 at 03:04:13PM -0500, Jeff Squyres wrote:
> For the web archives: this same question was posted and answered on the
> users list. See this thread:

Good thread... one possible omission is replacing the sleep(1) with
sched_yield() to get some overlap with other system activity.

As a general rule, a tight test loop should be aware of the minimum
and maximum times for the tested condition to become true. Retesting
the flag sooner than the minimum time invites system contention;
waiting longer than the maximum wastes resources.

The loop should know if the state of the object being tested will change
without local CPU activity. If the CPU you are executing the test loop on
is the same CPU/core that will finish the transaction then a sched_yield()
is a very good thing.

Also knowing if the test itself impacts the system is important (example:
cache line contention or system call).

MPI is interesting because for some hardware a lot of work is done in
user space, and a "sleep()" or "sched_yield()" gets no MPI work done.
Other transport code moves data with system calls (example: TCP/IP),
where yielding gives the system an opportunity to service any IO queue
or interrupt that might be pending.

To point...
>> vladimir marjanovic wrote:
>>> In order to overlap communication and computation

Communication requires work in the form of some {small, medium, large}
interaction with a processor. Work is work, so perfect overlap is
strictly not possible. The problem is therefore scheduling for minimum
conflict, which is hard to solve in the general case since scheduling
is work too. Thus "sched_yield()" may help.

> On Nov 6, 2008, at 1:00 PM, vladimir marjanovic wrote:

>>> I am new user of Open MPI, I've used MPICH before.
>>> I've tried on the user list but they couldn't help me.
>>> There is performance bug with the following scenario:
>>> proc_B: MPI_Isend(...,proc_A,..,&request)
>>> do {
>>>     sleep(1);
>>>     MPI_Test(.., &flag, &request);
>>>     count++;
>>> } while (!flag);
>>> proc_A: MPI_Recv(...,proc_B);
>>> For message size 8MB, proc_B calls MPI_Test 88 times. It means that
>>> point to point communication costs 88 seconds.
>>> Btw, bandwidth isn't the problem (interconnection network:
>>> InfiniBand)
>>> Obviously, there is the problem with progress of the asynchronous
>>> messages. In order to overlap communication and computation I don't
>>> want to use MPI_Wait. Probably, the message is being decomposed into
>>> chunks and the size of a chunk is probably defined by an environment
>>> variable.
>>> How can I advance the message more aggressively or can I control
>>> size of chunk?
>>> Thank you very much
>>> Vladimir
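For reference, the replacement discussed above can be sketched as a rework of the quoted sender loop: sched_yield() instead of sleep(1), so MPI_Test runs as soon as the scheduler allows and the library gets frequent chances to progress the transfer. This is an untested sketch (the function name and tag are illustrative, not from the thread); it assumes a standard MPI installation.

```c
#include <mpi.h>
#include <sched.h>

/* Sender side, reworked from the quoted loop. */
void send_and_poll(void *buf, int count, int dest, MPI_Comm comm)
{
    MPI_Request request;
    int flag = 0;

    MPI_Isend(buf, count, MPI_BYTE, dest, /*tag=*/0, comm, &request);
    do {
        sched_yield();   /* yield the CPU; don't sleep a full second */
        MPI_Test(&request, &flag, MPI_STATUS_IGNORE);
        /* ... interleave useful computation here ... */
    } while (!flag);
}
```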
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
> --
> Jeff Squyres
> Cisco Systems

	T o m  M i t c h e l l 
	Found me a new hat, now what?