Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] OpenMPI Performance Problem with Open|SpeedShop
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-01-14 09:01:33


If your timer is actually generating an interrupt to the process, then
that could be the source of the problem. I believe the event library
also treats interrupts as events, and assigns them the highest
priority. So every one of your interrupts would cause the event
library to stop what it was doing and go into its interrupt handling
routine.

I'm no expert on the event library though - just speculating that this
could be the source of the problem.
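
To get a feel for how often a sampling timer breaks into a process, here is a minimal sketch in Python. It is not Open MPI or libevent code: the function name `count_timer_interrupts`, the choice of `ITIMER_REAL`/`SIGALRM`, and the 1 ms interval are all assumptions made for illustration (the 1 ms figure only mirrors the "1000s of times per second" mentioned in the quoted message). It arms an interval timer, spins in a busy loop that stands in for a progress loop, and counts how many times the handler ran.

```python
import signal
import time


def count_timer_interrupts(interval_s: float, duration_s: float) -> int:
    """Count how often an interval timer interrupts a busy loop.

    Hypothetical helper for illustration only (Unix, main thread).
    """
    ticks = 0

    def on_tick(signum, frame):
        # Runs each time Python notices a delivered SIGALRM.
        nonlocal ticks
        ticks += 1

    previous = signal.signal(signal.SIGALRM, on_tick)
    # Fire first after interval_s, then repeat every interval_s.
    signal.setitimer(signal.ITIMER_REAL, interval_s, interval_s)
    try:
        deadline = time.monotonic() + duration_s
        while time.monotonic() < deadline:
            pass  # stand-in for the application's compute/progress loop
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        signal.signal(signal.SIGALRM, previous)  # restore the old handler
    return ticks


if __name__ == "__main__":
    n = count_timer_interrupts(interval_s=0.001, duration_s=0.5)
    print(f"handler ran {n} times in 0.5 s")
```

Whether each such interrupt also sends the event library through its signal-handling path is exactly the open question above; the sketch only shows the interrupt rate the process has to absorb.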

Ralph

On Jan 13, 2009, at 8:18 PM, William Hachfeld wrote:

>
> Jeff & George,
>
>
> > Hum; interesting. I can't think of any reason why that would be a
> > problem offhand. The mca_btl_sm_component_progress() function is the
> > shared memory progression function. opal_progress() and
> > mca_bml_r2_progress() are likely mainly dispatching off to this
> > function.
> >
> > Does OSS interfere with shared memory between processes in any way?
> > (I'm not enough of a kernel guy to know what the ramifications of
> > ptrace and whatnot are)
>
> Open|SS shouldn't interfere with shared memory. We use the pthread
> library to access some TLS, but no shared memory...
>
>
> > There might be one reason for the application to slow down quite a
> > bit. If your use of a timer interacts with libevent (the library we
> > use internally to manage all kinds of events), then we might end up
> > calling poll on every iteration in the event library, and that is
> > really expensive.
>
> I did contemplate the notion that maybe we were getting into the
> "progress monitoring" part of OpenMPI every time the timer
> interrupted the process (1000s of times per second). Can either of
> you see any mechanism by which that might happen?
>
>
> > A quick way to figure out if that is the case is to run Open MPI
> > without support for shared memory (--mca btl ^sm). This way we will
> > call poll on a regular basis anyway, and if there is no difference
> > between a normal run and an OSS one, we know at least where to start
> > looking ...
>
> I ran SMG2000 on an 8-CPU Yellowrail node in the two configurations
> and recorded the wall/cpu clock times as reported by SMG2000 itself:
>
> "mpirun -np 8 smg2000 -n 32 64 64"
>
> Struct Interface, wall clock time = 0.042348 seconds
> Struct Interface, cpu clock time = 0.040000 seconds
> SMG Setup, wall clock time = 0.732441 seconds
> SMG Setup, cpu clock time = 0.730000 seconds
> SMG Solve, wall clock time = 6.881814 seconds
> SMG Solve, cpu clock time = 6.880000 seconds
>
> "mpirun --mca btl ^sm -np 8 smg2000 -n 64 64 64"
>
> Struct Interface, wall clock time = 0.059137 seconds
> Struct Interface, cpu clock time = 0.060000 seconds
> SMG Setup, wall clock time = 0.931437 seconds
> SMG Setup, cpu clock time = 0.930000 seconds
> SMG Solve, wall clock time = 9.107343 seconds
> SMG Solve, cpu clock time = 9.110000 seconds
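
The two timing reports above can be compared phase by phase with a few lines of Python. This is only a sketch: `parse_timings` and `slowdown` are hypothetical helper names, and the regular expression assumes the exact "phase, wall/cpu clock time = N seconds" layout shown in the reports. Note that the two command lines as quoted differ in their first `-n` argument (32 versus 64), so the ratios may reflect problem size as well as the transport change.

```python
import re

# Matches lines such as "SMG Solve, wall clock time = 6.881814 seconds".
TIMING_RE = re.compile(
    r"^(?P<phase>.+?), (?P<clock>wall|cpu) clock time\s*=\s*"
    r"(?P<secs>[0-9.]+) seconds$"
)

WITH_SM = """\
Struct Interface, wall clock time = 0.042348 seconds
Struct Interface, cpu clock time = 0.040000 seconds
SMG Setup, wall clock time = 0.732441 seconds
SMG Setup, cpu clock time = 0.730000 seconds
SMG Solve, wall clock time = 6.881814 seconds
SMG Solve, cpu clock time = 6.880000 seconds
"""

WITHOUT_SM = """\
Struct Interface, wall clock time = 0.059137 seconds
Struct Interface, cpu clock time = 0.060000 seconds
SMG Setup, wall clock time = 0.931437 seconds
SMG Setup, cpu clock time = 0.930000 seconds
SMG Solve, wall clock time = 9.107343 seconds
SMG Solve, cpu clock time = 9.110000 seconds
"""


def parse_timings(report: str) -> dict:
    """Map (phase, clock) -> seconds for every timing line in a report."""
    timings = {}
    for line in report.splitlines():
        match = TIMING_RE.match(line.strip())
        if match:
            key = (match["phase"], match["clock"])
            timings[key] = float(match["secs"])
    return timings


def slowdown(baseline: dict, other: dict) -> dict:
    """Ratio other/baseline for every key present in both reports."""
    return {k: other[k] / baseline[k] for k in baseline if k in other}


if __name__ == "__main__":
    ratios = slowdown(parse_timings(WITH_SM), parse_timings(WITHOUT_SM))
    for (phase, clock), ratio in sorted(ratios.items()):
        print(f"{phase:18s} {clock:4s} x{ratio:.2f}")
```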
>
> But running the application with the "--mca btl ^sm" option inside
> Open|SS also results in an extreme slowdown. I.e. it doesn't make
> any difference whether the shared memory transport is enabled or
> not. Open|SS reports time spent as follows (in case this helps
> pinpoint what is going on inside OpenMPI):
>
> Exclusive CPU time (s)   Function (defining location)
>
>           364.050000     btl_openib_component_progress (libmpi.so.0)
>           165.890000     mthca_poll_cq (libmthca-rdmav2.so)
>           122.090000     pthread_spin_lock (libpthread.so.0)
>            90.790000     opal_progress (libopen-pal.so.0)
>            48.230000     mca_bml_r2_progress (libmpi.so.0)
>            30.880000     ompi_request_wait_all (libmpi.so.0)
>             9.780000     pthread_spin_unlock (libpthread.so.0)
>             4.910000     mthca_free_srq_wqe (libmthca-rdmav2.so)
>             4.910000     mthca_unlock_cqs (libmthca-rdmav2.so)
>             4.730000     mthca_lock_cqs (libmthca-rdmav2.so)
>             0.890000     __poll (libc.so.6)
> ...
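
One way to read the listing above is to total the exclusive time per library. The sketch below does that in Python; `parse_profile` and `time_by_library` are hypothetical helper names, and the regular expression assumes the "seconds, function, (library)" row layout shown. The listing is truncated in the original message, so the totals cover only the rows that were posted.

```python
import re
from collections import defaultdict

# One row: "<seconds> <function> (<library>)"
ROW_RE = re.compile(r"^\s*([0-9.]+)\s+(\S+)\s+\((\S+)\)\s*$")

PROFILE = """\
364.050000 btl_openib_component_progress (libmpi.so.0)
165.890000 mthca_poll_cq (libmthca-rdmav2.so)
122.090000 pthread_spin_lock (libpthread.so.0)
90.790000 opal_progress (libopen-pal.so.0)
48.230000 mca_bml_r2_progress (libmpi.so.0)
30.880000 ompi_request_wait_all (libmpi.so.0)
9.780000 pthread_spin_unlock (libpthread.so.0)
4.910000 mthca_free_srq_wqe (libmthca-rdmav2.so)
4.910000 mthca_unlock_cqs (libmthca-rdmav2.so)
4.730000 mthca_lock_cqs (libmthca-rdmav2.so)
0.890000 __poll (libc.so.6)
"""


def parse_profile(text: str) -> list:
    """Return (seconds, function, library) tuples for each matching row."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows.append((float(match[1]), match[2], match[3]))
    return rows


def time_by_library(rows: list) -> dict:
    """Sum exclusive seconds per defining library."""
    totals = defaultdict(float)
    for seconds, _function, library in rows:
        totals[library] += seconds
    return dict(totals)


if __name__ == "__main__":
    totals = time_by_library(parse_profile(PROFILE))
    listed = sum(totals.values())
    for library, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{library:22s} {seconds:10.2f} s  {seconds / listed:6.1%}")
```

Grouped this way, the posted rows are dominated by the openib progress path and the mthca completion-queue polling beneath it, with `__poll` itself barely registering.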
>
> Does this help at all?
>
>
> -- Bill Hachfeld, The Open|SpeedShop Project