Hi jeff,

thanks for the reply. Pls see below .

And a new question:

How do you handle thread/task and memory affinity? Do you pass the requested affinity desires to the batch scheduler and them let it issue the specific placements for threads to the nodes ?

This is something we are concerned as we are running multiple jobs on same node and we don't want to oversubscribe cores by binding there threads inadvertandly.

Looking at ompi_info
 $ ompi_info | grep -i aff
           MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.4.2)
           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.4.2)

does this mean we have the full affinity support included or do I need to involve HWLOC in any way ?



On 05/25/10 08:35, Jeff Squyres wrote:
On May 24, 2010, at 2:02 PM, Michael E. Thomadakis wrote:

  
| > 1) high-resolution timers: how do I specify the HRT linux timers in the
| >     --with-timer=TYPE
| >  line of ./configure ?
|
| You shouldn't need to do anything; the "linux" timer component of Open MPI
| should get automatically selected.  You should be able to see this in the
| stdout of Open MPI's "configure", and/or if you run ompi_info | grep timer
| -- there should only be one entry: linux.

If nothing is menioned, will it by default select 'linux' timers?
    
Yes.

  
Or I have to specify in th configure

        --with-timer=linux ?
    
Nope.  The philosophy of Open MPI is that whenever possible, we try to choose a sensible default.  It never hurts to double check, but we try to do the Right Thing whenever it's possible to automatically choose it (within reason, of course).

You can also check the output of ompi_info -- ompi_info tells you lots of things about your Open MPI installation.

  
I actually spent some time looking around in the source trying to see which
actual timer is the base. Is this a high-resolution timer such as a POSIX
timers (timer_gettime or clock_nanosleep, etc.) or Intel processor's TSC ?

I am just trying to stay away from gettimeofday()
    
Understood.

Ugh; I just poked into the code -- it's complicated how we resolve the timer functions.  It looks like we put in the infrastructure into getting high resolution timers, but at least for Linux, we don't use it (the code falls back to gettimeofday).  It looks like we're only using the high-resolution timers on AIX (!) and Solaris.

Patches would be greatly appreciated; I'd be happy to walk someone through what to do. 

  

Which HRtimer is recommended for a Linux environment ? timer_gettime usually gives decent resolution and it is portable. I don't want to promise anything as I am already bogged down with several ongoing projects. You can give me brief  instructions to see if this can be squeezed in.
...

Justr as a feedback from one of the many HPC centers, for us it is most
important to have

a) a light-weight efficient MPI stack which makes the underlying IB h/w
capabilities available and

b) it can smoothly cooperate withe a batch scheduler / resource manager so
that a mixture of jobs get a decent allocation of the cluster resources.
    
Cools; good to know.  We try to make these things very workable in Open MPI -- it's been a goal from day 1 to integrate with job schedulers, etc.  And without high performance, we wouldn't have much to talk about.

Please be sure to let us know of questions / problems / etc.  I admit that we're sometimes a little slow to answer on the users list, but we do the best we can.  So don't hesitate to bump us if we don't reply.

Thanks!

  

Thanks again...
michael


-- 
% -------------------------------------------------------------------- \
% Michael E. Thomadakis, Ph.D.  Senior Lead Supercomputer Engineer/Res \
% E-mail: miket AT tamu DOT edu                   Texas A&M University \
% web:    http://alphamike.tamu.edu              Supercomputing Center \
% Voice:  979-862-3931                    Teague Research Center, 104B \
% FAX:    979-847-8643                  College Station, TX 77843, USA \
% -------------------------------------------------------------------- \