Hi Gus,

Interestingly, the connectivity_c test works fine with -np < 8. With -np > 8 it works some of the time; other times it HANGS. I have to believe that this is a big clue! Also, when it hangs, I sometimes get the message "mpirun was unable to cleanly terminate the daemons on the nodes shown below". Note that NO nodes are shown below. Once I got -np 250 to pass the connectivity test, but I was not able to replicate this reliably, so I'm not sure whether it was a fluke. Here is a link to a screenshot of top when connectivity_c is hung with -np 14. I see that 2 processes are only at 50% CPU usage. Hmmm...

http://picasaweb.google.com/lh/photo/87zVEucBNFaQ0TieNVZtdw?authkey=Gv1sRgCLKokNOVqo7BYw&feat=directlink

The other tests (ring_c, hello_c, and the C++ versions of these) work with all values of -np.

Using -mca mpi-paffinity_alone 1 I get the same behavior. 

I agree that I should worry about the mismatch between where the libraries are installed and where I am telling my programs to look for them. Would this type of mismatch cause behavior like what I am seeing, i.e. working with a small number of processors but failing with larger ones? It seems like a mismatch would have the same effect regardless of the number of processors used, but maybe I am mistaken. Anyway, to address this: "which mpirun" gives me /usr/local/bin/mpirun, so to configure I use ./configure --with-mpi=/usr/local/bin/mpirun, and to run, /usr/local/bin/mpirun -np X ... This should

uname -a gives me: Linux macmanes 2.6.31-16-generic #52-Ubuntu SMP Thu Dec 3 22:07:16 UTC 2009 x86_64 GNU/Linux

Matt

On Dec 8, 2009, at 8:50 PM, Gus Correa wrote:

Hi Matthew

Please see comments/answers inline below.

Matthew MacManes wrote:
Hi Gus, thanks for your ideas. I have a few questions, and will try to answer yours in the hope of solving this!

A simple way to test OpenMPI on your system is to run the
test programs that come with the OpenMPI source code,
hello_c.c, connectivity_c.c, and ring_c.c:
http://www.open-mpi.org/

Get the tarball from the OpenMPI site, gunzip and untar it,
and look for them in the "examples" directory.
Compile it with /your/path/to/openmpi/bin/mpicc hello_c.c
Run it with /your/path/to/openmpi/bin/mpiexec -np X a.out
using X = 2, 4, 8, 16, 32, 64, ...

This will tell if your OpenMPI is functional,
and if you can run on many Nehalem cores,
even with oversubscription perhaps.
It will also set the stage for further investigation of your
actual programs.
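The compile-and-run loop described above can be sketched in bash. This assumes Open MPI is installed under /usr/local (the OMPI_PREFIX default is an assumption; point it at your own prefix) and that you run it from the examples directory. The names np_series and run_series are hypothetical. GNU timeout turns a hang, like the one seen with connectivity_c, into a non-zero exit so the loop can report where it stopped.

```sh
#!/usr/bin/env bash
# Hypothetical helper: doubles -np until a run fails or hangs.
# OMPI_PREFIX is an assumption: the install prefix of the Open MPI you built.
OMPI_PREFIX=${OMPI_PREFIX:-/usr/local}

# Print 2 4 8 ... up to and including the limit given as $1.
np_series() {
  local n=2
  while [ "$n" -le "$1" ]; do
    echo "$n"
    n=$((n * 2))
  done
}

# Compile one example with the full-path mpicc, then run it with the
# full-path mpiexec at each process count (default limit: 64).
run_series() {
  local prog=${1:-connectivity_c} limit=${2:-64} np
  "$OMPI_PREFIX/bin/mpicc" "$prog.c" -o "$prog" || return 1
  for np in $(np_series "$limit"); do
    echo "== $prog with -np $np"
    # timeout (GNU coreutils) kills mpiexec after 120 s, so a hang counts as a failure
    if ! timeout 120 "$OMPI_PREFIX/bin/mpiexec" -np "$np" "./$prog"; then
      echo "$prog failed or hung at -np $np"
      return 1
    fi
  done
}
```

For example, run_series connectivity_c 256 would try 2, 4, ..., 256 processes. If a run is killed by timeout, it may be worth checking top for leftover processes before starting the next one.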


Should I worry about setting things like --num-cores and --bind-to-cores? This, I think, gets at your questions about processor affinity. Am I right? I could not quite figure out the -mca mpi_paffinity_alone stuff...

I use the simple-minded -mca mpi_paffinity_alone 1.
This is probably the easiest way to assign a process to a core.
There are more complex ways in OpenMPI, but I haven't tried them.
Indeed, -mca mpi_paffinity_alone 1 does improve performance of
our programs here.
There is a chance that without it the 16 virtual cores of
your Nehalem get confused with more than 3 processes
(you reported that -np > 3 breaks).

Did you try adding just -mca mpi_paffinity_alone 1 to
your mpiexec command line?


1. Additional load: nope, nothing else; most of the time not even Firefox.

Good.
Turn off Firefox, etc., to make it even better.
Ideally, use runlevel 3, no X, like a computer cluster node,
but this may not be required.

2. RAM: no problems apparent when monitoring through top. Interestingly, I did wonder about oversubscription, so I tried the option --nooversubscription, but this gave me an error message.

Oversubscription from your program would only happen if
you asked for more processes than available cores, i.e.,
-np > 8 (or "virtual" cores, in case of Nehalem hyperthreading,
-np > 16).
Since you run with -np 4, there is no oversubscription,
unless you have other external load (e.g. Matlab, etc),
but you said you don't.
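To see which side of those two thresholds a given -np falls on, here is a small bash sketch. It assumes Linux: physical cores are counted as distinct (physical id, core id) pairs in /proc/cpuinfo, and logical CPUs (hyperthreads included) come from getconf _NPROCESSORS_ONLN. The function names are hypothetical.

```sh
#!/usr/bin/env bash
# Count physical cores: distinct "physical id"/"core id" pairs in /proc/cpuinfo.
physical_cores() {
  awk -F': ' '/^physical id/ {p = $2} /^core id/ {print p "-" $2}' /proc/cpuinfo | sort -u | wc -l
}

# Classify a process count against physical and logical (hyperthreaded) CPUs.
# Usage: check_np NP [PHYSICAL] [LOGICAL]; missing counts are read from this machine.
check_np() {
  local np=$1
  local phys=${2:-$(physical_cores)}
  local logical=${3:-$(getconf _NPROCESSORS_ONLN)}
  if [ "$np" -le "$phys" ]; then
    echo "ok"               # at most one process per physical core
  elif [ "$np" -le "$logical" ]; then
    echo "hyperthreads"     # processes share physical cores
  else
    echo "oversubscribed"   # more processes than logical CPUs
  fi
}
```

With 8 physical and 16 logical cores, check_np 14 8 16 prints hyperthreads; on the machine itself, check_np 14 alone reads the counts.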

Yet another possibility would be if your program is threaded
(e.g. using OpenMP along with MPI), but considering what you
said about OpenMP I would guess the programs don't use it.
For instance, you launch the program with 4 MPI processes,
and each process decides to start, say, 8 OpenMP threads.
You end up with 32 threads and 8 (real) cores (or 16 hyperthreaded
ones on Nehalem).
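The arithmetic in that example, as a bash sketch. It assumes each MPI process starts OMP_NUM_THREADS threads (the standard OpenMP environment variable); when the variable is unset the OpenMP runtime picks its own default, so the function asks for an explicit count rather than guessing. total_threads is a hypothetical name.

```sh
#!/usr/bin/env bash
# Threads started by a hybrid MPI+OpenMP job: processes x threads per process.
# Usage: total_threads NP [THREADS_PER_PROCESS]; the second argument
# falls back to $OMP_NUM_THREADS and fails if neither is set.
total_threads() {
  local np=$1
  local per_proc=${2:-$OMP_NUM_THREADS}
  if [ -z "$per_proc" ]; then
    echo "set OMP_NUM_THREADS or pass a count" >&2
    return 1
  fi
  echo $((np * per_proc))
}

total_threads 4 8    # prints 32: 32 threads on 8 real cores
```

If a program does turn out to be hybrid, Open MPI's mpiexec can export a value to every process with its -x option, e.g. mpiexec -x OMP_NUM_THREADS=1 -np 4 ..., which keeps the thread count equal to the process count.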


What else does top say?
Any hog processes (memory- or CPU-wise)
besides your program processes?

3. I have not tried other MPI flavors. I've been speaking to the authors of the programs, and they are both using OpenMPI.

I was not trying to convince you to use another MPI.
I use MPICH2 also, but OpenMPI reigns here.
The idea of trying it with MPICH2 was just to check whether OpenMPI
is causing the problem, but I don't think it is.

4. I don't think that this is a problem, as I'm specifying --with-mpi=/usr/bin/...  when I compile the programs. Is there any other way to be sure that this is not a problem?

Hmmm ....
I don't know about your Ubuntu (we have CentOS and Fedora on various
machines).
However, most Linux distributions come with their MPI flavors,
and so do compilers, etc.
Often times they install these goodies in unexpected places,
and this has caused a lot of frustration.
There are tons of postings on this list that eventually
boiled down to mismatched versions of MPI in unexpected places.


The easy way is to use full path names to compile and to run.
Something like this:
/my/openmpi/bin/mpicc in your program's configuration script,

and something like this
/my/openmpi/bin/mpiexec -np  ... bla, bla ...
when you submit the job.

You can check your version with "which mpicc", "which mpiexec",
and (perhaps using full path names) with
"ompi_info", "mpicc --showme", "mpiexec --help".
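A rough bash check along those lines, assuming both wrappers are on your PATH: it resolves symlinks with readlink -f (GNU coreutils) and compares the directories where mpicc and mpiexec actually live. It cannot tell apart two MPI flavors installed side by side in the same directory (for example, distribution packages sharing /usr/bin), so a match is necessary but not sufficient; ompi_info and mpicc --showme remain the follow-up. same_prefix is a hypothetical name.

```sh
#!/usr/bin/env bash
# Compare the real (symlink-resolved) directories of the compiler wrapper
# and the launcher. Defaults: mpicc and mpiexec as found on $PATH.
same_prefix() {
  local cc run
  cc=$(command -v "${1:-mpicc}") || { echo "${1:-mpicc} not found"; return 1; }
  run=$(command -v "${2:-mpiexec}") || { echo "${2:-mpiexec} not found"; return 1; }
  cc=$(readlink -f "$cc")
  run=$(readlink -f "$run")
  if [ "$(dirname "$cc")" = "$(dirname "$run")" ]; then
    echo "same: $(dirname "$cc")"
  else
    echo "MISMATCH: $cc vs $run"
    return 1
  fi
}
```

For instance, same_prefix mpicc mpirun checks the launcher name used elsewhere in this thread.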


5. I had not been, and you could see some shuffling when monitoring the load on specific processors. I have tried to use --bind-to-cores to deal with this. I don't understand how to use the -mca options you asked about.

6. I am using Ubuntu 9.10, gcc 4.4.1 and g++ 4.4.1.

I am afraid I won't be of help, because I don't have Nehalem.
However, I read about Nehalem requiring quite recent kernels
to get all of its features working right.

What is the output of "uname -a"?
This will tell the kernel version, etc.
Other list subscribers may give you a suggestion if you post the
information.

MrBayes is a program for Bayesian phylogenetics: http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
ABySS is a program for assembly of DNA sequence data: http://www.bcgsc.ca/platform/bioinfo/software/abyss

Thanks for the links!
I had found the MrBayes link.
I eventually found what your ABySS was about, but no links.
Amazing that it is about DNA/gene sequencing.
Our abyss here is the deep ocean ... :)
Abysmal difference!

Do the programs mix MPI (message passing) with OpenMP (threads)?
I'm honestly not sure what this means.

Some programs mix the two.
OpenMP only works in a shared memory environment (e.g. a single
computer like yours), whereas MPI can use both shared memory
and work across a network (e.g. in a cluster).
There are other differences too.

It is unlikely that you have this hybrid type of parallel program;
otherwise there would be some reference to OpenMP
in the program's configuration files, documentation, etc.
Also, in general the configuration scripts of these hybrid
programs can turn on MPI only, or OpenMP only, or both,
depending on how you configure.

Even to compile with OpenMP you would need a proper compiler
flag, but that one might be hidden in a Makefile too, making
it a bit hard to find. "grep -n mp Makefile" may give a clue.
Anything on the documentation that mentions threads or OpenMP?
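A slightly broader version of that grep, as a bash sketch: it searches a source tree's makefiles, configure scripts and C/C++ sources for common OpenMP markers (GCC's -fopenmp flag, the older Intel -openmp flag, #pragma omp directives, and the omp.h header). The pattern list is an assumption and not exhaustive; find_openmp is a hypothetical name.

```sh
#!/usr/bin/env bash
# List file:line matches that hint at OpenMP use under a source tree (default: .).
find_openmp() {
  grep -rnE \
    --include='Makefile*' --include='configure*' \
    --include='*.c' --include='*.h' --include='*.cpp' --include='*.cc' \
    -e '-fopenmp|-openmp|#pragma omp|omp\.h' "${1:-.}"
}
```

Running find_openmp /path/to/unpacked/source and getting no output means none of these markers were found, which fits the guess that the programs are MPI-only.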

FYI, here is OpenMP:
http://openmp.org/wp/

Thanks for all your help!
Matt

Well, so far it didn't really help. :(

But let's hope to find a clue,
maybe with a little help of
our list subscriber friends.

Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Hi Matthew

More guesses/questions than anything else:

1) Is there any additional load on this machine?
We had problems like that (on different machines) when
users start listening to streaming video, doing Matlab calculations,
etc, while the MPI programs are running.
This tends to oversubscribe the cores, and may lead to crashes.

2) RAM:
Can you monitor the RAM usage through "top"?
(I presume you are on Linux.)
It may show unexpected memory leaks, if they exist.

On "top", type "1" (one) to see all cores, type "f" then "j"
to see the core number associated with each process.

3) Do the programs work right with other MPI flavors (e.g. MPICH2)?
If they fail there too, the problem is probably not OpenMPI's fault.

4) Any possibility that the MPI versions/flavors of mpicc and
mpirun that you are using to compile and launch the program are not the
same?

5) Are you setting processor affinity on mpiexec?

mpiexec -mca mpi_paffinity_alone 1 -np ... bla, bla ...

Context switching across the cores may also cause trouble, I suppose.

6) Which Linux are you using (uname -a)?

On other mailing lists I read reports that only quite recent kernels
support all the Intel Nehalem processor features well.
I don't have Nehalem, I can't help here,
but the information may be useful
for other list subscribers to help you.
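For items 2 and 5, a non-interactive alternative to top's "1" and "f", "j" views, as a small bash sketch: the procps ps on Linux can print, per process, the CPU it last ran on (psr), its CPU usage (pcpu) and its resident memory in KiB (rss). Running it once without and once with -mca mpi_paffinity_alone 1 may show whether two ranks are piling up on one core. show_cores is a hypothetical name, and the option set assumes procps ps.

```sh
#!/usr/bin/env bash
# Show PID, last-used CPU, %CPU, resident memory (KiB) and name for the
# selected processes. Pass ps selection options, e.g. -C connectivity_c or -p PID.
show_cores() {
  ps -o pid,psr,pcpu,rss,comm "$@"
}

# Example, while the MPI job runs in another terminal:
#   show_cores -C connectivity_c
```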

***

As for the programs, some programs require specific setup
(and even specific compilation) when the number of MPI processes
varies.
It may help if you send us links to the program sites.

Bayesian statistics is not totally out of our business,
but phylogenetic trees are not really my league,
so please forgive any bad guesses.
Would the programs need specific compilation or a different
set of input parameters to run correctly on a different
number of processors?
Do the programs mix MPI (message passing) with OpenMP (threads)?

I found this MrBayes, which seems to do the above:

http://mrbayes.csit.fsu.edu/
http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page

As for the ABySS, what is it, where can it be found?
Doesn't look like a deep ocean circulation model, as the name suggests.

My $0.02
Gus Correa
------------------------------------------------------------------------
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


_________________________________
Matthew MacManes
PhD Candidate
University of California- Berkeley
Museum of Vertebrate Zoology
Phone: 510-495-5833
Lab Website: http://ib.berkeley.edu/labs/lacey
Personal Website: http://macmanes.com/