Hi Gus, 

Thanks for your ideas. I have a few questions, and I will try to answer yours in the hope of solving this!

Should I worry about setting things like --num-cores and --bind-to-cores? This, I think, gets at your questions about processor affinity. Am I right? I could not exactly figure out the -mca mpi_paffinity_alone option.
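Just so I am sure of the syntax: is this the sort of command line you meant? (Here ./myprog just stands in for my program.)

mpirun -np 8 -mca mpi_paffinity_alone 1 ./myprog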

1. Additional load: nope, nothing else is running, most of the time not even Firefox.
2. RAM: no problems apparent when monitoring through top. Interestingly, I did wonder about oversubscription, so I tried the option --nooversubscription, but this gave me an error message.
3. I have not tried other MPI flavors. I've been speaking to the authors of the programs, and they are both using Open MPI.
4. I don't think this is a problem, as I'm specifying --with-mpi=/usr/bin/... when I compile the programs. Is there any other way to be sure that this is not a problem? (My best guess at a check is sketched after this list.)
5. I had not been, and you could see some shuffling when monitoring the load on specific processors. I have tried to use --bind-to-cores to deal with this. I don't understand how to use the -mca options you asked about. 
6. I am using Ubuntu 9.10, with gcc 4.4.1 and g++ 4.4.1.
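
Regarding item 4: I'm guessing here, but would checking the version the launcher reports against the one ompi_info reports settle it?

mpirun --version
ompi_info | grep "Open MPI:"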


MrBayes is a program for Bayesian phylogenetics: http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page
ABySS is a program for assembly of DNA sequence data: http://www.bcgsc.ca/platform/bioinfo/software/abyss

You asked: "Do the programs mix MPI (message passing) with OpenMP (threads)?"

I'm honestly not sure what this means.

Thanks for all your help!

Matt

Hi Matthew,

More guesses/questions than anything else:

1) Is there any additional load on this machine?
We had problems like that (on different machines) when
users started streaming video, running Matlab calculations,
and so on while the MPI programs were running.
This tends to oversubscribe the cores, and may lead to crashes.
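
If memory serves, mpirun/mpiexec also accepts a --nooversubscribe flag that makes the launch fail rather than silently oversubscribe (./your_program below is a placeholder):

mpiexec --nooversubscribe -np 8 ./your_program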

2) RAM:
Can you monitor the RAM usage through "top"?
(I presume you are on Linux.)
It may show unexpected memory leaks, if they exist.

On "top", type "1" (one) see all cores, type "f" then "j"
to see the core number associated to each process.
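
A quick non-interactive alternative (assuming the usual procps "ps"; your_program is a placeholder):

ps -eo pid,psr,pcpu,comm | grep your_program

The "psr" column shows the core each process last ran on.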

3) Do the programs work right with other MPI flavors (e.g. MPICH2)?
If they fail with other flavors too, then it is not OpenMPI's fault.

4) Any possibility that the MPI versions/flavors of mpicc and
mpirun that you are using to compile and launch the program are not the
same?
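
A sketch of how I would check, assuming Open MPI's wrapper compilers (./your_program stands for your binary):

which mpicc
which mpirun
mpicc --showme                     # prints the underlying compile line and library paths
ldd ./your_program | grep -i mpi   # shows which MPI library the binary actually links to

If those paths do not agree with the mpirun you launch with, that is a red flag.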

5) Are you setting processor affinity on mpiexec?

mpiexec -mca mpi_paffinity_alone 1 -np ... bla, bla ...

Context switching across the cores may also cause trouble, I suppose.
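
Also, if your Open MPI is recent enough (1.3.4 or later, if I remember right), mpirun can do the binding itself and report where each rank lands; something like:

mpirun -np 8 --bind-to-core --report-bindings ./your_program

Check your mpirun man page, though, since these flags are version dependent.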

6) Which Linux are you using (uname -a)?

On other mailing lists I read reports that only quite recent kernels
support all the Intel Nehalem processor features well.
I don't have a Nehalem, so I can't help much here,
but the information may be useful
to other list subscribers trying to help you.

***

As for the programs: some require specific setup
(and even specific compilation) when the number of MPI processes
varies.
It may help if you point us to the program sites.

Bayesian statistics is not totally outside our line of work,
but phylogenetic trees are not really my league,
so please forgive any bad guesses,
but would it need a specific compilation, or a different
set of input parameters, to run correctly on a different
number of processors?
Do the programs mix MPI (message passing) with OpenMP (threads)?
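
One crude way to tell, while the program is running (your_program is a placeholder):

ps -eLf | grep your_program

If you see many more lines than MPI processes (one line per thread; see the LWP column), then threads are in play. Pressing "H" inside "top" shows the same thing.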

I found this MrBayes, which seems to do the above:

http://mrbayes.csit.fsu.edu/
http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page

As for ABySS, what is it, and where can it be found?
It doesn't look like a deep ocean circulation model, as the name suggests.

My $0.02
Gus Correa