Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun only works when -np <4
From: Gus Correa (gus_at_[hidden])
Date: 2009-12-08 14:54:49


Hi Matthew

More guesses/questions than anything else:

1) Is there any additional load on this machine?
We have had problems like that (on different machines) when
users started streaming video, running Matlab calculations,
etc., while the MPI programs were running.
This oversubscribes the cores and may lead to crashes.
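
A rough way to check for competing load before launching the job is to
compare the 1-minute load average with the number of cores. This is only
a sketch, assuming Linux (/proc/loadavg) and a POSIX shell; the helper
name is_busy is made up for illustration.

```shell
# is_busy LOAD CORES: succeeds when the load average is at or above the core count
# (is_busy is an illustrative helper name, not part of any MPI tool)
is_busy() {
    awk -v l="$1" -v c="$2" 'BEGIN { exit (l + 0 >= c + 0) ? 0 : 1 }'
}

# Number of online logical cores and the 1-minute load average
cores=$(getconf _NPROCESSORS_ONLN)
load=$(cut -d ' ' -f 1 /proc/loadavg)
echo "cores=$cores load1=$load"

if is_busy "$load" "$cores"; then
    echo "the machine is already busy; MPI ranks would compete for cores"
else
    echo "load is below the core count"
fi
```

If the machine is already busy before mpirun starts, other work is a
plausible suspect.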

2) RAM:
Can you monitor the RAM usage through "top"?
(I presume you are on Linux.)
It may show unexpected memory leaks, if they exist.

In "top", type "1" (one) to see all cores, then type "f" and "j"
to show the core number associated with each process.
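
If watching "top" by eye is awkward, free memory can also be logged over
time. A minimal sketch, assuming Linux with /proc/meminfo; the name
mem_snapshot is just an illustration:

```shell
# mem_snapshot: print a timestamp plus free memory and free swap in kB
# (illustrative helper; reads the standard Linux /proc/meminfo fields)
mem_snapshot() {
    awk -v now="$(date +%H:%M:%S)" '
        /^MemFree:/  { free = $2 }
        /^SwapFree:/ { swap = $2 }
        END { printf "%s MemFree=%skB SwapFree=%skB\n", now, free, swap }
    ' /proc/meminfo
}

# Take one sample now. To sample every 5 seconds while the MPI job runs
# in another terminal:
#   while true; do mem_snapshot; sleep 5; done
mem_snapshot
```

A steadily shrinking MemFree together with a shrinking SwapFree during
the run would point toward a leak or simply too little RAM.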

3) Do the programs work right with other MPI flavors (e.g. MPICH2)?
If not, then it is not OpenMPI's fault.

4) Any possibility that the MPI versions/flavors of mpicc and
mpirun that you are using to compile and launch the program are not the
same?
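
One way to check this is to see where each command is picked up on the
PATH. This is a sketch for a POSIX shell; bin_dir is an illustrative
helper name, and the commented commands at the end assume the Open MPI
versions of mpirun and mpicc.

```shell
# bin_dir NAME: print the directory holding the first NAME found on PATH
# (illustrative helper, not part of Open MPI)
bin_dir() {
    found=$(command -v "$1") || return 1
    dirname "$found"
}

for tool in mpicc mpirun; do
    if dir=$(bin_dir "$tool"); then
        echo "$tool found in $dir"
    else
        echo "$tool not found on PATH"
    fi
done

# With Open MPI, these report the version and the underlying compile line:
#   mpirun --version
#   mpicc --showme
```

If the two directories differ, the compile-time and run-time MPI are
probably not the same installation.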

5) Are you setting processor affinity on mpiexec?

mpiexec -mca mpi_paffinity_alone 1 -np ... bla, bla ...

Context switching across the cores may also cause trouble, I suppose.

6) Which Linux are you using (uname -a)?

On other mailing lists I have read reports that only quite recent kernels
support all the Intel Nehalem processor features well.
I don't have a Nehalem machine, so I can't help here,
but the information may help
other list subscribers to help you.
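
Something like the following would collect the kernel release and the
logical versus physical core counts in one go. It is only a sketch,
assuming Linux on x86, where /proc/cpuinfo has "physical id" and
"core id" fields; count_physical_cores is a made-up helper name.

```shell
# count_physical_cores [FILE]: count distinct (socket, core) pairs
# (illustrative helper; FILE defaults to /proc/cpuinfo)
count_physical_cores() {
    awk -F ':' '
        /^physical id/ { sock = $2 }
        /^core id/     { seen[sock "," $2] = 1 }
        END { n = 0; for (k in seen) n++; print n }
    ' "${1:-/proc/cpuinfo}"
}

echo "kernel: $(uname -r)"
echo "logical CPUs: $(grep -c '^processor' /proc/cpuinfo)"
echo "physical cores: $(count_physical_cores)"
```

With hyperthreading enabled, the logical count should be twice the
physical count.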

***

As for the programs, some require a specific setup
(and even a specific compilation) when the number of MPI processes
varies.
It may help if you send us links to the programs' websites.

Bayesian statistics is not totally foreign to us,
but phylogenetic trees are out of my league,
so please forgive any bad guesses:
would the programs need specific compilation or a different
set of input parameters to run correctly on a different
number of processors?
Do the programs mix MPI (message passing) with OpenMP (threads)?

I found this MrBayes, which seems to do the above:

http://mrbayes.csit.fsu.edu/
http://mrbayes.csit.fsu.edu/wiki/index.php/Main_Page

As for ABySS, what is it, and where can it be found?
It doesn't look like a deep ocean circulation model, as the name suggests.

My $0.02
Gus Correa
---------------------------------------------------------------------
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------

Matthew MacManes wrote:
> Hi All,
>
> I am having a problem running a couple of programs, ABySS and MrBayes in parallel. I am using Linux Ubuntu 9.10 with a dual socket (Xeon 5520) machine. There are 8 physical cores, or 16 with hyperthreading enabled.
>
> I use openMPI version 1.3.4, plus a few other packages downloaded via "apt-get install <program name>"
>
> First of all, let me say that when I specify -np less than 4 (1, 2, or 3), both programs seem to work as expected. Also, the non-MPI version of each of them works fine. Thus, I am pretty sure that this is a problem with MPI rather than with the program code or something else.
>
> What happens is simply that the program hangs. There are no error messages, and there is no clue from anything else (the system is otherwise working fine, with no RAM issues, etc.). It does not hang at the same place every time: sometimes at the very beginning, sometimes near the middle.
>
> Could this be an issue with hyperthreading? A conflict with something? I can give you all more info if that would be helpful in troubleshooting. I'm not sure whether there are any diagnostics for mpirun, but if there are, it would be helpful to know about them.
>
> Thanks. Matt
> _________________________________
> Matthew MacManes
> PhD Candidate
> University of California- Berkeley
> Museum of Vertebrate Zoology
> Phone: 510-495-5833
> Lab Website: http://ib.berkeley.edu/labs/lacey
> Personal Website: http://macmanes.com/
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users