On Tue, 2009-12-08 at 08:30 -0800, Matthew MacManes wrote:
> There are 8 physical cores, or 16 with hyperthreading enabled.
That should be meaty enough.
> 1st of all, let me say that when I specify that -np is less than 4
> processors (1, 2, or 3), both programs seem to work as expected. Also,
> the non-mpi version of each of them works fine.
Presumably the non-mpi version is serial, however? That doesn't mean
the program is bug-free or that the parallel version isn't broken.
There are any number of apps that don't work above N processes; in fact
probably all programs break for some value of N, it's just normally a
little higher than 3.
> Thus, I am pretty sure that this is a problem with MPI rather than
> with the program code or something else.
> What happens is simply that the program hangs..
I presume you mean here the output stops? The program continues to use
CPU cycles but no longer appears to make any progress?
I'm of the opinion that this is most likely an error in your program. I
would start by using either valgrind or padb.
You can run the app under valgrind using the following mpirun options.
This will give you four files named v.log.0 to v.log.3, which you can
check for errors in the normal way. The "--mca btl tcp,self" option
disables the shared memory transport, which can create false positives
under valgrind.
mpirun -n 4 --mca btl tcp,self valgrind --log-file=v.log.%
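As a sketch of what a complete invocation might look like: this assumes
valgrind's %q{VAR} substitution in --log-file and the
OMPI_COMM_WORLD_RANK variable that Open MPI sets for each rank. The
binary name ./my_app and both function names are placeholders, not part
of the command above.

```shell
#!/bin/sh
# Sketch only: run_under_valgrind and logs_with_errors are hypothetical
# helper names, and ./my_app stands in for the real binary.

# Launch N ranks under valgrind, one log per rank (v.log.0, v.log.1, ...).
# %q{VAR} is valgrind's syntax for substituting an environment variable;
# OMPI_COMM_WORLD_RANK is set by Open MPI for each launched process.
run_under_valgrind() {
    nprocs="$1"
    shift
    mpirun -n "$nprocs" --mca btl tcp,self \
        valgrind --log-file='v.log.%q{OMPI_COMM_WORLD_RANK}' "$@"
}

# Print the name of every log whose ERROR SUMMARY line reports a
# non-zero error count.
logs_with_errors() {
    for f in "$@"; do
        if grep -q 'ERROR SUMMARY: [1-9]' "$f"; then
            echo "$f"
        fi
    done
}
```

Typical use would be "run_under_valgrind 4 ./my_app" followed by
"logs_with_errors v.log.*" once the run has finished or been stopped.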
Alternatively you can run the application, wait for it to hang, and then
in another window run my tool, padb, which will show you the MPI message
queues and stack traces; those should tell you where it's hung.
Instructions and sample output are on this page.
> There are no error messages, and there is no clue from anything else
> (system working fine otherwise- no RAM issues, etc). It does not hang
> at the same place every time, sometimes in the very beginning, sometimes
> near the middle..
> Could this be an issue with hyperthreading? A conflict with something?
Unlikely. If there were a problem in OMPI running more than 3 processes
it would have been found by now. I regularly run 8-process applications
on my dual-core netbook alongside all my desktop processes without
issue; they run a little slowly, but fine.
All this talk about binding and affinity won't help either. Process
binding is about squeezing the last 15% of performance out of a system
and making performance reproducible; it has no bearing on correctness or
scalability. If you're not running on a dedicated machine (which, with
firefox running, I guess you aren't) then there would be a good case for
leaving it off anyway.
Ashley Pittman, Bath, UK.
Padb - A parallel job inspection tool for cluster computing