On Jan 13, 2006, at 10:41 PM, Glenn Morris wrote:
> The combination OpenMP + OpenMPI works fine if I restrict the
> application to only 1 OpenMP thread per MPI process (in other words
> the code at least compiles and runs fine with both options on, in this
> limited sense). If I try to use my desired value of 4 OpenMP threads,
> it crashes. It works fine, however, if I use MPICH for the MPI.
>
> The hostfile specifies "slots=4 max-slots=4" for each host (trying to
> lie and say "slots=1" did not help), and I use "-np 4 --bynode" to get
> only one MPI process per host. I'm using ssh over Gbit ethernet
> between hosts.
>
> There is no useful error message that I can see. Watching top, I can
> see that processes are spawned on the four hosts, split into 4 OpenMP
> threads, and then crash immediately. The only error message is:
> mpirun noticed that job rank 0 with PID 30243 on node "coma006"
> exited on signal 11.
> Broken pipe

It looks like your application is dying from a segmentation fault
(signal 11 is SIGSEGV). The question is whether Open MPI caused the
segfault or whether there is something in your application that Open
MPI didn't like. It would be useful to get a stack trace from the
process that is crashing. Since you're only running 4 processes and
using ssh to start them, the easiest way is to start each process in
gdb inside an xterm. You need ssh X forwarding enabled for this trick
to work, but then running something like:

mpirun -np 4 --bynode -d xterm -e gdb <myapp>

should pop up 4 xterm windows, one for each process. Type "run" in
each gdb session in the xterms and it should be off and running.
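
If X forwarding is not available, a non-interactive variant of the same
idea is to let gdb run the program in batch mode and print a backtrace
when it crashes. The sketch below is a hypothetical helper (gdb_bt is
not part of Open MPI or gdb); it assumes gdb is installed on every node
and is recent enough to support the -batch, -ex and --args options.

```shell
# gdb_bt: hypothetical helper, not part of Open MPI or gdb.
# Runs a program under gdb in batch mode. If the program crashes
# (e.g. with SIGSEGV), gdb stops it and the "bt" command prints the
# stack of the faulting thread; gdb then exits because of -batch.
gdb_bt() {
    if [ "$#" -eq 0 ]; then
        echo "usage: gdb_bt <program> [args...]" >&2
        return 2
    fi
    # --args: everything after the program name is passed to the program
    gdb -batch -ex run -ex bt --args "$@"
}
```

To use it under mpirun, save the function in an executable script (say,
gdb_bt.sh) that ends with the line gdb_bt "$@", make sure the script is
available on every node, and launch something like
"mpirun -np 4 --bynode ./gdb_bt.sh <myapp>"; each rank's backtrace should
then show up in mpirun's output. Replacing -ex bt with
-ex "thread apply all bt" would show the stacks of all OpenMP threads,
not just the one that faulted.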

If this would be a major pain, the other option is to try a nightly
build of Open MPI from the trunk, as it will try to print a stack
trace when errors like the one above occur. But I would start with
the gdb method. Of course, if you have TotalView or another parallel
debugger, that would be even easier.

Open MPI developer