On May 15, 2006, at 9:14 AM, Gurhan Ozen wrote:
> Jeff, George, Brian thanks for your inputs in this.
> I did "kind of" get openib working. The two boxes were running
> different kernel revisions; getting them onto the very same kernel
> revision and recompiling open-mpi against that kernel got the
> hello_world program running over the openib stack.
> However, most MPI_* functions, such as MPI_Isend() and MPI_Barrier(),
> are not working. For each one of them, I get the same error:
> [hostname:11992] *** An error occurred in MPI_Isend
> [hostname:11992] *** on communicator MPI_COMM_WORLD
> [hostname:11992] *** MPI_ERR_INTERN: internal error
> [hostname:11992] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [hostname:11998] *** An error occurred in MPI_Barrier
> [hostname:11998] *** on communicator MPI_COMM_WORLD
> [hostname:11998] *** MPI_ERR_INTERN: internal error
> [hostname:11998] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [hostname:01916] *** An error occurred in MPI_Send
> [hostname:01916] *** on communicator MPI_COMM_WORLD
> [hostname:01916] *** MPI_ERR_INTERN: internal error
> [hostname:01916] *** MPI_ERRORS_ARE_FATAL (goodbye)
> This is not just happening over the network, but also locally. I am
> inclined to think that I am missing some compilation flags or
> something similar. I have tried this with the openmpi-1.1a4 version as
> well, but kept getting the same errors.
> Questions of the day:
> 1- Does anyone know why I might be getting these errors?
This generally means that no BTL (byte transfer layer) component was
available to move data between nodes, so I think you still have some
issues with your network setup. Unfortunately, I'm not able to help
there. George asked for some debugging information that would be most
helpful to us, so you might want to try collecting that data with your
current setup.
> 2- I couldn't find any "free" debuggers for debugging open-mpi
> programs. Does anyone know of any? Are there any tricks for using gdb,
> at least to debug locally running MPI programs?
The simple, dirty trick is to set up X11 forwarding with ssh and run:
mpirun -np X -d xterm -e gdb <myapp>
You'll get one xterm per process, each running gdb on your
application, and can debug that way. It's simple, it's cheap, but it
definitely doesn't scale.
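
If X11 forwarding is not an option, another common trick is to have each
process print its PID and spin until you attach gdb to it by hand. The
sketch below is only an illustration of that idea: the helper names and
the WAIT_FOR_GDB environment variable are made up for this example and
are not part of Open MPI.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 when the environment value asks us to
 * pause (set, non-empty, and not starting with '0'), otherwise 0. */
int debugger_wait_requested(const char *value)
{
    return value != NULL && value[0] != '\0' && value[0] != '0';
}

/* Hypothetical helper: if WAIT_FOR_GDB (an assumed name) is set, print
 * this process's PID and host, then loop until a debugger sets
 * `attached` to a nonzero value. Otherwise return immediately. */
void wait_for_debugger(void)
{
    volatile int attached = 0;   /* volatile so the loop re-reads it */
    char host[256];

    if (!debugger_wait_requested(getenv("WAIT_FOR_GDB")))
        return;

    if (gethostname(host, sizeof(host)) != 0)
        snprintf(host, sizeof(host), "unknown");
    host[sizeof(host) - 1] = '\0';   /* gethostname may not terminate */

    fprintf(stderr, "PID %d on %s waiting for gdb attach\n",
            (int)getpid(), host);
    while (!attached)
        sleep(1);
}
```

Call wait_for_debugger() right after MPI_Init() and start the job with
WAIT_FOR_GDB=1 in the environment (for remote nodes, mpirun's -x option
can export it). Then run "gdb -p <pid>" on the node in question, move up
the stack to the wait_for_debugger frame, type "set var attached = 1",
and continue.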
Open MPI developer