Jeff, George, Brian, thanks for your input on this.
I did "kind of" get openib working. The two boxes were running different
kernel revisions; once I got them onto the very same revision and
recompiled Open MPI against that kernel, the hello_world program ran
over the openib stack.
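For reference, the hello_world I'm talking about is essentially the standard minimal example (this is an illustrative sketch, not my exact source file; it needs an MPI installation to build and run):

```c
/* hello_openib.c -- minimal MPI hello world (illustrative sketch) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with something like `mpirun --mca btl openib,self -np 2 ./hello_openib`, this is what now works after the kernel fix.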
However, most MPI_* functions, such as MPI_Isend() and MPI_Barrier(),
are not working. For each of them, I get the same error:
[hostname:11992] *** An error occurred in MPI_Isend
[hostname:11992] *** on communicator MPI_COMM_WORLD
[hostname:11992] *** MPI_ERR_INTERN: internal error
[hostname:11992] *** MPI_ERRORS_ARE_FATAL (goodbye)
[hostname:11998] *** An error occurred in MPI_Barrier
[hostname:11998] *** on communicator MPI_COMM_WORLD
[hostname:11998] *** MPI_ERR_INTERN: internal error
[hostname:11998] *** MPI_ERRORS_ARE_FATAL (goodbye)
[hostname:01916] *** An error occurred in MPI_Send
[hostname:01916] *** on communicator MPI_COMM_WORLD
[hostname:01916] *** MPI_ERR_INTERN: internal error
[hostname:01916] *** MPI_ERRORS_ARE_FATAL (goodbye)
This happens not just over the network, but also locally. I am inclined
to think I am missing some compilation flags or similar. I tried this
with the openmpi-1.1a4 version as well, but kept getting the same
errors.
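In case it helps, the failure shows up with something as small as this (an illustrative reproducer sketch, not my exact code; a nonblocking send/recv pair plus a barrier, built against an MPI installation):

```c
/* isend_barrier.c -- minimal sketch of the calls that hit MPI_ERR_INTERN */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, msg = 42;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* this is where "An error occurred in MPI_Isend" appears */
        MPI_Isend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    /* MPI_Barrier aborts the same way */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```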
Questions of the day:
1- Does anyone know why I might be getting these errors?
2- I couldn't find any "free" debuggers for debugging Open MPI
programs; does anyone know of any? Are there any tricks for using gdb,
at least to debug locally running MPI programs?
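(One trick I've seen described, sketched here with illustrative names and assuming a working MPI installation: have each rank print its PID and spin-wait so you can attach gdb to the rank you care about, then release it from inside the debugger.)

```c
/* gdb_attach.c -- spin-wait so gdb can attach to a chosen rank (sketch) */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;
    volatile int hold = 1;   /* from gdb: (gdb) set var hold = 0 */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("rank %d has pid %d\n", rank, (int)getpid());
    fflush(stdout);

    if (rank == 0)           /* hold only the rank you want to debug */
        while (hold)
            sleep(1);

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```

Then, in another terminal, `gdb -p <pid-of-rank-0>`, set `hold` to 0, and continue; the other ranks wait in the barrier meanwhile.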
On 5/12/06, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> > -----Original Message-----
> > From: users-bounces_at_[hidden]
> > [mailto:users-bounces_at_[hidden]] On Behalf Of Gurhan Ozen
> > Sent: Thursday, May 11, 2006 4:11 PM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] Open MPI and OpenIB
> > At any rate though, --mca btl ib,self looks like the traffic goes over
> > ethernet device .. I couldn't find any documentation on the "self"
> > argument of mca, does it mean to explore alternatives if the desired
> > btl (in this case ib) doesn't work?
> Note that Open MPI still does use TCP for "setup" information; a bunch
> of data is passed around via mpirun and MPI_INIT for all the processes
> to find each other, etc. Similar control messages get passed around
> during MPI_FINALIZE as well.
> This is likely the TCP traffic that you are seeing. However, rest
> assured that the btl MCA parameter will unequivocally set the network
> that MPI traffic will use.
> I've updated the on-line FAQ with regards to the "self" BTL module.
> And finally, a man page is available for mpirun in the [not yet
> released] Open MPI 1.1 (see
> It should be pretty much the same for 1.0. One notable difference is I
> just recently added a -nolocal option (not yet on the trunk, but likely
> will be in the not-distant future) that does not exist in 1.0.
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems