
Open MPI User's Mailing List Archives


From: Gurhan Ozen (gurhan.ozen_at_[hidden])
Date: 2006-05-17 11:36:48


Either Gmail or the OMPI users list is borked; I am resending this
since it hasn't shown up on the list after 2 days.

Thanks,
gurhan

---------- Forwarded message ----------
From: Gurhan Ozen <gurhan.ozen_at_[hidden]>
Date: May 15, 2006 9:14 AM
Subject: Re: [OMPI users] Open MPI and OpenIB
To: Open MPI Users <users_at_[hidden]>

Jeff, George, Brian thanks for your inputs in this.

I did "kind of" get openib working. Different revisions of kernel was
running on both boxes, getting them running on the very same revisions
of kernel and recompiling open-mpi with that rev. of kernel got me
hello_world program running over openib stack.

However, most MPI_* functions, such as MPI_Isend() and MPI_Barrier(), are
not working. For each one of them, I get the same error:

[hostname:11992] *** An error occurred in MPI_Isend
[hostname:11992] *** on communicator MPI_COMM_WORLD
[hostname:11992] *** MPI_ERR_INTERN: internal error
[hostname:11992] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:11998] *** An error occurred in MPI_Barrier
[hostname:11998] *** on communicator MPI_COMM_WORLD
[hostname:11998] *** MPI_ERR_INTERN: internal error
[hostname:11998] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:01916] *** An error occurred in MPI_Send
[hostname:01916] *** on communicator MPI_COMM_WORLD
[hostname:01916] *** MPI_ERR_INTERN: internal error
[hostname:01916] *** MPI_ERRORS_ARE_FATAL (goodbye)

This is not just happening over the network, but also locally. I am
inclined to think that I am missing some compilation flags or something.
I have tried this with the openmpi-1.1a4 version as well, but kept
getting the same errors.
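
For reference, a minimal two-rank test along these lines (just an
illustrative sketch, not the exact program I am running) is:

/* Rank 0 posts an MPI_Isend of one int to rank 1, rank 1 receives it,
 * and both ranks then call MPI_Barrier.  Build with mpicc and run with
 * e.g. "mpirun -np 2 ./isend_test". */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, value = 42;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank %d received %d\n", rank, value);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}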

Questions of the day:
1- Does anyone know why I might be getting these errors?
2- I couldn't find any "free" debuggers for debugging Open MPI programs;
does anyone know of any? Are there any tricks for using gdb, at least to
debug locally running MPI programs?
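
(One trick I have come across, sketched below, is to have every rank
print its PID and spin in a loop so that gdb can be attached to the
local process with "gdb -p <pid>"; I have not been able to confirm
whether that is the recommended approach, hence the question.)

/* Attach-with-gdb sketch (my own illustration, not code from this
 * thread): each rank prints its PID and spins; attach with
 * "gdb -p <pid>", then do "set var hold = 0" and "continue". */
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    volatile int hold = 1;
    char host[256];

    MPI_Init(&argc, &argv);
    gethostname(host, sizeof(host));
    printf("PID %d on %s waiting for a debugger\n", (int)getpid(), host);
    fflush(stdout);
    while (hold)
        sleep(1);

    /* ... the rest of the MPI program goes here ... */

    MPI_Finalize();
    return 0;
}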

Thanks again,
Gurhan

On 5/12/06, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> > -----Original Message-----
> > From: users-bounces_at_[hidden]
> > [mailto:users-bounces_at_[hidden]] On Behalf Of Gurhan Ozen
> > Sent: Thursday, May 11, 2006 4:11 PM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] Open MPI and OpenIB
> >
> > At any rate, though, with --mca btl ib,self it looks like the traffic
> > goes over the ethernet device. I couldn't find any documentation on the
> > "self" argument of the btl MCA parameter; does it mean to explore
> > alternatives if the desired btl (in this case ib) doesn't work?
>
> Note that Open MPI still does use TCP for "setup" information; a bunch
> of data is passed around via mpirun and MPI_INIT for all the processes
> to find each other, etc. Similar control messages get passed around
> during MPI_FINALIZE as well.
>
> This is likely the TCP traffic that you are seeing. However, rest
> assured that the btl MCA parameter will unequivocally set the network
> that MPI traffic will use.
>
> I've updated the on-line FAQ with regards to the "self" BTL module.
>
> And finally, a man page is available for mpirun in the [not yet
> released] Open MPI 1.1 (see
> http://svn.open-mpi.org/svn/ompi/trunk/orte/tools/orterun/orterun.1).
> It should be pretty much the same for 1.0. One notable difference is that
> I just recently added a -nolocal option (not yet on the trunk, but it will
> likely be there in the not-distant future) that does not exist in 1.0.
>
> --
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>