
From: Gurhan Ozen (gurhan.ozen_at_[hidden])
Date: 2006-05-17 11:36:48


  Either Gmail or the OMPI users list is borked; I am resending this
since it hasn't shown up on the list yet after 2 days.

Thanks,
gurhan

---------- Forwarded message ----------
From: Gurhan Ozen <gurhan.ozen_at_[hidden]>
Date: May 15, 2006 9:14 AM
Subject: Re: [OMPI users] Open MPI and OpenIB
To: Open MPI Users <users_at_[hidden]>

Jeff, George, Brian thanks for your inputs in this.

I did "kind of" get openib working. The two boxes were running
different kernel revisions; getting them both onto the very same
kernel revision and recompiling Open MPI against it got my
hello_world program running over the openib stack.
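
For reference, a minimal hello_world of the sort I mean is sketched
below; there is nothing openib-specific in the source, since the
transport is chosen at run time (compile with mpicc, run with mpirun):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of ranks */
    MPI_Get_processor_name(name, &len);    /* host this rank runs on */

    printf("hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}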

However, most MPI_* functions, such as MPI_Isend() and MPI_Barrier(),
are not working. For each of them, I get the same error:

[hostname:11992] *** An error occurred in MPI_Isend
[hostname:11992] *** on communicator MPI_COMM_WORLD
[hostname:11992] *** MPI_ERR_INTERN: internal error
[hostname:11992] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:11998] *** An error occurred in MPI_Barrier
[hostname:11998] *** on communicator MPI_COMM_WORLD
[hostname:11998] *** MPI_ERR_INTERN: internal error
[hostname:11998] *** MPI_ERRORS_ARE_FATAL (goodbye)

[hostname:01916] *** An error occurred in MPI_Send
[hostname:01916] *** on communicator MPI_COMM_WORLD
[hostname:01916] *** MPI_ERR_INTERN: internal error
[hostname:01916] *** MPI_ERRORS_ARE_FATAL (goodbye)

This is not just happening over the network, but also locally. I am
inclined to think that I am missing some compilation flags or
something. I have tried this with the openmpi-1.1a4 version as well,
but kept getting the same errors.
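
For concreteness, here is a simplified sketch of the sort of test that
hits this (standard MPI calls only, nothing exotic):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, buf = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {
        if (rank == 0) {
            buf = 42;
            /* non-blocking send to rank 1: one of the calls that aborts */
            MPI_Isend(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            MPI_Wait(&req, &status);
        } else if (rank == 1) {
            MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("rank 1 received %d\n", buf);
        }
    }

    /* the other call that aborts */
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}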

Questions of the day:
1- Does anyone know why I might be getting these errors?
2- I couldn't find any "free" debuggers for debugging Open MPI
programs, does anyone know of any? Are there any tricks to use gdb,
at least to debug locally running MPI programs?
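
One trick I have heard of (sketched below, untested here) is to have
each rank print its PID and spin until a debugger attaches; is that
the recommended way?

#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    volatile int wait_for_debugger = 1;  /* volatile so gdb can clear it */
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    printf("rank %d is PID %d\n", rank, (int) getpid());
    fflush(stdout);

    /* In another terminal:
     *   gdb -p <PID>
     *   (gdb) set var wait_for_debugger = 0
     *   (gdb) continue
     */
    while (wait_for_debugger)
        sleep(1);

    /* ... the code you actually want to debug ... */

    MPI_Finalize();
    return 0;
}

I have also seen "mpirun -np 2 xterm -e gdb ./program" suggested for
giving each local rank its own debugger window, but I haven't tried
that either.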

Thanks again,
Gurhan

On 5/12/06, Jeff Squyres (jsquyres) <jsquyres_at_[hidden]> wrote:
> > -----Original Message-----
> > From: users-bounces_at_[hidden]
> > [mailto:users-bounces_at_[hidden]] On Behalf Of Gurhan Ozen
> > Sent: Thursday, May 11, 2006 4:11 PM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] Open MPI and OpenIB
> >
> > At any rate though, with --mca btl ib,self it looks like the traffic
> > goes over the ethernet device. I couldn't find any documentation on
> > the "self" argument of mca; does it mean to explore alternatives if
> > the desired btl (in this case ib) doesn't work?
>
> Note that Open MPI still does use TCP for "setup" information; a bunch
> of data is passed around via mpirun and MPI_INIT for all the processes
> to find each other, etc. Similar control messages get passed around
> during MPI_FINALIZE as well.
>
> This is likely the TCP traffic that you are seeing. However, rest
> assured that the btl MCA parameter will unequivocally set the network
> that MPI traffic will use.
>
> I've updated the on-line FAQ with regard to the "self" BTL module.
>
> And finally, a man page is available for mpirun in the [not yet
> released] Open MPI 1.1 (see
> http://svn.open-mpi.org/svn/ompi/trunk/orte/tools/orterun/orterun.1).
> It should be pretty much the same for 1.0. One notable difference is
> that I just recently added a -nolocal option (not yet on the trunk,
> but likely will be in the not-distant future) that does not exist in
> 1.0.
>
> --
> Jeff Squyres
> Server Virtualization Business Unit
> Cisco Systems
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
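
P.S. On the "self" BTL discussed above: as I understand it, it carries
a process's messages to itself, so it belongs alongside a network BTL
rather than being a fallback. A sketch that exercises exactly that
path (my own illustration, not from Jeff's mail):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, out, in;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    out = rank;
    /* send-to-self: serviced by the self BTL, never touches the network */
    MPI_Isend(&out, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &req);
    MPI_Recv(&in, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    printf("rank %d sent %d to itself\n", rank, in);

    MPI_Finalize();
    return 0;
}

Run with something like "mpirun --mca btl openib,self -np 2 ./a.out"
(check ompi_info for the exact BTL name on your build); dropping
"self" from the list would break the send-to-self path.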