Open MPI Development Mailing List Archives


From: Sridhar Chirravuri (sridhar_at_[hidden])
Date: 2005-08-18 05:20:50


Hi,

Thanks for the info about IMB. I will download the latest one.

Pallas was running fine in the intra-node case, but it hangs in the
inter-node case.

I have a small MPI program which sends/receives a single char. I have tested
this program across the nodes (inter-node) as follows, and it ran fine.

Note: I have used the same options given by Tim while running Pallas,
mpi-ping, and my small test program.

# mpirun -np 2 -mca pml ob1 -mca btl_base_include self,mvapi -mca
btl_base_debug 1 ./a.out
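
For reference, the core of this test is roughly along the following lines
(a simplified sketch only; the actual Test.c is in the attached zip and may
differ in its details):

/* Sketch of a one-char send/recv test between rank 0 and rank 1. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    char c = 'x';

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Rank 0 sends one char to rank 1 and waits for it to come back. */
        MPI_Send(&c, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&c, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 0: got '%c' back\n", c);
    } else if (rank == 1) {
        /* Rank 1 echoes the char back to rank 0. */
        MPI_Recv(&c, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&c, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}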

I have run the mpi-ping.c program that was attached in the file given by an
OMPI developer, and it hangs. I have also run Pallas (PingPong only) in the
inter-node case, and it hangs too.

The attached zip file contains the following files:

Test_out.txt --> works fine in the inter-node case; send/recv of only one char
mpi_ping.txt --> hangs in the inter-node case; I need to press Ctrl+C
Pmb_out.txt  --> hangs in the inter-node case; just ran PingPong; I need to press Ctrl+C
Test.c       --> my small MPI program

The debug info is in the above .txt files. Tim might be interested in
looking at the debug output.

I have also run Pallas in the intra-node case (same machine) with the options
below, and it hangs there too. The output is similar to pmb_out.txt except
for the IP address and port number.

# mpirun -np 2 -mca pml ob1 -mca btl_base_include self,mvapi -mca
btl_base_debug 1 ./PMB-MPI1

But when I run without any options, it runs fine.

# mpirun -np 2 ./PMB-MPI1

Thanks
-Sridhar

-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Jeff Squyres
Sent: Wednesday, August 17, 2005 6:19 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI

On Aug 17, 2005, at 8:23 AM, Sridhar Chirravuri wrote:

> Can someone reply to my mail please?

I think you sent your first mail at 6:48am in my time zone (that is
4:48am Los Alamos time -- I strongly doubt that they are at work
yet...); I'm still processing my mail from last night and am just now
seeing your mail.

Global software development is challenging. :-)

> I checked out the latest code drop r6911 this morning and ran Pallas
> within the same node (2 procs). It ran fine. I didn't see any hangs
> this time, but I could see the following statements in the Pallas
> output; I feel they are just warnings which can be ignored. Am I
> correct?
>
> Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
> Request for 0 bytes (coll_basic_reduce.c, 194)
> Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
> Request for 0 bytes (coll_basic_reduce.c, 194)
> Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
> Request for 0 bytes (coll_basic_reduce.c, 194)

Hum. I was under the impression that George had fixed these, but I get
the same warnings. I'll have a look...

> Here is the output of a sample MPI program which sends a char and
> receives a char.
>
> [root_at_micrompi-1 ~]# mpirun -np 2 ./a.out
> Could not join a running, existing universe
> Establishing a new one named: default-universe-12913
> [0,0,0] mca_oob_tcp_init: calling orte_gpr.subscribe
> [0,0,0] mca_oob_tcp_init: calling orte_gpr.put(orte-job-0)
> [snipped]
> [0,0,0]-[0,0,1] mca_oob_tcp_send: tag 2
> [0,0,0]-[0,0,1] mca_oob_tcp_send: tag 2

This seems to be a *lot* of debugging output -- did you enable that on
purpose? I don't get the majority of that output when I run a hello
world or a ring MPI program (I only get the bit about the existing
universe).

> My configure command looks like
>
> ./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/
> --enable-mca-no-build=btl-openib,pml-teg,pml-uniq
>
> Since I am working with mvapi component, I disabled openib.

Note that you can disable these things at run-time; you don't have to
disable them at configure time. I only mention this for completeness --
either way, they're disabled.
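
(For example -- assuming the usual "^" exclusion syntax for MCA component
lists, so treat this as a sketch -- something like "mpirun -mca btl ^openib
..." on the command line, or a "btl = ^openib" line in the MCA parameters
file, should keep the openib component from being used at run time.)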

> But I could see that data is going over TCP/GigE and not on
> InfiniBand.

Tim: what's the status of multi-rail stuff? I thought I saw a commit
recently where the TCP BTL would automatically disable itself if it saw
that one or more of the low-latency BTLs was available...?

Sridhar: Did you try running with mvapi explicitly requested? Perhaps
something like:

        mpirun --mca btl mvapi,self ....

This shouldn't be necessary -- mvapi should select itself automatically
-- but perhaps something is going wrong with the mvapi selection
sequence...? Tim/Galen -- got any insight here?

> I have run pallas, it simply hangs again :-(

I'm confused -- above, you said that you ran pallas and it worked
fine...?

(it does not hang for me when I run with teg or ob1)

> Note: I added pml=ob1 in the conf file
> /openmpi/etc/openmpi-mca-params.conf
>
> Any latest options being added to the configure command? Please let me
> know.

No, nothing changed there AFAIK.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
_______________________________________________
devel mailing list
devel_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/devel