Open MPI Development Mailing List Archives

From: Sridhar Chirravuri (sridhar_at_[hidden])
Date: 2005-08-18 05:20:50


Hi,

Thanks for the info about IMB. I will download the latest one.

Pallas ran fine in the intra-node case, but it hangs in the inter-node
case.

I have a small MPI program that sends and receives a single char. I
tested this program across nodes (inter-node) as follows, and it ran
fine.

Note: I used the same options given by Tim when running Pallas,
mpi-ping and my small test program.

# mpirun -np 2 -mca pml ob1 -mca btl_base_include self,mvapi \
    -mca btl_base_debug 1 ./a.out

I also ran the mpi-ping.c file from the attachment provided by the OMPI
developer. This program hangs. I ran Pallas (PingPong only) in the
inter-node case, and it hangs too.

The attached zip file contains the following files:

Test_out.txt --> Works fine in the inter-node case. Sends/receives only
                 one char.
mpi_ping.txt --> Hangs in the inter-node case. I need to press Ctrl+C.
Pmb_out.txt  --> Hangs in the inter-node case. Just ran PingPong. I need
                 to press Ctrl+C.
Test.c       --> My small MPI program.

The debug info is in the .txt files above. Tim might be interested in
looking at the debug output.

I have also run Pallas in the intra-node case (same machine) with the
command below, and it hangs there too. The output is similar to
pmb_out.txt except for the IP address and port number.

# mpirun -np 2 -mca pml ob1 -mca btl_base_include self,mvapi \
    -mca btl_base_debug 1 ./PMB-MPI1

But when I run without any options, it runs fine.

# mpirun -np 2 ./PMB-MPI1

Thanks
-Sridhar

-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Jeff Squyres
Sent: Wednesday, August 17, 2005 6:19 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI

On Aug 17, 2005, at 8:23 AM, Sridhar Chirravuri wrote:

> Can someone reply to my mail please?

I think you sent your first mail at 6:48am in my time zone (that is
4:48am Los Alamos time -- I strongly doubt that they are at work
yet...); I'm still processing my mail from last night and am just now
seeing your mail.

Global software development is challenging. :-)

> I checked out the latest code drop (r6911) this morning and ran Pallas
> within the same node (2 procs). It ran fine. I didn't see any hangs
> this time, but I did see the following statements in the Pallas
> output. I think they are just warnings that can be ignored. Am I
> correct?
>
> Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
> Request for 0 bytes (coll_basic_reduce.c, 194)
> Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
> Request for 0 bytes (coll_basic_reduce.c, 194)
> Request for 0 bytes (coll_basic_reduce_scatter.c, 80)
> Request for 0 bytes (coll_basic_reduce.c, 194)

Hum. I was under the impression that George had fixed these, but I get
the same warnings. I'll have a look...

> Here is the output of a sample MPI program which sends a char and
> receives a char.
>
> [root_at_micrompi-1 ~]# mpirun -np 2 ./a.out
> Could not join a running, existing universe
> Establishing a new one named: default-universe-12913
> [0,0,0] mca_oob_tcp_init: calling orte_gpr.subscribe
> [0,0,0] mca_oob_tcp_init: calling orte_gpr.put(orte-job-0)
> [snipped]
> [0,0,0]-[0,0,1] mca_oob_tcp_send: tag 2
> [0,0,0]-[0,0,1] mca_oob_tcp_send: tag 2

This seems to be a *lot* of debugging output -- did you enable that on
purpose? I don't get the majority of that output when I run a hello
world or a ring MPI program (I only get the bit about the existing
universe).

> My configure command looks like
>
> ./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/
> --enable-mca-no-build=btl-openib,pml-teg,pml-uniq
>
> Since I am working with mvapi component, I disabled openib.

Note that you can disable these components at run-time; you don't have
to disable them at configure time. I only mention this for completeness
-- either way, they're disabled.

> But I could see that data is going over TCP/GigE and not over
> InfiniBand.

Tim: what's the status of multi-rail stuff? I thought I saw a commit
recently where the TCP BTL would automatically disable itself if it saw
that one or more of the low-latency BTLs was available...?

Sridhar: Did you try running explicitly requesting mvapi? Perhaps
something like:

        mpirun --mca btl mvapi,self ....

This shouldn't be necessary -- mvapi should select itself automatically
-- but perhaps something is going wrong with the mvapi selection
sequence...? Tim/Galen -- got any insight here?

> I have run pallas, it simply hangs again :-(

I'm confused -- above, you said that you ran pallas and it worked
fine...?

(it does not hang for me when I run with teg or ob1)

> Note: I added pml=ob1 in the conf file
> /openmpi/etc/openmpi-mca-params.conf
>
> Have any new options been added to the configure command? Please let
> me know.

No, nothing changed there AFAIK.
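
For reference, the parameter file mentioned in the quoted note takes one
"name = value" pair per line. The fragment below is only a sketch: the
pml line reflects what the quoted message describes, while the btl line
is an assumption that mirrors the "--mca btl mvapi,self" command-line
suggestion and is not something confirmed to be in anyone's file in this
thread.

```
# /openmpi/etc/openmpi-mca-params.conf
# (path taken from the quoted message; it follows the --prefix=/openmpi
# given to configure)

# Select the ob1 PML, as described in the quoted note.
pml = ob1

# Assumption: file-based equivalent of "mpirun --mca btl mvapi,self".
btl = mvapi,self
```

Values given on the mpirun command line should take precedence over the
file, so a file entry like this would not prevent testing other
settings per run.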

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
_______________________________________________
devel mailing list
devel_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/devel