Open MPI Development Mailing List Archives


From: Tim S. Woodall (twoodall_at_[hidden])
Date: 2005-08-10 08:46:20


Hello Sridhar,

Can you try running the mpi-ping program again with:

export OMPI_MCA_oob_tcp_debug=10
orterun -np 2 ./mpi-ping

I'm thinking there may be a problem setting up an OOB connection
between the backend and frontend nodes.
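
A minimal sketch of equivalent ways to turn that debugging on, assuming
the usual ompi_info / -mca syntax (nothing here is specific to this
problem):

    # list the oob_tcp parameters this build knows about
    ompi_info --param oob tcp

    # same effect as the export above, passed straight to the launcher
    orterun -np 2 -mca oob_tcp_debug 10 ./mpi-ping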

Tim

Sridhar Chirravuri wrote:
> Hi,
>
> I got the latest code drop (r6791) this morning.
>
> I have removed the .ompi_ignore and .ompi_unignore files from the
> ompi/mca/mpool/mvapi directory. If I don't remove them before building,
> the MPI program fails with signal 11. After removing those hidden files
> from that directory and rebuilding, the signal 11 error disappeared.
>
> I have configured with the options given by Galen.
>
> ./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/
> --enable-mca-no-build=btl-openib,pml-teg,pml-uniq
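
For completeness, a minimal sketch of the full rebuild sequence described
above, run from the top of the source tree; the autogen.sh step is an
assumption on my part, since the .ompi_ignore markers are normally honored
when the build system is generated:

    # un-hide the mvapi mpool component, then regenerate and rebuild
    rm ompi/mca/mpool/mvapi/.ompi_ignore ompi/mca/mpool/mvapi/.ompi_unignore
    ./autogen.sh
    ./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/ \
                --enable-mca-no-build=btl-openib,pml-teg,pml-uniq
    make all install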
>
> After make all install, I ran Pallas but got the same error messages
> (see below for the error messages). I ran it 3-4 times; sometimes I got
> no output and Pallas just hung. I first ran pingpong only, then ran
> Pallas with all functions (including reduce) and got the following
> messages in the intra-node case.
>
> Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
> Request for 0 bytes (coll_basic_reduce.c, 193)
> Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
> Request for 0 bytes (coll_basic_reduce.c, 193)
>
> Since George has seen these types of messages, his upcoming patch might
> resolve this issue.
>
> Also, I ran the mpi-ping.c program given by Galen with the latest code
> drop and it just hung. Here is the output:
>
> [root_at_micrompi-1 ~]# mpirun -np 2 ./a.out -r 10 0 100000 1000
> Could not join a running, existing universe
> Establishing a new one named: default-universe-12461
> mpi-ping: ping-pong
> nprocs=2, reps=10, min bytes=0, max bytes=100000 inc bytes=1000
> 0 pings 1
>
>
> ... I just did ctrl+c here after 10 mins ...
>
> 2 processes killed (possibly by Open MPI)
>
> I have no clue whether George's patch will fix this problem or not.
>
> Before running the mpi-ping program, I exported
> OMPI_MCA_btl_base_debug=2 in my shell.
>
> Thanks
> -Sridhar
>
> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Galen Shipman
> Sent: Tuesday, August 09, 2005 11:10 PM
> To: Open MPI Developers
> Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI
>
> Hi
> On Aug 9, 2005, at 8:15 AM, Sridhar Chirravuri wrote:
>
>
>>I see the same kind of output while running the Pallas "pingpong" test.
>>
>>-Sridhar
>>
>>-----Original Message-----
>>From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
>>Behalf Of Sridhar Chirravuri
>>Sent: Tuesday, August 09, 2005 7:44 PM
>>To: Open MPI Developers
>>Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI
>>
>>
>>I ran the sendrecv function in Pallas but it failed to run. Here is
>>the output:
>>
>>[root_at_micrompi-2 SRC_PMB]# mpirun -np 2 PMB-MPI1 sendrecv
>>Could not join a running, existing universe
>>Establishing a new one named: default-universe-5097
>>[0,1,1][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>>[0,1,1][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>>[0,1,0][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>>[0,1,0][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>>
>>[0,1,0][btl_mvapi_endpoint.c:542:mca_btl_mvapi_endpoint_send] Connection to endpoint closed ... connecting ...
>>[0,1,0][btl_mvapi_endpoint.c:318:mca_btl_mvapi_endpoint_start_connect] Initialized High Priority QP num = 263177, Low Priority QP num = 263178, LID = 785
>>[0,1,0][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req] Sending High Priority QP num = 263177, Low Priority QP num = 263178, LID = 785
>>[0,1,0][btl_mvapi_endpoint.c:542:mca_btl_mvapi_endpoint_send] Connection to endpoint closed ... connecting ...
>>[0,1,0][btl_mvapi_endpoint.c:318:mca_btl_mvapi_endpoint_start_connect] Initialized High Priority QP num = 263179, Low Priority QP num = 263180, LID = 786
>>[0,1,0][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req] Sending High Priority QP num = 263179, Low Priority QP num = 263180, LID = 786
>>
>>#---------------------------------------------------
>># PALLAS MPI Benchmark Suite V2.2, MPI-1 part
>>#---------------------------------------------------
>># Date    : Tue Aug 9 07:11:25 2005
>># Machine : x86_64
>># System  : Linux
>># Release : 2.6.9-5.ELsmp
>># Version : #1 SMP Wed Jan 5 19:29:47 EST 2005
>>#
>># Minimum message length in bytes: 0
>># Maximum message length in bytes: 4194304
>>#
>># MPI_Datatype                : MPI_BYTE
>># MPI_Datatype for reductions : MPI_FLOAT
>># MPI_Op                      : MPI_SUM
>>#
>>#
>># List of Benchmarks to run:
>>
>># Sendrecv
>>
>>[0,1,1][btl_mvapi_endpoint.c:368:mca_btl_mvapi_endpoint_reply_start_connect] Initialized High Priority QP num = 263177, Low Priority QP num = 263178, LID = 777
>>[0,1,1][btl_mvapi_endpoint.c:266:mca_btl_mvapi_endpoint_set_remote_info] Received High Priority QP num = 263177, Low Priority QP num 263178, LID = 785
>>[0,1,1][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query] Modified to init..Qp 7080096
>>[0,1,1][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTR..Qp 7080096
>>[0,1,1][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTS..Qp 7080096
>>[0,1,1][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query] Modified to init..Qp 7240736
>>[0,1,1][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTR..Qp 7240736
>>[0,1,1][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTS..Qp 7240736
>>[0,1,1][btl_mvapi_endpoint.c:190:mca_btl_mvapi_endpoint_send_connect_req] Sending High Priority QP num = 263177, Low Priority QP num = 263178, LID = 777
>>[0,1,0][btl_mvapi_endpoint.c:266:mca_btl_mvapi_endpoint_set_remote_info] Received High Priority QP num = 263177, Low Priority QP num 263178, LID = 777
>>[0,1,0][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query] Modified to init..Qp 7081440
>>[0,1,0][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTR..Qp 7081440
>>[0,1,0][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTS..Qp 7081440
>>[0,1,0][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query] Modified to init..Qp 7241888
>>[0,1,0][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTR..Qp 7241888
>>[0,1,0][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query] Modified to RTS..Qp 7241888
>>[0,1,1][btl_mvapi_component.c:523:mca_btl_mvapi_component_progress] Got a recv completion
>>
>>
>>Thanks
>>-Sridhar
>>
>>
>>
>>
>>-----Original Message-----
>>From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
>>Behalf Of Brian Barrett
>>Sent: Tuesday, August 09, 2005 7:35 PM
>>To: Open MPI Developers
>>Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI
>>
>>On Aug 9, 2005, at 8:48 AM, Sridhar Chirravuri wrote:
>>
>>
>>>Does r6774 have a lot of changes related to the 3rd-generation
>>>point-to-point? I am trying to run some benchmark tests (e.g.,
>>>Pallas) with the Open MPI stack and just want to compare the
>>>performance figures with MVAPICH 0.9.5 and MVAPICH 0.9.2.
>>>
>>>In order to use the 3rd-generation p2p communication, I have added the
>>>following line to /openmpi/etc/openmpi-mca-params.conf:
>>>
>>>pml=ob1
>>>
>>>I also exported (as a double check) OMPI_MCA_pml=ob1.
>>>
>>>Then I tried running on the same machine. My machine has
>>>2 processors:
>>>
>>>mpirun -np 2 ./PMB-MPI1
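
For reference, a minimal sketch of the three equivalent ways to select the
ob1 PML, assuming the standard MCA parameter precedence (command line over
environment over the params file):

    # 1) system-wide default in the params file
    echo "pml=ob1" >> /openmpi/etc/openmpi-mca-params.conf

    # 2) per-shell, via the environment
    export OMPI_MCA_pml=ob1

    # 3) per-run, on the mpirun command line
    mpirun -np 2 -mca pml ob1 ./PMB-MPI1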
>>>
>>>I still see the following lines
>>>
>>>Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
>>>Request for 0 bytes (coll_basic_reduce.c, 193)
>>>Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
>>>Request for 0 bytes (coll_basic_reduce.c, 193)
>>
>>These errors are coming from the collective routines, not the PML/BTL
>>layers. It looks like the reduction code is trying to call malloc(0),
>>which doesn't work so well. We'll take a look as soon as we can. In
>>the meantime, can you just not run the tests that call the reduction
>>collectives?
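
A minimal sketch of such a restricted run, assuming PMB takes benchmark
names on the command line as in the sendrecv run earlier in the thread
(pingping and exchange are my guesses at the other point-to-point tests):

    # run only point-to-point benchmarks, skipping the reduction collectives
    mpirun -np 2 ./PMB-MPI1 pingpong pingping sendrecv exchange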
>>
>>Brian
>>
>>
>>--
>> Brian Barrett
>> Open MPI developer
>> http://www.open-mpi.org/
>>
>>
>>_______________________________________________
>>devel mailing list
>>devel_at_[hidden]
>>http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>_______________________________________________
>>devel mailing list
>>devel_at_[hidden]
>>http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>>_______________________________________________
>>devel mailing list
>>devel_at_[hidden]
>>http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>