Open MPI Development Mailing List Archives

From: Sridhar Chirravuri (sridhar_at_[hidden])
Date: 2005-08-10 05:28:26


Hi,

I got the latest code drop, r6791, this morning.

I removed the .ompi_ignore and .ompi_unignore files from the
ompi/mca/mpool/mvapi directory. If I build without removing them, the MPI
program fails with signal 11; after removing those hidden files from that
directory and rebuilding, the signal 11 error disappeared.
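
Concretely, the steps were as follows (with ~/ompi-trunk standing in for
wherever the r6791 checkout actually lives; the path is just a placeholder):

cd ~/ompi-trunk                          # placeholder for the r6791 source tree
rm ompi/mca/mpool/mvapi/.ompi_ignore     # leaving these two hidden files in
rm ompi/mca/mpool/mvapi/.ompi_unignore   # place led to the signal 11 for me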

I configured with the options Galen suggested:

./configure --prefix=/openmpi --with-btl-mvapi=/usr/local/topspin/ \
    --enable-mca-no-build=btl-openib,pml-teg,pml-uniq

After "make all install", I ran Pallas but got the same error messages
(see below). I ran it 3-4 times; sometimes I got no output at all and
Pallas just hung. Those runs were pingpong only. When I ran Pallas with
all functions (including reduce), I got the following messages in the
intra-node case:

Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)
Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
Request for 0 bytes (coll_basic_reduce.c, 193)

Since George has seen these same kinds of messages, his upcoming patch
might resolve this issue.
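
Per Brian's suggestion below, in the meantime I can restrict Pallas to the
benchmarks that do not hit the reduction collectives by naming them on the
PMB command line, for example:

mpirun -np 2 ./PMB-MPI1 pingpong sendrecv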

Also, I ran the mpi-ping.c program Galen provided against the latest code
drop, and it just hung. Here is the output:

[root_at_micrompi-1 ~]# mpirun -np 2 ./a.out -r 10 0 100000 1000
Could not join a running, existing universe
Establishing a new one named: default-universe-12461
mpi-ping: ping-pong
nprocs=2, reps=10, min bytes=0, max bytes=100000 inc bytes=1000
0 pings 1

... I pressed Ctrl+C here after 10 minutes ...

2 processes killed (possibly by Open MPI)

I have no clue whether George's patch will fix this problem or not.

Before running the mpi-ping program, I exported OMPI_MCA_btl_base_debug=2
in my shell.
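
For reference, I believe the same setting can also be passed directly on
the mpirun command line instead of through the environment, e.g.:

mpirun --mca btl_base_debug 2 -np 2 ./a.out -r 10 0 100000 1000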

Thanks
-Sridhar

-----Original Message-----
From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
Behalf Of Galen Shipman
Sent: Tuesday, August 09, 2005 11:10 PM
To: Open MPI Developers
Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI

Hi
On Aug 9, 2005, at 8:15 AM, Sridhar Chirravuri wrote:

> The same kind of output while running Pallas "pingpong" test.
>
> -Sridhar
>
> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Sridhar Chirravuri
> Sent: Tuesday, August 09, 2005 7:44 PM
> To: Open MPI Developers
> Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI
>
>
> I have run the sendrecv function in Pallas but it failed. Here is
> the output:
>
> [root_at_micrompi-2 SRC_PMB]# mpirun -np 2 PMB-MPI1 sendrecv
> Could not join a running, existing universe
> Establishing a new one named: default-universe-5097
> [0,1,1][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
> [0,1,1][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>
>
> [0,1,0][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>
> [0,1,0][btl_mvapi.c:130:mca_btl_mvapi_del_procs] Stub
>
> [0,1,0][btl_mvapi_endpoint.c:542:mca_btl_mvapi_endpoint_send]
> Connection
> to endpoint closed ... connecting ...
> [0,1,0][btl_mvapi_endpoint.c:318:mca_btl_mvapi_endpoint_start_connect]
> Initialized High Priority QP num = 263177, Low Priority QP num =
> 263178,
> LID = 785
>
> [0,1,0][btl_mvapi_endpoint.c:190:
> mca_btl_mvapi_endpoint_send_connect_req
> ] Sending High Priority QP num = 263177, Low Priority QP num = 263178,
> LID = 785[0,1,0][btl_mvapi_endpoint.c:542:mca_btl_mvapi_endpoint_send]
> Connection to endpoint closed ... connecting ...
> [0,1,0][btl_mvapi_endpoint.c:318:mca_btl_mvapi_endpoint_start_connect]
> Initialized High Priority QP num = 263179, Low Priority QP num =
> 263180,
> LID = 786
>
> [0,1,0][btl_mvapi_endpoint.c:190:
> mca_btl_mvapi_endpoint_send_connect_req
> ] Sending High Priority QP num = 263179, Low Priority QP num = 263180,
> LID = 786#---------------------------------------------------
> # PALLAS MPI Benchmark Suite V2.2, MPI-1 part
> #---------------------------------------------------
> # Date : Tue Aug 9 07:11:25 2005
> # Machine : x86_64
> # System : Linux
> # Release : 2.6.9-5.ELsmp
> # Version : #1 SMP Wed Jan 5 19:29:47 EST 2005
>
> #
> # Minimum message length in bytes: 0
> # Maximum message length in bytes: 4194304
> #
> # MPI_Datatype : MPI_BYTE
> # MPI_Datatype for reductions : MPI_FLOAT
> # MPI_Op : MPI_SUM
> #
> #
>
> # List of Benchmarks to run:
>
> # Sendrecv
> [0,1,1][btl_mvapi_endpoint.c:368:
> mca_btl_mvapi_endpoint_reply_start_conn
> ect] Initialized High Priority QP num = 263177, Low Priority QP num =
> 263178, LID = 777
>
> [0,1,1][btl_mvapi_endpoint.c:266:
> mca_btl_mvapi_endpoint_set_remote_info]
> Received High Priority QP num = 263177, Low Priority QP num 263178,
> LID
> = 785
>
> [0,1,1][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query]
> Modified to init..Qp
> 7080096[0,1,1][btl_mvapi_endpoint.c:791:
> mca_btl_mvapi_endpoint_qp_init_q
> uery] Modified to RTR..Qp
> 7080096[0,1,1][btl_mvapi_endpoint.c:814:
> mca_btl_mvapi_endpoint_qp_init_q
> uery] Modified to RTS..Qp 7080096
>
> [0,1,1][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query]
> Modified to init..Qp 7240736
> [0,1,1][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query]
> Modified to RTR..Qp
> 7240736[0,1,1][btl_mvapi_endpoint.c:814:
> mca_btl_mvapi_endpoint_qp_init_q
> uery] Modified to RTS..Qp 7240736
> [0,1,1][btl_mvapi_endpoint.c:190:
> mca_btl_mvapi_endpoint_send_connect_req
> ] Sending High Priority QP num = 263177, Low Priority QP num = 263178,
> LID = 777
> [0,1,0][btl_mvapi_endpoint.c:266:
> mca_btl_mvapi_endpoint_set_remote_info]
> Received High Priority QP num = 263177, Low Priority QP num 263178,
> LID
> = 777
> [0,1,0][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query]
> Modified to init..Qp 7081440
> [0,1,0][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query]
> Modified to RTR..Qp 7081440
> [0,1,0][btl_mvapi_endpoint.c:814:mca_btl_mvapi_endpoint_qp_init_query]
> Modified to RTS..Qp 7081440
> [0,1,0][btl_mvapi_endpoint.c:756:mca_btl_mvapi_endpoint_qp_init_query]
> Modified to init..Qp 7241888
> [0,1,0][btl_mvapi_endpoint.c:791:mca_btl_mvapi_endpoint_qp_init_query]
> Modified to RTR..Qp
> 7241888[0,1,0][btl_mvapi_endpoint.c:814:
> mca_btl_mvapi_endpoint_qp_init_q
> uery] Modified to RTS..Qp 7241888
> [0,1,1][btl_mvapi_component.c:523:mca_btl_mvapi_component_progress] Got
> a recv completion
>
>
> Thanks
> -Sridhar
>
>
>
>
> -----Original Message-----
> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] On
> Behalf Of Brian Barrett
> Sent: Tuesday, August 09, 2005 7:35 PM
> To: Open MPI Developers
> Subject: Re: [O-MPI devel] Fwd: Regarding MVAPI Component in Open MPI
>
> On Aug 9, 2005, at 8:48 AM, Sridhar Chirravuri wrote:
>
>> Does r6774 have a lot of changes related to the 3rd-generation
>> point-to-point? I am trying to run some benchmark tests (ex:
>> pallas) with Open MPI stack and just want to compare the
>> performance figures with MVAPICH 095 and MVAPICH 092.
>>
>> In order to use 3rd generation p2p communication, I have added the
>> following line in the /openmpi/etc/openmpi-mca-params.conf
>>
>> pml=ob1
>>
>> I also exported (as double check) OMPI_MCA_pml=ob1.
>>
>> Then, I have tried running on the same machine. My machine has got
>> 2 processors.
>>
>> Mpirun -np 2 ./PMB-MPI1
>>
>> I still see the following lines
>>
>> Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
>> Request for 0 bytes (coll_basic_reduce.c, 193)
>> Request for 0 bytes (coll_basic_reduce_scatter.c, 79)
>> Request for 0 bytes (coll_basic_reduce.c, 193)
>
> These errors are coming from the collective routines, not the PML/BTL
> layers. It looks like the reduction codes are trying to call malloc
> (0), which doesn't work so well. We'll take a look as soon as we
> can. In the mean time, can you just not run the tests that call the
> reduction collectives?
>
> Brian
>
>
> --
> Brian Barrett
> Open MPI developer
> http://www.open-mpi.org/
>
>

_______________________________________________
devel mailing list
devel_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/devel