Subject: Re: [OMPI users] slow MPI_BCast for message sizes from 24K bytes to 800K bytes.
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-01-12 12:57:07


You might want to do some "warmup" bcasts before doing your timing
measurements.

Open MPI makes network connections lazily, meaning that we only make
connections upon the first send (e.g., the sends underneath the
MPI_BCAST). So the first MPI_BCAST is likely to be quite slow, while
all the IB network connections are being made. Subsequent bcasts are
likely to be much faster.
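
Something like the sketch below (untested, and just a variation on your
testbcast.f90 -- the warmup count of 5, the extra MPI_BARRIER, and the
~800 KB payload are only illustrative) separates the connection-setup
cost from the bcast you actually time:

  program bcast_timing
    implicit none
    include 'mpif.h'

    ! ~800 KB of real*8 data; adjust to whatever payload you want to test.
    integer, parameter :: ndat = 100000
    integer :: ierror, proc, num_proc, i
    real*8  :: t1, t2
    real*8, allocatable :: dbuf(:)

    call MPI_INIT(ierror)
    call MPI_COMM_RANK(MPI_COMM_WORLD, proc, ierror)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, num_proc, ierror)

    allocate(dbuf(ndat))
    dbuf = 1.0d0

    ! Warmup bcasts: force the lazy network connections to be
    ! established before anything is timed.
    do i = 1, 5
       call MPI_BCAST(dbuf, ndat, MPI_DOUBLE_PRECISION, 0, &
                      MPI_COMM_WORLD, ierror)
    enddo

    ! Make sure all ranks enter the timed bcast together.
    call MPI_BARRIER(MPI_COMM_WORLD, ierror)

    t1 = MPI_WTIME()
    call MPI_BCAST(dbuf, ndat, MPI_DOUBLE_PRECISION, 0, &
                   MPI_COMM_WORLD, ierror)
    t2 = MPI_WTIME()
    write(*,*) 'rank', proc, 'time for bcast', t2 - t1

    deallocate(dbuf)
    call MPI_FINALIZE(ierror)
  end program bcast_timing

Averaging over several timed iterations (instead of a single bcast)
would also smooth out one-off effects like this.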

On Jan 9, 2009, at 8:47 PM, kmuriki_at_[hidden] wrote:

>
> Hello there,
>
> We have a DDR IB cluster running Open MPI version 1.2.8.
> I'm testing on two nodes with two processors each; both
> nodes are adjacent (2 hops apart) on the same leaf
> of the tree interconnect.
>
> I observe that an MPI_BCAST among the four MPI tasks takes much
> longer over the IB network than over the GigE network when the
> payload size ranges from 24K bytes to 800K bytes.
>
> For payloads below 8K bytes and above 200K bytes, the performance
> is acceptable.
>
> Any suggestions on how I can debug this and locate the source of
> the problem? (More info below.) Please let me know if you need
> any more information from my side.
>
> thanks for your time,
> Krishna Muriki,
> HPC User Services,
> Scientific Cluster Support,
> Lawrence Berkeley National Laboratory.
>
> I) Payload size 8M bytes over IB:
>
> [kmuriki_at_n0005 pub]$ mpirun -v -display-map --mca btl openib,self -
> np 4 -hostfile hostfile.lr ./testbcast.8000000
> [n0005.scs00:13902] Map for job: 1 Generated by mapping mode:
> byslot
> Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
> Data for app_context: index 0 app: ./testbcast.8000000
> Num procs: 4
> Argv[0]: ./testbcast.8000000
> Env[0]: OMPI_MCA_btl=openib,self
> Env[1]: OMPI_MCA_rmaps_base_display_map=1
> Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
> Env[3]: OMPI_MCA_orte_precondition_transports=1405b3b501aa4086-00dbc7151c7348e1
> Env[4]: OMPI_MCA_rds=proxy
> Env[5]: OMPI_MCA_ras=proxy
> Env[6]: OMPI_MCA_rmaps=proxy
> Env[7]: OMPI_MCA_pls=proxy
> Env[8]: OMPI_MCA_rmgr=proxy
> Working dir: /global/home/users/kmuriki/sample_executables/pub (user: 0)
> Num maps: 0
> Num elements in nodes list: 2
> Mapped node:
> Cell: 0 Nodename: n0172.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value:
> NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,0]
> Proc Rank: 0 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,1]
> Proc Rank: 1 Proc PID: 0 App_context
> index: 0
>
> Mapped node:
> Cell: 0 Nodename: n0173.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value:
> NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,2]
> Proc Rank: 2 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,3]
> Proc Rank: 3 Proc PID: 0 App_context
> index: 0
> About to call broadcast 3
> About to call broadcast 1
> About to call broadcast 2
> About to call broadcast 0
> Done with call to broadcast 2
> time for bcast 0.133496046066284
> Done with call to broadcast 3
> time for bcast 0.148098945617676
> Done with call to broadcast 0
> time for bcast 0.113168954849243
> Done with call to broadcast 1
> time for bcast 0.145189046859741
> [kmuriki_at_n0005 pub]$
>
>
> II) Payload size 80K bytes using GiGE Network:
>
> [kmuriki_at_n0005 pub]$ mpirun -v -display-map --mca btl tcp,self -np 4
> -hostfile hostfile.lr ./testbcast.80000
> [n0005.scs00:13928] Map for job: 1 Generated by mapping mode:
> byslot
> Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
> Data for app_context: index 0 app: ./testbcast.80000
> Num procs: 4
> Argv[0]: ./testbcast.80000
> Env[0]: OMPI_MCA_btl=tcp,self
> Env[1]: OMPI_MCA_rmaps_base_display_map=1
> Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
> Env[3]: OMPI_MCA_orte_precondition_transports=305b93d4acc82685-12bbf20d2e6d250b
> Env[4]: OMPI_MCA_rds=proxy
> Env[5]: OMPI_MCA_ras=proxy
> Env[6]: OMPI_MCA_rmaps=proxy
> Env[7]: OMPI_MCA_pls=proxy
> Env[8]: OMPI_MCA_rmgr=proxy
> Working dir: /global/home/users/kmuriki/sample_executables/pub (user: 0)
> Num maps: 0
> Num elements in nodes list: 2
> Mapped node:
> Cell: 0 Nodename: n0172.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value:
> NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,0]
> Proc Rank: 0 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,1]
> Proc Rank: 1 Proc PID: 0 App_context
> index: 0
>
> Mapped node:
> Cell: 0 Nodename: n0173.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value:
> NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,2]
> Proc Rank: 2 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,3]
> Proc Rank: 3 Proc PID: 0 App_context
> index: 0
> About to call broadcast 0
> About to call broadcast 2
> About to call broadcast 1
> Done with call to broadcast 2
> time for bcast 7.137393951416016E-002
> About to call broadcast 3
> Done with call to broadcast 3
> time for bcast 1.110005378723145E-002
> Done with call to broadcast 0
> time for bcast 7.121706008911133E-002
> Done with call to broadcast 1
> time for bcast 3.379988670349121E-002
> [kmuriki_at_n0005 pub]$
>
> III) Payload size 80K bytes using IB Network:
>
>
> [kmuriki_at_n0005 pub]$ mpirun -v -display-map --mca btl openib,self -
> np 4 -hostfile hostfile.lr ./testbcast.80000
> [n0005.scs00:13941] Map for job: 1 Generated by mapping mode:
> byslot
> Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
> Data for app_context: index 0 app: ./testbcast.80000
> Num procs: 4
> Argv[0]: ./testbcast.80000
> Env[0]: OMPI_MCA_btl=openib,self
> Env[1]: OMPI_MCA_rmaps_base_display_map=1
> Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
> Env[3]: OMPI_MCA_orte_precondition_transports=4cdb5ae2babe9010-709842ac574605f9
> Env[4]: OMPI_MCA_rds=proxy
> Env[5]: OMPI_MCA_ras=proxy
> Env[6]: OMPI_MCA_rmaps=proxy
> Env[7]: OMPI_MCA_pls=proxy
> Env[8]: OMPI_MCA_rmgr=proxy
> Working dir: /global/home/users/kmuriki/sample_executables/pub (user: 0)
> Num maps: 0
> Num elements in nodes list: 2
> Mapped node:
> Cell: 0 Nodename: n0172.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value:
> NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,0]
> Proc Rank: 0 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,1]
> Proc Rank: 1 Proc PID: 0 App_context
> index: 0
>
> Mapped node:
> Cell: 0 Nodename: n0173.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value:
> NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,2]
> Proc Rank: 2 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,3]
> Proc Rank: 3 Proc PID: 0 App_context
> index: 0
> About to call broadcast 0
> About to call broadcast 3
> About to call broadcast 1
> Done with call to broadcast 1
> time for bcast 2.550005912780762E-002
> About to call broadcast 2
> Done with call to broadcast 2
> time for bcast 2.154898643493652E-002
> Done with call to broadcast 3
> Done with call to broadcast 0
> time for bcast 38.1956140995026
> time for bcast 38.2115209102631
> [kmuriki_at_n0005 pub]$
>
> Finally, here is the Fortran code I'm playing with; I modify the
> payload size by changing the value of the variable 'ndat':
>
> [kmuriki_at_n0005 pub]$ more testbcast.f90
> program em3d
>   implicit real*8 (a-h,o-z)
>   include 'mpif.h'
>   ! em3d_inv main driver
>   ! INITIALIZE MPI AND DETERMINE BOTH INDIVIDUAL PROCESSOR #
>   ! AND THE TOTAL NUMBER OF PROCESSORS
>   !
>   integer :: Proc
>   real*8, allocatable :: dbuf(:)
>
>   call MPI_INIT(ierror)
>   call MPI_COMM_RANK(MPI_COMM_WORLD, Proc, IERROR)
>   call MPI_COMM_SIZE(MPI_COMM_WORLD, Num_Proc, IERROR)
>
>   ndat = 1000000
>
>   !print*, 'bcasting to no of tasks', num_proc
>   allocate(dbuf(ndat))
>   do i = 1, ndat
>      dbuf(i) = dble(i)
>   enddo
>
>   print*, 'About to call broadcast', proc
>   t1 = MPI_WTIME()
>   call MPI_BCAST(dbuf, ndat, &
>                  MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierror)
>   print*, 'Done with call to broadcast', proc
>   t2 = MPI_WTIME()
>   write(*,*) 'time for bcast', t2 - t1
>
>   deallocate(dbuf)
>   call MPI_FINALIZE(IERROR)
> end program em3d
> [kmuriki_at_n0005 pub]$
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems