
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] slow MPI_BCast for messages size from 24K bytes to 800K bytes.
From: David Prendergast (dgprendergast_at_[hidden])
Date: 2009-01-11 20:40:00


Hey Krishna,
Is this part of the reason our users see a significant slowdown when
they go beyond 2 nodes with espresso? You should try that as an
example. It's surprising that using more than 2 nodes can lead to a
longer wall time for calculations than using 2 nodes alone.
David

KMuriki_at_[hidden] wrote:
>
> Hello there,
>
> We have a DDR IB cluster running Open MPI version 1.2.8.
> I'm testing on two nodes with two processors each; both
> nodes are adjacent (2 hops apart) on the same leaf
> of the tree interconnect.
>
> I observe that an MPI_BCAST among the four MPI tasks
> takes much longer over the IB network than over the
> GigE network when the payload size ranges from 24K
> bytes to 800K bytes.
>
> For payloads below 8K bytes and above 200K bytes the performance
> is acceptable.
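>
> Since the slowdown seems tied to a particular message-size range, one thing
> I plan to check (just a sketch; the parameter names come from ompi_info and
> may differ between releases) is where the openib btl switches between its
> eager and rendezvous send protocols, e.g.:
>
> ompi_info --param btl openib | grep -i -e eager -e send_size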
>
> Any suggestions on how to debug this and locate the source of
> the problem? (More info below.) Please let me know if you need
> any more information from my side.
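>
> One more thing I may try (assuming the tuned collective component's run-time
> parameters apply to 1.2.8; the names below are what ompi_info reports on
> newer releases and may vary) is pinning MPI_BCAST to a single algorithm, to
> see whether the slow range follows the algorithm switch points, e.g.:
>
> ompi_info --param coll tuned | grep bcast
> mpirun --mca btl openib,self \
>        --mca coll_tuned_use_dynamic_rules 1 \
>        --mca coll_tuned_bcast_algorithm 1 \
>        -np 4 -hostfile hostfile.lr ./testbcast.80000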
>
> thanks for your time,
> Krishna Muriki,
> HPC User Services,
> Scientific Cluster Support,
> Lawrence Berkeley National Laboratory.
>
> I) Payload size 8M bytes over IB:
>
> [kmuriki_at_n0005 pub]$ mpirun -v -display-map --mca btl openib,self -np
> 4 -hostfile hostfile.lr ./testbcast.8000000
> [n0005.scs00:13902] Map for job: 1 Generated by mapping mode: byslot
> Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
> Data for app_context: index 0 app: ./testbcast.8000000
> Num procs: 4
> Argv[0]: ./testbcast.8000000
> Env[0]: OMPI_MCA_btl=openib,self
> Env[1]: OMPI_MCA_rmaps_base_display_map=1
> Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
> Env[3]:
> OMPI_MCA_orte_precondition_transports=1405b3b501aa4086-00dbc7151c7348e1
> Env[4]: OMPI_MCA_rds=proxy
> Env[5]: OMPI_MCA_ras=proxy
> Env[6]: OMPI_MCA_rmaps=proxy
> Env[7]: OMPI_MCA_pls=proxy
> Env[8]: OMPI_MCA_rmgr=proxy
> Working dir:
> /global/home/users/kmuriki/sample_executables/pub (user: 0)
> Num maps: 0
> Num elements in nodes list: 2
> Mapped node:
> Cell: 0 Nodename: n0172.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value: NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,0]
> Proc Rank: 0 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,1]
> Proc Rank: 1 Proc PID: 0 App_context
> index: 0
>
> Mapped node:
> Cell: 0 Nodename: n0173.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value: NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,2]
> Proc Rank: 2 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,3]
> Proc Rank: 3 Proc PID: 0 App_context
> index: 0
> About to call broadcast 3
> About to call broadcast 1
> About to call broadcast 2
> About to call broadcast 0
> Done with call to broadcast 2
> time for bcast 0.133496046066284
> Done with call to broadcast 3
> time for bcast 0.148098945617676
> Done with call to broadcast 0
> time for bcast 0.113168954849243
> Done with call to broadcast 1
> time for bcast 0.145189046859741
> [kmuriki_at_n0005 pub]$
>
>
> II) Payload size 80K bytes over GigE:
>
> [kmuriki_at_n0005 pub]$ mpirun -v -display-map --mca btl tcp,self -np 4
> -hostfile hostfile.lr ./testbcast.80000
> [n0005.scs00:13928] Map for job: 1 Generated by mapping mode: byslot
> Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
> Data for app_context: index 0 app: ./testbcast.80000
> Num procs: 4
> Argv[0]: ./testbcast.80000
> Env[0]: OMPI_MCA_btl=tcp,self
> Env[1]: OMPI_MCA_rmaps_base_display_map=1
> Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
> Env[3]:
> OMPI_MCA_orte_precondition_transports=305b93d4acc82685-12bbf20d2e6d250b
> Env[4]: OMPI_MCA_rds=proxy
> Env[5]: OMPI_MCA_ras=proxy
> Env[6]: OMPI_MCA_rmaps=proxy
> Env[7]: OMPI_MCA_pls=proxy
> Env[8]: OMPI_MCA_rmgr=proxy
> Working dir:
> /global/home/users/kmuriki/sample_executables/pub (user: 0)
> Num maps: 0
> Num elements in nodes list: 2
> Mapped node:
> Cell: 0 Nodename: n0172.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value: NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,0]
> Proc Rank: 0 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,1]
> Proc Rank: 1 Proc PID: 0 App_context
> index: 0
>
> Mapped node:
> Cell: 0 Nodename: n0173.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value: NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,2]
> Proc Rank: 2 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,3]
> Proc Rank: 3 Proc PID: 0 App_context
> index: 0
> About to call broadcast 0
> About to call broadcast 2
> About to call broadcast 1
> Done with call to broadcast 2
> time for bcast 7.137393951416016E-002
> About to call broadcast 3
> Done with call to broadcast 3
> time for bcast 1.110005378723145E-002
> Done with call to broadcast 0
> time for bcast 7.121706008911133E-002
> Done with call to broadcast 1
> time for bcast 3.379988670349121E-002
> [kmuriki_at_n0005 pub]$
>
> III) Payload size 80K bytes over IB:
>
> [kmuriki_at_n0005 pub]$ mpirun -v -display-map --mca btl openib,self -np
> 4 -hostfile hostfile.lr ./testbcast.80000
> [n0005.scs00:13941] Map for job: 1 Generated by mapping mode: byslot
> Starting vpid: 0 Vpid range: 4 Num app_contexts: 1
> Data for app_context: index 0 app: ./testbcast.80000
> Num procs: 4
> Argv[0]: ./testbcast.80000
> Env[0]: OMPI_MCA_btl=openib,self
> Env[1]: OMPI_MCA_rmaps_base_display_map=1
> Env[2]: OMPI_MCA_rds_hostfile_path=hostfile.lr
> Env[3]:
> OMPI_MCA_orte_precondition_transports=4cdb5ae2babe9010-709842ac574605f9
> Env[4]: OMPI_MCA_rds=proxy
> Env[5]: OMPI_MCA_ras=proxy
> Env[6]: OMPI_MCA_rmaps=proxy
> Env[7]: OMPI_MCA_pls=proxy
> Env[8]: OMPI_MCA_rmgr=proxy
> Working dir:
> /global/home/users/kmuriki/sample_executables/pub (user: 0)
> Num maps: 0
> Num elements in nodes list: 2
> Mapped node:
> Cell: 0 Nodename: n0172.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value: NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,0]
> Proc Rank: 0 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,1]
> Proc Rank: 1 Proc PID: 0 App_context
> index: 0
>
> Mapped node:
> Cell: 0 Nodename: n0173.lr Launch id: -1
> Username: NULL
> Daemon name:
> Data type: ORTE_PROCESS_NAME Data Value: NULL
> Oversubscribed: False Num elements in procs list: 2
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,2]
> Proc Rank: 2 Proc PID: 0 App_context
> index: 0
>
> Mapped proc:
> Proc Name:
> Data type: ORTE_PROCESS_NAME Data Value:
> [0,1,3]
> Proc Rank: 3 Proc PID: 0 App_context
> index: 0
> About to call broadcast 0
> About to call broadcast 3
> About to call broadcast 1
> Done with call to broadcast 1
> time for bcast 2.550005912780762E-002
> About to call broadcast 2
> Done with call to broadcast 2
> time for bcast 2.154898643493652E-002
> Done with call to broadcast 3
> Done with call to broadcast 0
> time for bcast 38.1956140995026
> time for bcast 38.2115209102631
> [kmuriki_at_n0005 pub]$
>
> Finally, here is the Fortran code I'm playing with; I modify the
> payload size by changing the value of the variable 'ndat':
>
> [kmuriki_at_n0005 pub]$ more testbcast.f90
> program em3d
>   implicit real*8 (a-h,o-z)
>   include 'mpif.h'
>   ! em3d_inv main driver
>   ! INITIALIZE MPI AND DETERMINE BOTH INDIVIDUAL PROCESSOR #
>   ! AND THE TOTAL NUMBER OF PROCESSORS
>   integer :: Proc
>   real*8, allocatable :: dbuf(:)
>
>   call MPI_INIT(ierror)
>   call MPI_COMM_RANK(MPI_COMM_WORLD, Proc, IERROR)
>   call MPI_COMM_SIZE(MPI_COMM_WORLD, Num_Proc, IERROR)
>
>   ndat = 1000000   ! 1,000,000 doubles = 8,000,000 bytes of payload
>
>   !print*,'bcasting to no of tasks',num_proc
>   allocate(dbuf(ndat))
>   do i = 1, ndat
>     dbuf(i) = dble(i)
>   enddo
>
>   print*, 'About to call broadcast', proc
>   t1 = MPI_WTIME()
>   call MPI_BCAST(dbuf, ndat, &
>                  MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierror)
>   print*, 'Done with call to broadcast', proc
>   t2 = MPI_WTIME()
>   write(*,*) 'time for bcast', t2-t1
>
>   deallocate(dbuf)
>   call MPI_FINALIZE(IERROR)
> end program em3d
> [kmuriki_at_n0005 pub]$
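>
> Since the very long times in test III (tens of seconds on two of the ranks)
> could also include one-time work such as lazy connection setup on the first
> broadcast, a variant of the test that adds a warm-up broadcast, a barrier,
> and an averaged timing loop might separate that from the steady-state
> MPI_BCAST cost. A rough, untested sketch along those lines (the warm-up and
> averaging are additions, not part of the program above):
>
> program bcast_timing
>   implicit none
>   include 'mpif.h'
>   ! Same 80K-byte payload as testbcast.80000: 10,000 real*8 values.
>   integer, parameter :: ndat = 10000
>   integer, parameter :: niter = 10        ! number of timed broadcasts
>   integer :: proc, num_proc, ierror, i, iter
>   real*8, allocatable :: dbuf(:)
>   real*8 :: t1, t2
>
>   call MPI_INIT(ierror)
>   call MPI_COMM_RANK(MPI_COMM_WORLD, proc, ierror)
>   call MPI_COMM_SIZE(MPI_COMM_WORLD, num_proc, ierror)
>
>   allocate(dbuf(ndat))
>   do i = 1, ndat
>     dbuf(i) = dble(i)
>   enddo
>
>   ! Warm-up broadcast so any lazy connection establishment is not timed,
>   ! then a barrier so all ranks start the timed loop together.
>   call MPI_BCAST(dbuf, ndat, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierror)
>   call MPI_BARRIER(MPI_COMM_WORLD, ierror)
>
>   t1 = MPI_WTIME()
>   do iter = 1, niter
>     call MPI_BCAST(dbuf, ndat, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierror)
>   enddo
>   t2 = MPI_WTIME()
>
>   write(*,*) 'rank', proc, 'average time for bcast', (t2 - t1) / niter
>
>   deallocate(dbuf)
>   call MPI_FINALIZE(ierror)
> end program bcast_timing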
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
OoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo
  David Prendergast
  Lawrence Berkeley National Laboratory
  Molecular Foundry                                phone: (510) 486-4948
  1 Cyclotron Rd., MS 67-3207                      fax:   (510) 486-7424
  Berkeley, CA 94720                        email: dgprendergast_at_[hidden]
  USA             web: http://nanotheory.lbl.gov/people/prendergast.html
OoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOoOo