I missed the statement that everything works when you add sleeps. That
probably rules out an error in the way MPI_Bcast was used.
Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363
Randolph Pullen <randolph_pullen_at_[hidden]>
08/07/2010 01:23 AM
[OMPI users] MPI_Bcast issue
I seem to be having a problem with MPI_Bcast.
My massive, I/O-intensive data movement program must broadcast from N to N
nodes. My problem starts because I require 2 processes per node, a sender
and a receiver, and I have implemented these as MPI processes rather
than tackle the complexities of threads on MPI.
Consequently, broadcast and calls like alltoall are not completely
helpful. The dataset is huge and each node must end up with a complete
copy built by the large number of contributing broadcasts from the sending
nodes. Network efficiency and run time are paramount.
As I don't want to needlessly broadcast all this data to the sending nodes
and I have a perfectly good MPI program that distributes globally from a
single node (1 to N), I took the unusual decision to start N copies of
this program by spawning the MPI system from the PVM system in an effort
to get my N to N concurrent transfers.
It seems that the broadcasts running on concurrent MPI environments
collide and cause all but the first process to hang waiting for their
broadcasts. This theory seems to be confirmed by introducing a sleep of
n-1 seconds before the first MPI_Bcast call on each node, which results
in the code working perfectly. (total run time 55 seconds, 3 nodes,
standard TCP stack)
My guess is that unlike PVM, OpenMPI implements broadcasts with broadcasts
rather than multicasts. Can someone confirm this? Is this a bug?
Is there any multicast or N to N broadcast where sender processes can
avoid participating when they don't need to?
Thanks in advance
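
One way to keep idle senders out of a broadcast inside a single MPI job is to give each sender its own communicator containing only that sender and all the receivers. The sketch below only computes the rank list for such a group; it makes no MPI calls. It assumes a layout that is not stated in the post: node i runs its sender as world rank 2*i and its receiver as world rank 2*i + 1. The function and helper names are hypothetical. The resulting list could be handed to MPI_Group_incl and then MPI_Comm_create, after which the sender would call MPI_Bcast on the new communicator with root 0, since MPI_Group_incl preserves the order of the list. Whether this helps in the PVM-spawned setup described above is untested.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED layout (not from the original post): node i runs its sender as
 * world rank 2*i and its receiver as world rank 2*i + 1. */
static int sender_rank(int node)   { return 2 * node; }
static int receiver_rank(int node) { return 2 * node + 1; }

/* Fill `ranks` with the world ranks that take part in the broadcast rooted
 * at the sender on `root_node`: that sender first, then every receiver.
 * The other senders are left out, so they never join the collective.
 * `ranks` must have room for n_nodes + 1 entries.
 * Returns the number of entries written. */
int bcast_group_ranks(int root_node, int n_nodes, int *ranks)
{
    assert(ranks != NULL);
    assert(n_nodes > 0 && root_node >= 0 && root_node < n_nodes);

    int count = 0;
    ranks[count++] = sender_rank(root_node);      /* root comes first */
    for (int node = 0; node < n_nodes; ++node)
        ranks[count++] = receiver_rank(node);     /* all receivers follow */
    return count;
}
```

For 3 nodes and the sender on node 1, the list is the sender's world rank followed by the three receiver ranks. Each receiver would belong to N such communicators, one per sender, and would have to post a matching MPI_Bcast on each of them.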