Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Bcast vs. per worker MPI_Send?
From: David Mathog (mathog_at_[hidden])
Date: 2010-12-14 11:54:10


So the 2/2 consensus is to use the collective. That is straightforward
for the send part of this, since all workers are sent the same data.

For the receive I do not see how to use a collective. Each worker sends
back a data structure, and the structures are of varying size. This
is almost always the case in Bioinformatics, where what is usually
coming back from each worker is a count M of the number of significant
results, M x (fixed size data per result: scores and the like), and M x
sequences or sequence alignments. M runs from 0 to Z, where in
pathological cases, Z is a very large number, and the size of the
sequences or alignments returned also varies.

The current code on the master does the following within a loop over the
N workers:

  MPI_Probe
  MPI_Get_count
  MPI_Recv
  unpack received data into a result structure
  set a pointer in an array of length N to this result

So MPI_Gather isn't going to work. Possibly MPI_Gatherv would, but we
cannot know ahead of time how big the largest result is going to be,
which makes preallocating memory difficult.

Is there by any chance an "MPI_Get_Counts" (a collective form of
MPI_Get_count)? That would let the preceding loop be replaced by

  MPI_Get_Counts
  (allocate memory as needed)
  MPI_Gatherv

although I guess even that wouldn't be very efficient with memory,
because there would usually be huge holes in the recv buffer.
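
To make the three-step idea concrete, here is a rough sketch of the
bookkeeping the master would need between learning the sizes and calling
MPI_Gatherv. It assumes the per-worker sizes are collected with an ordinary
MPI_Gather of one int per rank, standing in for the hypothetical
"MPI_Get_Counts". The helper name pack_displs and the variable names in the
comment are made up for illustration. If the displacements are a running sum
of the counts, the blocks sit back to back in the receive buffer.

```c
#include <limits.h>

/* Hypothetical helper (name is illustrative, not part of MPI).
 * counts[i] is the number of bytes rank i will contribute; the master
 * itself would normally contribute 0. Fills displs[i] with running
 * offsets so the blocks are contiguous, and returns the total buffer
 * size in bytes, or -1 if an offset does not fit in an int
 * (MPI_Gatherv takes int counts and displacements). */
long pack_displs(const int *counts, int n, int *displs)
{
    long total = 0;
    for (int i = 0; i < n; i++) {
        if (total > INT_MAX)
            return -1;              /* offset would overflow an int */
        displs[i] = (int)total;
        total += counts[i];
    }
    return total;
}

/* Intended use (MPI calls shown as comments only; every rank calls the
 * collectives, but only the root needs counts, displs and recvbuf):
 *
 *   MPI_Gather(&my_count, 1, MPI_INT, counts, 1, MPI_INT, 0, comm);
 *   total   = pack_displs(counts, nprocs, displs);
 *   recvbuf = malloc(total);
 *   MPI_Gatherv(sendbuf, my_count, MPI_BYTE,
 *               recvbuf, counts, displs, MPI_BYTE, 0, comm);
 */
```

Each worker would have to pack its result into one contiguous byte buffer
first, and the master would unpack rank i's result starting at
recvbuf + displs[i]. The -1 return is there because of the pathological
large-Z case: the summed size could exceed what an int displacement can
hold.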

Thanks,

David Mathog
mathog_at_[hidden]
Manager, Sequence Analysis Facility, Biology Division, Caltech