I’ve written an Open MPI program that “self-schedules” its work.
The master task runs a loop that chunks up an input stream and hands off jobs to worker tasks. At first the master hands jobs out in rank order, one per worker. Once every rank has its first job, the master blocks in an MPI receive call until a worker reports that it is free. The master reads the sender's rank from the receive status and sends the next job to that worker. The jobs aren’t all identical, so they run for slightly different durations depending on the input data.
When I plot a histogram of the number of jobs each worker performed, the lower MPI ranks are doing much more work than the higher ranks. For example, in a 120-process run, rank 1 did 32 jobs while rank 119 did only 2. My guess is that Open MPI returns the message from the lowest-ranked sender from MPI_Recv when I’ve got MPI_ANY_SOURCE set and multiple sends have arrived since the last call.
Is there a different receive call I could make that would spread the jobs more evenly across the workers?