You are welcome to stick barriers in - it doesn't hurt anything other than adding a little overhead.
On Nov 11, 2009, at 3:00 AM, Glembek Ondřej wrote:
> Thanks for your reply...
> My coll_sync_priority is set to 50. See the dump of "ompi_info --
> param coll sync" below...
> Does sticking barriers in hurt anything, or is it just a cosmetic
> thing? I'm fine with this solution...
> $ ompi_info --param coll sync
>   MCA coll: parameter "coll" (current value: <none>, data source: default value)
>             Default selection set of components for the coll framework
>             (<none> means use all components that can be found)
>   MCA coll: parameter "coll_base_verbose" (current value: "0", data source: default value)
>             Verbosity level for the coll framework (0 = no verbosity)
>   MCA coll: parameter "coll_sync_priority" (current value: "50", data source: default value)
>             Priority of the sync coll component; only relevant if
>             barrier_before or barrier_after is > 0
>   MCA coll: parameter "coll_sync_barrier_before" (current value: "1000", data source: default value)
>             Do a synchronization before each Nth collective
>   MCA coll: parameter "coll_sync_barrier_after" (current value: "0", data source: default value)
>             Do a synchronization after each Nth collective
> Quoting "Ralph Castain" <rhc_at_[hidden]>:
>> Yeah, that is "normal". It has to do with unexpected messages.
>> When you have procs running at significantly different speeds, the
>> various operations get far enough out of sync that the memory
>> consumed by received messages that have not yet been processed
>> grows too large.
>> Instead of sticking barriers into your code, you can have OMPI do
>> an internal sync after every so many operations to avoid the
>> problem. This is done by enabling the "sync" collective component,
>> and then adjusting the number of operations between forced syncs.
>> Do an "ompi_info --param coll sync" to see the options. Then set
>> the coll_sync_priority to something like 100 and it should work for
>> you.
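For reference, the sync component described above can be enabled entirely from the command line, without touching the application code. This is a sketch, not a tested recipe: the program name and the barrier interval are placeholders to be adapted.

```shell
# Give the sync coll component a high priority and force an internal
# barrier before every 1000th collective operation (tune the interval
# to taste). "./my_app" is a placeholder for the actual program.
mpirun --mca coll_sync_priority 100 \
       --mca coll_sync_barrier_before 1000 \
       -np 32 ./my_app
```

The same parameters can also be set in an MCA parameter file or via environment variables, as with any other Open MPI MCA setting.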
>> On Nov 10, 2009, at 2:45 PM, Glembek Ondřej wrote:
>>> I am using the MPI_Reduce operation on a 122880x400 matrix of
>>> doubles. The parallel job runs on 32 machines, each with a
>>> processor of a different speed, but the architecture and OS are
>>> the same on all machines (x86_64). The task is a typical map-and-
>>> reduce, i.e. each of the processes collects some data, which is
>>> then summed (MPI_Reduce w. MPI_SUM).
>>> Because the processors differ, each of the jobs reaches the
>>> MPI_Reduce at a different time.
>>> The *first problem* came when I called MPI_Reduce on the whole
>>> matrix. The system ended up with an *MPI_ERR_OTHER error*, each
>>> time on a different rank. I worked around this by chunking the
>>> matrix into 2048 submatrices and calling MPI_Reduce in a loop.
>>> However, a *second problem* arose --- MPI_Reduce hangs. It
>>> apparently gets stuck in some kind of deadlock. It seems that if
>>> the processors are of similar speed, the problem disappears, but
>>> I cannot guarantee this condition all the time.
>>> I managed to get rid of the problem (at least for the first few
>>> iterations) by sticking an MPI_Barrier before the MPI_Reduce line.
>>> The questions are:
>>> 1) Is this usual behavior?
>>> 2) Is there some kind of timeout for MPI_Reduce?
>>> 3) Why does MPI_Reduce die on a large amount of data if the
>>> system has enough address space (64-bit compilation)?
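The chunked reduction plus barrier workaround described above might look roughly like this. This is a sketch under stated assumptions, not the original code: the function name `chunked_reduce` is invented, and only the matrix dimensions and chunk count come from the post. It would need an MPI environment (mpicc/mpirun) to build and run.

```c
#include <mpi.h>

#define ROWS   122880
#define COLS   400
#define CHUNKS 2048   /* 122880/2048 = 60 rows, i.e. 24000 doubles per reduce */

/* Reduce `send` into `recv` on rank 0 in CHUNKS pieces, with a
 * barrier before each piece to keep fast and slow ranks in step
 * (the workaround from the post). */
static void chunked_reduce(const double *send, double *recv, MPI_Comm comm)
{
    const size_t chunk = (size_t)ROWS / CHUNKS * COLS;
    for (int i = 0; i < CHUNKS; i++) {
        MPI_Barrier(comm);
        MPI_Reduce(send + (size_t)i * chunk,
                   recv + (size_t)i * chunk,
                   (int)chunk, MPI_DOUBLE, MPI_SUM, 0, comm);
    }
}
```

The barrier bounds how far ahead fast ranks can run, which limits the buildup of unexpected messages; the sync coll component achieves the same effect internally without code changes.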
>>> Ondrej Glembek
>>> Ondrej Glembek, PhD student E-mail: glembek_at_[hidden]
>>> UPGM FIT VUT Brno, L226 Web: http://www.fit.vutbr.cz/
>>> Bozetechova 2, 612 66 Phone: +420 54114-1292
>>> Brno, Czech Republic Fax: +420 54114-1290
>>> ICQ: 93233896
>>> GPG: C050 A6DC 7291 6776 9B69 BB11 C033 D756 6F33 DE3C
>>> users mailing list