Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] System hang-up on MPI_Reduce
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-11-11 08:25:44


You are welcome to stick barriers in -- they don't hurt anything other
than performance.
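
For instance, a reduction loop with a barrier in front of each chunked
MPI_Reduce could look roughly like this (just a sketch: the matrix size
and chunk count are taken from the original post below, while the buffer
names, root rank, and error handling are purely illustrative):

  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>

  #define ROWS   122880
  #define COLS   400
  #define CHUNKS 2048                 /* 60 rows = 24000 doubles per call */

  int main(int argc, char **argv)
  {
      int rank, c;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      size_t total = (size_t)ROWS * COLS;
      double *data   = calloc(total, sizeof(double));  /* local contribution   */
      double *result = calloc(total, sizeof(double));  /* summed result (root) */
      if (!data || !result)
          MPI_Abort(MPI_COMM_WORLD, 1);

      int chunk_len = (ROWS / CHUNKS) * COLS;          /* 24000 doubles */

      for (c = 0; c < CHUNKS; c++) {
          /* Keep the ranks loosely in step so that unexpected messages
             cannot pile up on the slower nodes. */
          MPI_Barrier(MPI_COMM_WORLD);

          MPI_Reduce(data + (size_t)c * chunk_len,
                     result + (size_t)c * chunk_len,
                     chunk_len, MPI_DOUBLE, MPI_SUM,
                     0, MPI_COMM_WORLD);
      }

      if (rank == 0)
          printf("reduction done\n");

      free(data);
      free(result);
      MPI_Finalize();
      return 0;
  }

Compile with mpicc and launch with mpirun as usual; the barrier costs a
little time per chunk but keeps memory use bounded.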

On Nov 11, 2009, at 3:00 AM, Glembek Ondřej wrote:

> Thanks for your reply.
>
> My coll_sync_priority is set to 50. See the dump of ompi_info --param
> coll sync below.
>
> Does sticking barriers in hurt anything, or is it just cosmetic? I'm
> fine with this solution.
>
> Thanks,
> Ondrej
>
>
> $ ompi_info --param coll sync
>   MCA coll: parameter "coll" (current value: <none>, data source: default value)
>             Default selection set of components for the coll framework
>             (<none> means use all components that can be found)
>   MCA coll: parameter "coll_base_verbose" (current value: "0", data source: default value)
>             Verbosity level for the coll framework (0 = no verbosity)
>   MCA coll: parameter "coll_sync_priority" (current value: "50", data source: default value)
>             Priority of the sync coll component; only relevant if
>             barrier_before or barrier_after is > 0
>   MCA coll: parameter "coll_sync_barrier_before" (current value: "1000", data source: default value)
>             Do a synchronization before each Nth collective
>   MCA coll: parameter "coll_sync_barrier_after" (current value: "0", data source: default value)
>             Do a synchronization after each Nth collective
>
>
> Quoting "Ralph Castain" <rhc_at_[hidden]>:
>
>> Yeah, that is "normal". It has to do with unexpected messages.
>>
>> When you have procs running at significantly different speeds, the
>> various operations get far enough out of sync that the memory
>> consumed by received messages that have not yet been processed grows
>> too large.
>>
>> Instead of sticking barriers into your code, you can have OMPI do
>> an internal sync after every so many operations to avoid the
>> problem. This is done by enabling the "sync" collective component,
>> and then adjusting the number of operations between forced syncs.
>>
>> Do an "ompi_info --param coll sync" to see the options. Then set
>> the coll_sync_priority to something like 100 and it should work for
>> you.
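>>
>> For example (the parameter names are the coll_sync ones listed by
>> ompi_info above; the exact values, the -np count, and "./your_app"
>> are just illustrative):
>>
>>   mpirun --mca coll_sync_priority 100 --mca coll_sync_barrier_before 1000 \
>>          -np 32 ./your_app
>>
>> Equivalently, you can export OMPI_MCA_coll_sync_priority=100 in the
>> environment before launching.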
>>
>> Ralph
>>
>> On Nov 10, 2009, at 2:45 PM, Glembek Ondřej wrote:
>>
>>> Hi,
>>>
>>> I am using the MPI_Reduce operation on a 122880x400 matrix of doubles.
>>> The parallel job runs on 32 machines, each having a different processor
>>> in terms of speed, but the architecture and OS are the same on all
>>> machines (x86_64). The task is a typical map-and-reduce, i.e. each
>>> process collects some data, which is then summed (MPI_Reduce with
>>> MPI_SUM).
>>>
>>> Because of the different processor speeds, each process reaches
>>> MPI_Reduce at a different time.
>>>
>>> The *first problem* came when I called MPI_Reduce on the whole
>>> matrix: the job ended with an *MPI_ERR_OTHER* error, each time on a
>>> different rank. I worked around it by chunking the matrix into 2048
>>> submatrices and calling MPI_Reduce in a loop.
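>>>
>>> (With 2048 chunks, each MPI_Reduce call covers 122880 / 2048 = 60 rows
>>> of the matrix, i.e. 60 * 400 = 24000 doubles, about 192000 bytes per
>>> call.)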
>>>
>>> However, a *second problem* arose: MPI_Reduce hangs, apparently
>>> stuck in some kind of deadlock. If the processors are of similar
>>> speed the problem seems to disappear, but I cannot guarantee that
>>> condition all the time.
>>>
>>> I managed to get rid of the problem (at least for a few iterations)
>>> by putting an MPI_Barrier before the MPI_Reduce line.
>>>
>>> The questions are:
>>>
>>> 1) Is this usual behavior?
>>> 2) Is there some kind of timeout for MPI_Reduce?
>>> 3) Why does MPI_Reduce die on a large amount of data when the system
>>> has enough address space (64-bit compilation)?
>>>
>>> Thanks
>>> Ondrej Glembek
>>>
>>>
>>> --
>>> Ondrej Glembek, PhD student E-mail: glembek_at_[hidden]
>>> UPGM FIT VUT Brno, L226 Web: http://www.fit.vutbr.cz/~glembek
>>> Bozetechova 2, 612 66 Phone: +420 54114-1292
>>> Brno, Czech Republic Fax: +420 54114-1290
>>>
>>> ICQ: 93233896
>>> GPG: C050 A6DC 7291 6776 9B69 BB11 C033 D756 6F33 DE3C
>>>
>>>
>>
>>
>
>
>
> --
> Ondrej Glembek, PhD student E-mail: glembek_at_[hidden]
> UPGM FIT VUT Brno, L226 Web: http://www.fit.vutbr.cz/~glembek
> Bozetechova 2, 612 66 Phone: +420 54114-1292
> Brno, Czech Republic Fax: +420 54114-1290
>
> ICQ: 93233896
> GPG: C050 A6DC 7291 6776 9B69 BB11 C033 D756 6F33 DE3C
>
>