I believe that r22335 did resolve the issue. The problem was
between my screen and my chair. Last night, I reset my paths, but the
new directory was appended to paths that still listed the old MPI
directory first, so I think it was linking with the old libraries. I'll try
it in a production run, but it passed the simpler tests that the old
library failed. I'll post another note if it fails anywhere, but I am
confident that the problem is resolved as you first thought.
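
For anyone who hits the same path mix-up, here is a minimal sketch of putting a new Open MPI build ahead of an older one on the search paths. The install prefix /opt/openmpi-1.4a1r22335 is only an assumption for illustration; substitute wherever the snapshot was actually installed.

```shell
# Hypothetical install prefix for the new snapshot (an assumption, not
# taken from this thread); adjust to the real location on your system.
NEW_MPI="/opt/openmpi-1.4a1r22335"

# Prepend rather than append, so the new build is searched before any
# older MPI install that is still listed later in the path.
PATH="$NEW_MPI/bin:$PATH"
LD_LIBRARY_PATH="$NEW_MPI/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH

# Report which compiler wrapper and launcher are now found first.
command -v mpicc  || echo "mpicc not found on PATH"
command -v mpirun || echo "mpirun not found on PATH"
```

After rebuilding, running ldd on the resulting executable and looking for the libmpi line is one way to confirm which library directory is picked up at run time.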
On 01/05/2010 10:55 AM, Eugene Loh wrote:
> Hmm, perhaps not so excellent. It seems to me that
> openmpi-1.4a1r22335 does have the fixes to trac 2043. So, either the
> fixes are insufficient and/or you're experiencing a different
> problem. I'll see if I can reproduce your problem, but I'm not
> confident here.
> Louis Rossi wrote:
>> Hi Eugene,
>> Excellent! I could not find r22324. I found r22335 on the openmpi
>> download site (under nightly snapshots), but this did not solve the
>> problem. Any thoughts on where I can find it?
>> On 01/04/2010 09:53 AM, Eugene Loh wrote:
>>> On 01/04/2010 01:23 AM, Eugene Loh wrote:
>>>> 1) What about "-mca coll_sync_barrier_before 100"? (The default
>>>> may be 1000. So, you can try various values less than 1000. I'm
>>>> suggesting 100.) Note that broadcast has somewhat one-way traffic
>>>> flow, which can have some undesirable flow control issues.
>>>> 2) What about "-mca btl_sm_num_fifos 16"? Default is 1. If the
>>>> problem is trac ticket 2043, then this suggestion can help.
>>> Louis Rossi wrote:
>>>> Hi Eugene,
>>>> Thank you for replying so quickly. You are right that there is a
>>>> memory leak. It's not the source of the problem, but I added a
>>>> free(pMessage) to remove the issue. (In my defense, I borrowed a
>>>> simple broadcast example off the web and wrapped it in a loop.)
>>>> Anyway, the great news is that suggestion #2 solved the problem
>>>> for the example. (At least it has not failed yet. I'm exercising
>>>> the solution on the original larger problem now.) Suggestion #1
>>>> did not. Should I post the resolution to the mailing list or is
>>>> this a well known solution? I see this parameter listed under
>>>> performance tuning on the ompi site, but only in reference to
>>>> congestion. There is no comment that bcasts could hang.
>>> Louis Rossi wrote:
>>>> Hi Eugene,
>>>> OK. You nailed it with suggestion #2.
>>>> Many thanks,
>>> Great. Next time, go ahead and respond to the wider mail alias so
>>> that everyone learns that your particular problem was resolved.
>>> I will update the trac ticket to point to this as another instance
>>> of this problem.
>>> One signature of the problem is that GCC 4.4.0 or later exposes the
>>> problem, while earlier revs do not. I can't tell for sure, but it
>>> appears to me that this condition is met with Fedora 11.
>>> Our understanding of trac 2043 has recently improved immensely. It
>>> would be great if you could confirm the fix. The ticket is at
>>> https://svn.open-mpi.org/trac/ompi/ticket/2043 . r22324 should fix
>>> the problem. If you could get that version, build with GCC
>>> (presumably 4.4.0 or more recent), then the workaround should no
>>> longer be needed.
"Through nonaction, no action is left undone." --Lao Tzu
Louis F. Rossi rossi_at_[hidden]
Department of Mathematical Sciences http://www.math.udel.edu/~rossi
University of Delaware (302) 831-1880 (voice)
Newark, DE 19716 (302) 831-4511 (fax)