Open MPI Development Mailing List Archives

From: Tim Prins (tprins_at_[hidden])
Date: 2007-06-06 11:21:53


Actually, the tests are quite painful to run, since there are things in
there that aren't real tests (such as spin, no-op, loob-child, etc) and
I really don't know what the expected output should be.

Anyways, I have made my way through these things, and I could not see
any failures. This should clear the way for these changesets to be
brought in.

George: Do you want to bring this over? If you do, remember to also
remove test/class/orte_bitmap.c

Thanks,

Tim

Ralph H Castain wrote:
> Sigh...is it really so much to ask that we at least run the tests in
> orte/test/system and orte/test/mpi using both mpirun and singleton (where
> appropriate) instead of just relying on "well I ran hello_world"?
>
> That is all I have ever asked, yet it seems to be viewed as a huge
> impediment. Is it really that much to ask for when modifying a core part of
> the system? :-/
>
> If you have done those tests, then my apology - but your note only indicates
> that you ran "hello_world" and are basing your recommendation *solely* on
> that test.
>
>
> On 6/6/07 7:51 AM, "Tim Prins" <tprins_at_[hidden]> wrote:
>
>
>> I hate to go back to this, but...
>>
>> The original commits also included changes to gpr_replica_dict_fn.c
>> (r14331 and r14336). This change shows some performance improvement for
>> me (about 8% on mpi hello, 123 nodes, 4ppn), and cleans up some ugliness
>> in the gpr. Again, this is an algorithmic change so as the job scales the
>> performance improvement would be more noticeable.
>>
>> I vote that this be put back in.
>>
>> On a related topic, a small memory leak was fixed in r14328, and then
>> reverted. This change should be put back in.
>>
>> Tim
>>
>> George Bosilca wrote:
>>
>>> Commit r14791 applies this patch to the trunk. Let me know if you
>>> encounter any kind of trouble.
>>>
>>> Thanks,
>>> george.
>>>
>>> On May 29, 2007, at 2:28 PM, Ralph Castain wrote:
>>>
>>>
>>>> After some work off-list with Tim, it appears that something has been
>>>> broken again on the OMPI trunk with respect to comm_spawn. It was
>>>> working two weeks ago, but...sigh.
>>>>
>>>> Anyway, it doesn't appear to have any bearing either way on George's
>>>> patch(es), so whoever wants to commit them is welcome to do so.
>>>>
>>>> Thanks
>>>> Ralph
>>>>
>>>>
>>>> On 5/29/07 11:44 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>>>
>>>>
>>>>>
>>>>> On 5/29/07 11:02 AM, "Tim Prins" <tprins_at_[hidden]> wrote:
>>>>>
>>>>>
>>>>>> Well, after fixing many of the tests...
>>>>>>
>>>>> Interesting - they worked fine for me. Perhaps a difference in
>>>>> environment.
>>>>>
>>>>>
>>>>>> It passes all the tests
>>>>>> except the spawn tests. However, the spawn tests are seriously broken
>>>>>> without this patch as well, and the ibm mpi spawn tests seem to work
>>>>>> fine.
>>>>>>
>>>>> Then something is seriously wrong. The spawn tests were working as of
>>>>> my last commit - that is a test I religiously run. If the spawn test
>>>>> here doesn't work, then it is hard to understand how the mpi spawn can
>>>>> work since the call is identical.
>>>>>
>>>>> Let me see what's wrong first...
>>>>>
>>>>>
>>>>>> As far as I'm concerned, this should assuage any fear of problems
>>>>>> with these changes and they should now go in.
>>>>>>
>>>>>> Tim
>>>>>>
>>>>>> On May 29, 2007, at 11:34 AM, Ralph Castain wrote:
>>>>>>
>>>>>>
>>>>>>> Well, I'll be the voice of caution again...
>>>>>>>
>>>>>>> Tim: did you run all of the orte tests in the orte/test/system
>>>>>>> directory? If so, and they all run correctly, then I have no issue
>>>>>>> with doing the commit. If not, then I would ask that we not do the
>>>>>>> commit until that has been done.
>>>>>>>
>>>>>>> In running those tests, you need to run them on a multi-node system,
>>>>>>> both using mpirun and as singletons (you'll have to look at the tests
>>>>>>> to see which ones make sense in the latter case). This will ensure
>>>>>>> that we have at least some degree of coverage.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ralph
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 5/29/07 9:23 AM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>>>>>>>
>>>>>>>
>>>>>>>> I'd be happy to commit the patch into the trunk. But after what
>>>>>>>> happened last time, I'm more than cautious. If the community thinks
>>>>>>>> the patch is worth having, let me know and I'll push it into the
>>>>>>>> trunk asap.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> george.
>>>>>>>>
>>>>>>>> On May 29, 2007, at 10:56 AM, Tim Prins wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> I think both patches should be put in immediately. I have done some
>>>>>>>>> simple testing, and with 128 nodes of odin, with 1024 processes
>>>>>>>>> running mpi hello, these decrease our running time from about 14.2
>>>>>>>>> seconds to 10.9 seconds. This is a significant decrease, and as the
>>>>>>>>> scale increases there should be increasing benefit.
>>>>>>>>>
>>>>>>>>> I'd be happy to commit these changes if no one objects.
>>>>>>>>>
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>> On May 24, 2007, at 8:39 AM, Ralph H Castain wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> Thanks - I'll take a look at this (and the prior ones!) in the
>>>>>>>>>> next couple of weeks when time permits and get back to you.
>>>>>>>>>>
>>>>>>>>>> Ralph
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 5/23/07 1:11 PM, "George Bosilca" <bosilca_at_[hidden]> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> Attached is another patch to the ORTE layer, more specifically
>>>>>>>>>>> the replica. The idea is to decrease the number of strcmp calls
>>>>>>>>>>> by using a small hash function before doing the strcmp. The hash
>>>>>>>>>>> key for each registry entry is computed when it is added to the
>>>>>>>>>>> registry. When we're doing a query, instead of comparing the 2
>>>>>>>>>>> strings we first check if the hash keys match, and if they do
>>>>>>>>>>> match then we compare the 2 strings in order to make sure we
>>>>>>>>>>> eliminate collisions from our answers.
>>>>>>>>>>>
>>>>>>>>>>> There is some benefit in terms of performance. It's hardly
>>>>>>>>>>> visible for few processes, but it starts showing up when the
>>>>>>>>>>> number of processes increases. In fact the number of strcmp calls
>>>>>>>>>>> in the trace file drastically decreases. The main reason it works
>>>>>>>>>>> well is that most of the keys start with basically the same chars
>>>>>>>>>>> (such as orte-blahblah), which turns each strcmp into a loop over
>>>>>>>>>>> a few chars.
>>>>>>>>>>>
>>>>>>>>>>> Ralph, please consider it for inclusion on the ORTE layer.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> george.
>>>>>>>>>>>
>>>>>>>>>>>
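The hash-before-strcmp idea George describes above can be sketched roughly as follows. This is only a minimal illustration, not the actual patch: the type `dict_entry_t`, the functions `hash_str`, `dict_add` and `dict_find`, the counter `strcmp_calls`, and the choice of the djb2 hash are all assumptions made for the example. The thread does not show which hash function or data structures the gpr replica really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dictionary entry: the hash is stored next to the string. */
typedef struct {
    uint32_t hash;      /* computed once, when the entry is added */
    const char *name;   /* the registry key, e.g. "orte-..." */
} dict_entry_t;

/* Illustrative string hash (djb2). The hash used by the real patch is
 * not shown in this thread; any cheap string hash would do here. */
static uint32_t hash_str(const char *s)
{
    uint32_t h = 5381u;
    while (*s != '\0') {
        h = h * 33u + (unsigned char)*s++;
    }
    return h;
}

/* Counts how often we fall through to a full string comparison. */
static unsigned long strcmp_calls = 0;

/* Adding an entry computes and caches its hash key. */
static void dict_add(dict_entry_t *entry, const char *name)
{
    assert(entry != NULL && name != NULL);
    entry->name = name;
    entry->hash = hash_str(name);
}

/* Lookup: hash the query once, compare integers first, and call
 * strcmp only when the hashes match, to rule out a collision. */
static int dict_find(const dict_entry_t *entries, size_t n, const char *key)
{
    uint32_t h = hash_str(key);
    for (size_t i = 0; i < n; i++) {
        if (entries[i].hash != h) {
            continue;               /* cheap integer reject, no strcmp */
        }
        strcmp_calls++;
        if (strcmp(entries[i].name, key) == 0) {
            return (int)i;          /* confirmed match */
        }
    }
    return -1;                      /* not found */
}
```

With keys that share a long common prefix, a plain strcmp has to walk that prefix on every entry before it finds a difference. In this sketch the integer comparison rejects almost all non-matching entries first, so strcmp normally runs only on the entry that actually matches.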
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> devel mailing list
>>>>>>>>>>> devel_at_[hidden]
>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel