Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] ibm/dynamic/loop_spawn
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-08-15 20:47:48


I don't feel strongly either way, but note that I created loop_spawn to test a very specific user-reported problem. It should "self-throttle": the entire idea is that comm_spawn blocks until the system has room for another process, and then starts it. If that isn't working correctly, then OMPI isn't behaving properly.

If you are having problems with the test, we should determine the origin of the problem - if it truly is a test harness issue, and not something in the code, then no problem with dialing things back.

On Aug 15, 2011, at 2:29 PM, Rolf vandeVaart wrote:

> I think this is a good idea.
>
> I have spent a fair amount of time in the past analyzing timeouts from this set of tests. I had to figure out if it was an actual timeout or if the test was just running very slowly.
> In fact, I see that sometime in the past I throttled back the number of iterations in the loop_spawn.c test for just this reason.
>
> Therefore, I think your idea would be a nice improvement.
>
> Rolf
>
>> -----Original Message-----
>> From: devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]]
>> On Behalf Of Eugene Loh
>> Sent: Monday, August 15, 2011 11:47 AM
>> To: devel_at_[hidden]
>> Subject: [OMPI devel] ibm/dynamic/loop_spawn
>>
>> This is a question about ompi-tests/ibm/dynamic. Some of these tests
>> (spawn, spawn_multiple, loop_spawn/child, and no-disconnect) exercise
>> MPI_Comm_spawn* functionality. Specifically, they spawn additional
>> processes (beyond the initial mpirun launch) and therefore exert a different
>> load on a test system than one might naively expect from the "mpirun -np
>> <np>" command line.
>>
>> One approach to testing is to have the test harness know characteristics about
>> individual tests like this. E.g., if I have only 8 processors and I don't want to
>> oversubscribe, have the test harness know that particular tests should be
>> launched with fewer processes. On the other hand, so few tests need this, and
>> the required changes would be so pervasive (a subjective assessment), that
>> building such generality into a test harness may not make much sense.
>>
>> Another approach would be to manage oversubscription in the tests
>> themselves. E.g., for spawn.c, instead of spawning np new processes, do the
>> following:
>>
>> - idle np/2 of the processes
>> - have the remaining np/2 processes spawn np/2 new ones
>>
>> (Okay, so that leaves open the possibility that the newly spawned processes
>> might not appear on the same nodes where idled processes have "made
>> room" for them. Each solution seems loaded with shortcomings.)
>>
>> Anyhow, I was interested in some feedback on this topic. A very small
>> number (1-4) of spawning tests are causing us lots of problems (undue
>> complexity in the test harness as well as a bunch of our time for reasons I find
>> difficult to explain succinctly). We're inclined to modify the tests so that
>> they're a little more social. E.g., make decisions about how many of the
>> launched processes should "really" be used, idling some fraction of the
>> processes, and continuing the test only with the remaining fraction.
>>
>> Comments?
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel