with the Tight Integration of Open MPI into SGE (http://gridengine.sunsource.net/)
you will get correct accounting. Every process created with qrsh (a
replacement for ssh) will have an additional group id attached, and SGE
will accumulate the usage of all of them.
Depending on the size of the cluster, you might want to look into a
batch queuing system. In fact, we even use one locally on some machines
to serialize the workflow.
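To make the accounting idea concrete, here is a minimal Python sketch. It is not SGE's actual implementation. It assumes each process record carries the extra group id shared by all processes of one job (started via qrsh), and it simply sums CPU time per group id across hosts. The `ProcessRecord` type, its fields and the sample numbers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ProcessRecord:
    host: str
    group_id: int        # hypothetical: the additional group id shared by one job's processes
    user_seconds: float
    system_seconds: float


def accumulate_by_group(records):
    """Sum user and system CPU time over all processes sharing a group id."""
    totals = defaultdict(lambda: [0.0, 0.0])
    for r in records:
        totals[r.group_id][0] += r.user_seconds
        totals[r.group_id][1] += r.system_seconds
    return {gid: (t[0], t[1]) for gid, t in totals.items()}


if __name__ == "__main__":
    # Hypothetical sample data, not real SGE output.
    records = [
        ProcessRecord("node01", 20001, 12.5, 0.5),
        ProcessRecord("node02", 20001, 11.0, 0.25),
        ProcessRecord("node02", 20002, 3.0, 0.0),
    ]
    print(accumulate_by_group(records))
```

Processes of the same job on different nodes end up in one total, which is the effect the group id gives SGE.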
On 12.11.2008, at 14:40, Fabian Hänsel wrote:
>> So, to make sure I understand what happens... This command:
>> mpirun -np 2 myprog
>> starts the program "mpirun" and two processes of "myprog". So, what
>> the "real time" of /usr/bin/time reports is the wall clock for
>> Does the user time have any meaning here?
> At least no meaning you can rely on: you cannot be sure what it measures
> (it could be the time of the MPI infrastructure setup, setup plus the
> master thread, or something completely different, depending on the MPI
> implementation).
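A minimal sketch of what /usr/bin/time does, assuming a Unix system and Python's standard `resource` module: wall-clock time is measured around the command, while user and system time come from `getrusage(RUSAGE_CHILDREN)`, which only covers terminated, waited-for descendants on the local host. For `mpirun -np 2 myprog` that means the user figure reflects whichever local processes mpirun happened to wait for, which is why it is hard to interpret. The function name `time_command` is hypothetical.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion; return (real, user, sys) in seconds, like /usr/bin/time."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN only includes terminated, waited-for descendants on this host;
    # processes started on other machines (e.g. via ssh) are not part of it.
    return (real,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__":
    # Usage: python time_command.py mpirun -np 2 myprog
    real, user, system = time_command(sys.argv[1:])
    print(f"real {real:.2f}s  user {user:.2f}s  sys {system:.2f}s", file=sys.stderr)
```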
>> I'm not very good with the
>> theory behind multi-processor programming... but Perl (for example) has
>> a "times" function (http://perldoc.perl.org/functions/times.html)
>> which "Returns a ... list ... for this process and the children of
>> this process". Are the two instances of myprog considered children
>> of mpirun?
> In a single-system setup: generally yes.
> In a multi-system setup: no. The MPI processes span many computers and
> are started there via e.g. ssh, so they are not children of the local
> mpirun.
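Python's `os.times()` is a close analogue of Perl's `times`: both report user and system time for the calling process and its exited, waited-for children. A minimal, Unix-only sketch of the single-system case follows: the CPU time of a forked child shows up in `children_user` once the parent has waited for it. A process started on another machine via ssh never appears there, because it is not a child of the local process. `burn_cpu` and `child_user_time_after_fork` are hypothetical helpers.

```python
import os


def burn_cpu(n=3_000_000):
    """Busy loop that accumulates user time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def child_user_time_after_fork():
    """Fork a CPU-bound child, wait for it, and return the parent's
    children_user time (seconds) before and after the wait."""
    before = os.times().children_user
    pid = os.fork()
    if pid == 0:
        # Child: do some work, then exit without running parent cleanup.
        burn_cpu()
        os._exit(0)
    os.waitpid(pid, 0)  # the child's usage is only added after it is reaped
    after = os.times().children_user
    return before, after


if __name__ == "__main__":
    before, after = child_user_time_after_fork()
    print(f"children_user before: {before:.2f}s, after: {after:.2f}s")
```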
>> Hmmmm, I guess user time does not matter since it is real time that
>> we are interested in reducing.
> Right. Even if we *could* measure the user time of every MPI worker
> correctly, it would not be what you are interested in: depending on the
> algorithm, a significant amount of time could be spent waiting for
> messages to arrive. That time would not count as user time, yet it is
> not 'wasted', since something important is happening.
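The point about waiting can be checked with a small Python sketch: the main thread blocks on a `threading.Event` that a timer sets after a fixed delay, standing in for a blocking receive, and wall-clock time grows while process CPU time barely moves. This assumes a wait that sleeps in the kernel; MPI implementations that poll actively while waiting (Open MPI can busy-poll, for example) will report that waiting as user time instead. `measure_blocking_wait` is a hypothetical name.

```python
import threading
import time


def measure_blocking_wait(seconds=0.2):
    """Block until a timer sets an event; return (wall, cpu) seconds spent waiting."""
    ready = threading.Event()
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the whole process, all threads
    threading.Timer(seconds, ready.set).start()
    ready.wait()  # sleeps, like a blocking receive waiting for a message
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure_blocking_wait()
    print(f"wall {wall:.3f}s, cpu {cpu:.3f}s")
```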
> Best regards
> users mailing list