This page is part of a frozen web archive of this mailing list.
You can still navigate around this archive, but know that no new mails
have been added to it since July of 2016.
On 07.07.2008 at 11:31, Romaric David wrote:
> Pak Lui wrote:
>> It was fixed at one point in the trunk before v1.3 went official,
>> but while rolling the code from gridengine PLM into the rsh PLM
>> code, this feature was left out because there were some lingering
>> issues that I didn't resolve, and I lost track of it. Sorry, but
>> thanks for bringing it up. I will need to look at the issue again
>> and reopen this ticket against v1.3:
> Ok, so I have to wait for a 1.3 version to work with job suspend, or
> will it be back-ported to 1.2.6 or 1.2.6 ?
>> So even though it is the rsh PLM that starts the parallel job under
>> SGE, the rsh PLM can detect whether the Open MPI job was started
>> under the SGE Parallel Environment (by checking some SGE env vars)
>> and use the "qrsh -inherit" command to launch the parallel job the
>> same way as before. You can check by adding an MCA parameter such
>> as "--mca plm_base_verbose 10" to your mpirun command and looking
>> for the launch commands that mpirun uses.
> It looks like the shepherd cannot be started, for a reason I haven't
> figured out yet.
> /opt/SGE/utilbin/lx24-amd64/rsh exited with exit code 0
> reading exit code from shepherd ... 255
> [hostname:16745] ----------------------------
Do you mean with the plain rsh startup, as in a loose integration? In
that case, isn't a proper hostlist necessary, which for other MPI
implementations is built in the routine defined by start_proc_args?
AFAIK you can disregard the hostlist only with Open MPI's tight SGE
support.
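
As a rough illustration of what such a start_proc_args routine typically does, here is a minimal Python sketch that turns the contents of $PE_HOSTFILE into a flat machine file. It assumes each line of the file has the form "host slots queue processor-range"; the function names and the sample hosts are hypothetical, and real start_proc_args scripts are usually shell scripts shipped with SGE.

```python
def parse_pe_hostfile(text):
    """Parse PE_HOSTFILE contents into a list of (host, slots) tuples.

    Assumed line format: "<host> <slots> <queue> <processor-range>".
    Lines with fewer than two fields are skipped.
    """
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        hosts.append((fields[0], int(fields[1])))
    return hosts


def to_machinefile(hosts):
    """Repeat each host once per slot, one host name per line."""
    return "\n".join(host for host, slots in hosts for _ in range(slots))


# Hypothetical sample of what SGE might write for a 3-slot job.
sample = "node1 2 all.q@node1 UNDEFINED\nnode2 1 all.q@node2 UNDEFINED\n"
print(to_machinefile(parse_pe_hostfile(sample)))
```

With the tight integration this step is unnecessary, since Open MPI reads the allocation from SGE itself.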