Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE
From: Joshua Baker-LePain (jlb17_at_[hidden])
Date: 2012-03-15 13:34:02

On Thu, 15 Mar 2012 at 4:41pm, Reuti wrote:

> On 15.03.2012 at 15:50, Ralph Castain wrote:
>> On Mar 15, 2012, at 8:46 AM, Reuti wrote:
>>> On 15.03.2012 at 15:37, Ralph Castain wrote:
>>>> FWIW: I see the problem. Our parser was apparently written assuming
>>>> every line was a unique host, so it doesn't even check to see if
>>>> there is duplication. Easy fix - can shoot it to you today.
>>> But even with the fix the nice value will be the same for all
>>> processes forked there. Either all have the nice value of his low
>>> priority queue or the high priority queue.
>> Agreed - nothing I can do about that, though. We only do the one qrsh
>> call, so the daemons are going to fall into a single queue, and so will
>> all their children. In this scenario, it isn't clear to me (from this
>> discussion) that I can control which queue gets used.
> Correct.

Which I understand. Our queue setup is admittedly a bit wonky (which is
probably why I'm the first one to have this issue). I'm much more
concerned with things not crashing than with them absolutely having the
"right" nice levels. :)
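
For illustration, the duplicate-host case Ralph describes can be sketched
roughly as below. This is not Open MPI's actual parser (which is written in
C); it only shows the idea of merging repeated hosts instead of treating
every line as a unique one. It assumes the hostfile SGE hands to the job
($PE_HOSTFILE) has one "host slots queue processor-range" entry per line,
and the function name, host names, and queue names are all made up.

```python
from collections import OrderedDict


def merge_pe_hostfile(lines):
    """Sum the slot counts per host, keeping first-seen host order.

    Assumed line format: "<host> <slots> <queue> <processor-range>".
    A host that was granted slots from two queues shows up on two lines.
    """
    slots = OrderedDict()
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, count = fields[0], int(fields[1])
        # Accumulate instead of assuming each line names a new host.
        slots[host] = slots.get(host, 0) + count
    return list(slots.items())


# Hypothetical allocation: node01 got slots from both a low- and a
# high-priority queue, so it is listed twice.
sample = [
    "node01 2 low.q@node01 UNDEFINED",
    "node02 4 low.q@node02 UNDEFINED",
    "node01 3 high.q@node01 UNDEFINED",
]
print(merge_pe_hostfile(sample))  # [('node01', 5), ('node02', 4)]
```

Note that the queue column is simply dropped here, which matches the point
made above: with a single qrsh call per host, all processes on that host
end up in one queue regardless of how the slots were granted.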

>> Should I?
> I can't speak for the community. Personally I would say: don't
> distribute parallel jobs among different queues at all, as some
> applications will use some internal communication about the environment
> variables of the master process to distribute them to the slaves (even
> if SGE's `qrsh -inherit ...` is called without -V, and even if Open MPI
> is not told to forward any specific environment variable). If you have a
> custom application it can work of course, but with closed source ones
> you can only test and get the experience whether it's working or not.
> Not to mention the timing issue of differently niced processes.
> Adjusting the SGE setup of the OP would be the smarter way IMO.

And I agree with that as well. I understand if the decision is made to
leave the parser the way it is, given that my setup is outside the norm.

Joshua Baker-LePain
QB3 Shared Cluster Sysadmin