
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-03-15 13:38:20


No, I'll fix the parser as we should be able to run anyway. Just can't guarantee which queue the job will end up in, but at least it -will- run.
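
The parser bug described in the quoted thread is that every line of the host list was treated as a unique host. The following is a minimal sketch of the fix idea, under some assumptions. It assumes SGE's `$PE_HOSTFILE` format of one `host slots queue processor_range` entry per line. In a setup like this one, a host that grants slots from two queues (low and high priority) appears on two lines. The function name `merge_pe_hostfile` and the sample host and queue names are hypothetical. This is not Open MPI's actual parser code.

```python
def merge_pe_hostfile(lines):
    """Collapse PE_HOSTFILE-style entries into one (host, total_slots) pair per host.

    Assumes (hypothetically) the SGE layout 'host slots queue processor_range'.
    A host offering slots in several queues appears on several lines; instead of
    treating each line as a distinct host, the slot counts are summed.
    """
    slots_per_host = {}  # dicts keep insertion order (Python 3.7+), so host order is preserved
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        slots_per_host[host] = slots_per_host.get(host, 0) + slots
    return list(slots_per_host.items())


# Hypothetical allocation: node01 contributes slots from two different queues.
example = [
    "node01 4 high.q@node01 UNDEFINED",
    "node01 4 low.q@node01 UNDEFINED",
    "node02 2 low.q@node02 UNDEFINED",
]
print(merge_pe_hostfile(example))
# [('node01', 8), ('node02', 2)]
```

Summing the slots lets the job use the whole allocation. It does not decide which queue, and therefore which nice level, the processes end up under. That matches the limitation discussed below: with a single `qrsh` call per host, all processes on that host share one queue.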

On Mar 15, 2012, at 11:34 AM, Joshua Baker-LePain wrote:

> On Thu, 15 Mar 2012 at 4:41pm, Reuti wrote
>
>> Am 15.03.2012 um 15:50 schrieb Ralph Castain:
>>>
>>> On Mar 15, 2012, at 8:46 AM, Reuti wrote:
>>>
>>>> Am 15.03.2012 um 15:37 schrieb Ralph Castain:
>>>>
>>>>> FWIW: I see the problem. Our parser was apparently written assuming every line was a unique host, so it doesn't even check to see if there is duplication. Easy fix - can shoot it to you today.
>>>>
>>>> But even with the fix, the nice value will be the same for all processes forked there: either all of them get the nice value of his low-priority queue, or all of them get that of the high-priority queue.
>>>
>>> Agreed - nothing I can do about that, though. We only do the one qrsh call, so the daemons are going to fall into a single queue, and so will all their children. In this scenario, it isn't clear to me (from this discussion) that I can control which queue gets used.
>>
>> Correct.
>
> Which I understand. Our queue setup is admittedly a bit wonky (which is
> probably why I'm the first one to have this issue). I'm much more concerned with things not crashing than with them absolutely having the "right" nice levels. :)
>
>>> Should I?
>>
>> I can't speak for the community. Personally I would say: don't distribute parallel jobs among different queues at all, as some applications internally pass the environment variables of the master process on to the slaves (even if SGE's `qrsh -inherit ...` is called without -V, and even if Open MPI is not told to forward any specific environment variable). With a custom application this can of course work, but with closed-source ones you can only test and learn from experience whether it works or not.
>>
>> Not to mention the timing issue of differently niced processes. Adjusting the SGE setup of the OP would be the smarter way IMO.
>
> And I agree with that as well. I understand if the decision is made to leave the parser the way it is, given that my setup is outside the norm.
>
> --
> Joshua Baker-LePain
> QB3 Shared Cluster Sysadmin
> UCSF
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users