
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE
From: Reuti (reuti_at_[hidden])
Date: 2012-03-15 11:41:20


On Mar 15, 2012, at 15:50, Ralph Castain wrote:

>
> On Mar 15, 2012, at 8:46 AM, Reuti wrote:
>
>> On Mar 15, 2012, at 15:37, Ralph Castain wrote:
>>
>>> Just to be clear: I take it that the first entry is the host name, and the second is the number of slots allocated on that host?
>>
>> This is correct.
>>
>>
>>> FWIW: I see the problem. Our parser was apparently written assuming every line was a unique host, so it doesn't even check to see if there is duplication. Easy fix - can shoot it to you today.
>>
>> But even with the fix, the nice value will be the same for all processes forked there: either all inherit the nice value of the low priority queue, or all that of the high priority queue.
>
> Agreed - nothing I can do about that, though. We only do the one qrsh call, so the daemons are going to fall into a single queue, and so will all their children. In this scenario, it isn't clear to me (from this discussion) that I can control which queue gets used

Correct.

> - can I?

No. As posted, I created an issue for it. But if it worked, you would already get different $TMPDIRs for each queue.

> Should I?

I can't speak for the community. Personally I would say: don't distribute parallel jobs among different queues at all. Some applications use internal communication to propagate the master process's environment variables to the slaves (even if SGE's `qrsh -inherit ...` is called without -V, and even if Open MPI is not told to forward any specific environment variable). With a custom application this can work, of course, but with closed-source ones you can only test and learn from experience whether it works or not.

Not to mention the timing issue of differently niced processes. Adjusting the SGE setup of the OP would be the smarter way IMO.

If Open MPI is fixed to add up all the granted slots on one machine, some users may consider it an Open MPI error that all processes are attached to one queue only, as they expect the different queues to be used. So this "workaround" should be noted somewhere: >>As it's not possible to reach a specific queue on a slave machine via SGE's tight integration commands (`qrsh -inherit ...`), as a workaround the slot counts across different queues in SGE's $PE_HOSTFILE are added up per host, and all processes there are started in the queue SGE chooses for the first issued `qrsh -inherit ...`. Which queue is taken can't be predicted, though.<<
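For illustration, the parser fix being discussed amounts to summing the slot counts of all $PE_HOSTFILE lines that name the same host. This is only a minimal Python sketch of that aggregation (not the actual Open MPI allocator code); the sample data mirrors the hostfile Joshua posted earlier in the thread, and the `merged_slots` helper name is made up here:

```python
def merged_slots(lines):
    """Sum the slot counts per host across all queue/host lines.

    Each $PE_HOSTFILE line has the form:
        <host> <slots> <queue@host> <processor-range>
    SGE emits one line per queue/host pair, so a host granted slots
    from two queues appears twice.
    """
    totals = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        totals[host] = totals.get(host, 0) + slots
    return totals

# Sample $PE_HOSTFILE content, as quoted in this thread:
pe_hostfile = """\
iq103 8 lab.q@iq103 <NULL>
iq103 1 test.q@iq103 <NULL>
iq104 8 lab.q@iq104 <NULL>
iq104 1 test.q@iq104 <NULL>
opt221 2 lab.q@opt221 <NULL>
opt221 1 test.q@opt221 <NULL>
"""

print(merged_slots(pe_hostfile.splitlines()))
# → {'iq103': 9, 'iq104': 9, 'opt221': 3}
```

The queue name in the third column is dropped, which is exactly the trade-off discussed above: the merged count no longer records which queue each slot came from.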

-- Reuti

>>> On Mar 15, 2012, at 6:53 AM, Reuti wrote:
>>>
>>>> On Mar 15, 2012, at 05:22, Joshua Baker-LePain wrote:
>>>>
>>>>> On Wed, 14 Mar 2012 at 5:50pm, Ralph Castain wrote:
>>>>>
>>>>>> On Mar 14, 2012, at 5:44 PM, Reuti wrote:
>>>>>
>>>>>>> (I was just typing when Ralph's message came in: I can confirm this. To avoid it, it would mean for Open MPI to collect all lines from the hostfile which are on the same machine. SGE creates entries for each queue/host pair in the machine file).
>>>>>>
>>>>>> Hmmm…I can take a look at the allocator module and see why we aren't doing it. Would the host names be the same for the two queues?
>>>>>
>>>>> I can't speak authoritatively like Reuti can, but here's what a hostfile
>>>>> looks like on my cluster (note that all our name resolution is done via /etc/hosts -- there's no DNS involved):
>>>>>
>>>>> iq103 8 lab.q@iq103 <NULL>
>>>>> iq103 1 test.q@iq103 <NULL>
>>>>> iq104 8 lab.q@iq104 <NULL>
>>>>> iq104 1 test.q@iq104 <NULL>
>>>>> opt221 2 lab.q@opt221 <NULL>
>>>>> opt221 1 test.q@opt221 <NULL>
>>>>
>>>> Yes, exactly this needs to be parsed, adding up all entries therein for one and the same machine.
>>>>
>>>> If you need it instantly, it could be put in a wrapper for start_proc_args of the PE (with Open MPI compiled without SGE support), so that a custom-built machinefile can be used. In this case the rsh or ssh call also needs to be caught.
>>>>
>>>> Often the opposite is desired in an SGE setup: tune it so that all slots are coming from one queue only.
>>>>
>>>> But I still wonder whether it is possible to tune your setup in a similar way: allow one slot more in the high priority queue (long.q) in case it's a parallel job, with an RQS (assuming 8 cores with one core of oversubscription):
>>>>
>>>> limit queues long.q pes * to slots=9
>>>> limit queues long.q to slots=8
>>>>
>>>> while you have an additional short.q (the low priority queue) there with one slot. The overall limit is still set at the exechost level to 9. The PE is then attached only to long.q.
>>>>
>>>> -- Reuti
>>>>
>>>> PS: In your example you also had the case of 2 slots in the low priority queue - what is the actual setup in your cluster?
>>>>
>>>>
>>>>>>> @Ralph: it could work if SGE would have a facility to request the desired queue in `qrsh -inherit ...`, because then the $TMPDIR would be unique for each orted again (assuming its using different ports for each).
>>>>>>
>>>>>> Gotcha! I suspect getting the allocator to handle this cleanly is the better solution, though.
>>>>>
>>>>> If I can help (testing patches, e.g.), let me know.
>>>>>
>>>>> --
>>>>> Joshua Baker-LePain
>>>>> QB3 Shared Cluster Sysadmin
>>>>> UCSF
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users