On Wed, 14 Mar 2012 at 5:50pm, Ralph Castain wrote:
> On Mar 14, 2012, at 5:44 PM, Reuti wrote:
>> (I was just typing when Ralph's message came in: I can confirm this. To
>> avoid it, Open MPI would have to collect all lines from the hostfile
>> that refer to the same machine. SGE creates an entry for each
>> queue/host pair in the machine file.)
>
> Hmmm...I can take a look at the allocator module and see why we aren't
> doing it. Would the host names be the same for the two queues?

I can't speak authoritatively like Reuti can, but here's what a hostfile
looks like on my cluster (note that all our name resolution is done via
/etc/hosts -- there's no DNS involved):

iq103 8 lab.q@iq103 <NULL>
iq103 1 test.q@iq103 <NULL>
iq104 8 lab.q@iq104 <NULL>
iq104 1 test.q@iq104 <NULL>
opt221 2 lab.q@opt221 <NULL>
opt221 1 test.q@opt221 <NULL>

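For illustration, here is a minimal Python sketch of the merging Reuti describes: sum the slot counts of every line that names the same host, so that iq103 becomes a single 9-slot entry instead of two separate ones. This is not Open MPI's allocator code; the function name `merge_pe_hostfile` is hypothetical, and it assumes the four-column layout shown above (host, slots, queue instance, processor range), using only the first two columns.

```python
def merge_pe_hostfile(lines):
    """Sum slot counts per host from SGE PE_HOSTFILE-style lines.

    Assumed line layout: <host> <slots> <queue@host> <processor-range>.
    Hosts keep the order of their first appearance (dicts preserve
    insertion order in Python 3.7+).
    """
    slots = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, count = fields[0], int(fields[1])
        # Several queue instances on one host collapse into one entry.
        slots[host] = slots.get(host, 0) + count
    return slots


hostfile = """iq103 8 lab.q@iq103 <NULL>
iq103 1 test.q@iq103 <NULL>
iq104 8 lab.q@iq104 <NULL>
iq104 1 test.q@iq104 <NULL>
opt221 2 lab.q@opt221 <NULL>
opt221 1 test.q@opt221 <NULL>
"""

print(merge_pe_hostfile(hostfile.splitlines()))
# {'iq103': 9, 'iq104': 9, 'opt221': 3}
```

With one entry per host, only one orted would be started on each machine, which sidesteps the shared $TMPDIR problem without needing a per-queue `qrsh -inherit`.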
>> @Ralph: it could work if SGE had a facility to request the
>> desired queue in `qrsh -inherit ...`, because then the $TMPDIR would be
>> unique for each orted again (assuming it's using different ports for
>> each).
>
> Gotcha! I suspect getting the allocator to handle this cleanly is the
> better solution, though.
If I can help (testing patches, e.g.), let me know.
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF