Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE
From: Joshua Baker-LePain (jlb17_at_[hidden])
Date: 2012-03-15 00:22:53


On Wed, 14 Mar 2012 at 5:50pm, Ralph Castain wrote:

> On Mar 14, 2012, at 5:44 PM, Reuti wrote:

>> (I was just typing when Ralph's message came in: I can confirm this.
>> Avoiding it would require Open MPI to combine all lines in the
>> hostfile that refer to the same machine. SGE creates an entry for each
>> queue/host pair in the machine file.)
>
> Hmmm…I can take a look at the allocator module and see why we aren't
> doing it. Would the host names be the same for the two queues?

I can't speak authoritatively like Reuti can, but here's what a hostfile
looks like on my cluster (note that all our name resolution is done via
/etc/hosts -- there's no DNS involved):

iq103 8 lab.q@iq103 <NULL>
iq103 1 test.q@iq103 <NULL>
iq104 8 lab.q@iq104 <NULL>
iq104 1 test.q@iq104 <NULL>
opt221 2 lab.q@opt221 <NULL>
opt221 1 test.q@opt221 <NULL>
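
To make the idea of collecting lines for the same machine concrete, here is a
minimal Python sketch that sums the slot counts of all entries sharing a
hostname. It is only an illustration, not Open MPI's allocator code: the
function name merge_pe_hostfile is hypothetical, and it assumes the first two
whitespace-separated fields of each line are the hostname and the slot count,
as in the listing above.

```python
from collections import OrderedDict


def merge_pe_hostfile(lines):
    """Sum slot counts for hostfile entries that share a hostname.

    Hypothetical helper for illustration only. Assumes each line looks like
    "<host> <slots> <queue>@<host> <processor range>" and ignores everything
    after the first two fields.
    """
    slots = OrderedDict()  # keeps hosts in first-seen order
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, count = fields[0], int(fields[1])
        slots[host] = slots.get(host, 0) + count
    return list(slots.items())


# Sample input copied from the hostfile shown above.
PE_HOSTFILE = """\
iq103 8 lab.q@iq103 <NULL>
iq103 1 test.q@iq103 <NULL>
iq104 8 lab.q@iq104 <NULL>
iq104 1 test.q@iq104 <NULL>
opt221 2 lab.q@opt221 <NULL>
opt221 1 test.q@opt221 <NULL>
"""

merged = merge_pe_hostfile(PE_HOSTFILE.splitlines())
for host, n in merged:
    print(host, n)
# prints:
# iq103 9
# iq104 9
# opt221 3
```

With one combined entry per host, only one orted would need to be launched on
each machine, which is the situation the quoted discussion is trying to reach.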

>> @Ralph: it could work if SGE had a facility to request the
>> desired queue in `qrsh -inherit ...`, because then the $TMPDIR would be
>> unique for each orted again (assuming it's using different ports for
>> each).
>
> Gotcha! I suspect getting the allocator to handle this cleanly is the
> better solution, though.

If I can help (testing patches, e.g.), let me know.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF