Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7.4rc1 run failure on Solaris 10 / SPARC (not SIGBUS)
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2013-12-20 15:00:53


Ralph and Jeff,

Thanks for all the rapid fixes.
I'll send openmpi-1.7.4rc2r30031 for a spin while I go wait in line at the
Post Office.

-Paul

On Fri, Dec 20, 2013 at 11:45 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Hi Paul
>
> The binding stuff was in there, but the limit protection code just went in
> today. Jeff has since regenerated the tarball for the web site, so the one
> up there should have most (if not all) of these problems fixed
>
> Have a great holiday!
> Ralph
>
>
> On Dec 20, 2013, at 11:40 AM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>
> Ralph,
>
> I see the same behavior w/ last night's 1.7 tarball
> (openmpi-1.7.4rc2r30002).
> The very next commit, r30003, is your addition (on trunk) of guards for
> RLIMIT_AS, etc.
> So, I DON'T think any fix for this behavior is in the 1.7 branch as you
> thought (maybe just CMR'ed?)
>
> Let me know if there is additional information about the platform or error
> which I should collect.
>
> -Paul
>
> P.S.
> You may see my email vacation auto-responder message.
> My vacation has started (no *paid* work) but I am still reading email
> today.
> I plan to re-test tonight's 1.7 tarball on all the systems where I
> reported issues on Thu night.
>
>
> On Thu, Dec 19, 2013 at 7:19 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> I believe this one has already been fixed and is in the nightly
>> (1.7.4rc2) - for now, you can just set "--bind-to none" on the cmd line to
>> get past it
>>
>>
>> On Dec 19, 2013, at 6:42 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>>
>> Testing with Solaris 10 on SPARC, I was expecting to encounter the bus
>> error reported previously by Siegmar Gross. Instead I see the following
>> hwloc-related abort:
>>
>> $ env \
>>   PATH=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin:$PATH \
>>   LD_LIBRARY_PATH_64=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/lib:$LD_LIBRARY_PATH_64 \
>>   OMPI_MCA_shmem_mmap_enable_nfs_warning=0 \
>>   /home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin/mpirun \
>>   -mca btl sm,self -np 2 examples/ring_c
>> --------------------------------------------------------------------------
>> Open MPI tried to bind a new process, but something went wrong. The
>> process was killed without launching the target application. Your job
>> will now abort.
>>
>> Local host: niagara1
>> Application name: examples/ring_c
>> Error message: hwloc indicates cpu binding cannot be enforced
>> Location: /home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/openmpi-1.7.4rc1/orte/mca/odls/default/odls_default_module.c:478
>> --------------------------------------------------------------------------
>> 2 total processes failed to start
>>
>>
>> I am assuming I just need some magic pixie dust to disable cpu binding.
>> I'd appreciate some corresponding instructions.
>>
>> However, if this is NOT an expected/desired/known behavior please let me
>> know what I can/should do to help determine the root cause.
>>
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove PHHargrove_at_[hidden]
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
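
The quoted discussion mentions "guards for RLIMIT_AS, etc." added in r30003. That commit is not reproduced here; what follows is only a minimal sketch of the general technique, assuming the guards are compile-time checks on the RLIMIT_* constants. Not every platform's <sys/resource.h> defines every such constant (RLIMIT_NPROC, for example, is not in POSIX), so each use is wrapped in #ifdef. The helper names lookup_limit and show_limit are hypothetical and are not Open MPI functions.

```c
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper (not Open MPI code): translate a limit name into its
 * RLIMIT_* constant. Each constant is referenced only if the platform's
 * headers define it, so the file still compiles where one is missing.
 * Returns 0 on success, -1 if the name is unknown or unsupported here. */
int lookup_limit(const char *name, int *resource)
{
#ifdef RLIMIT_AS
    if (strcmp(name, "as") == 0) { *resource = RLIMIT_AS; return 0; }
#endif
#ifdef RLIMIT_NPROC
    if (strcmp(name, "nproc") == 0) { *resource = RLIMIT_NPROC; return 0; }
#endif
#ifdef RLIMIT_NOFILE
    if (strcmp(name, "nofile") == 0) { *resource = RLIMIT_NOFILE; return 0; }
#endif
    return -1;
}

/* Hypothetical helper: print the soft limit for a named resource.
 * Returns 0 on success, -1 if the limit is unavailable or the query fails. */
int show_limit(const char *name)
{
    int resource;
    struct rlimit lim;

    if (lookup_limit(name, &resource) != 0) {
        printf("%s: not available on this platform\n", name);
        return -1;
    }
    if (getrlimit(resource, &lim) != 0) {
        perror("getrlimit");
        return -1;
    }
    /* An unlimited resource is reported as RLIM_INFINITY, a large value. */
    printf("%s: soft limit = %llu\n", name, (unsigned long long)lim.rlim_cur);
    return 0;
}
```

Calling show_limit("as") on a platform without RLIMIT_AS reports the limit as unavailable at run time instead of breaking the build, which is the point of guarding each constant separately.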