
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] 1.7.4rc1 run failure on Solaris 10 / SPARC (not SIGBUS)
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-12-20 14:45:37


Hi Paul

The binding stuff was in there, but the limit-protection code just went in today. Jeff has since regenerated the tarball for the web site, so the one up there should have most (if not all) of these problems fixed.

Have a great holiday!
Ralph

On Dec 20, 2013, at 11:40 AM, Paul Hargrove <phhargrove_at_[hidden]> wrote:

> Ralph,
>
> I see the same behavior w/ last night's 1.7 tarball (openmpi-1.7.4rc2r30002).
> The very next commit, r30003, is your addition (on trunk) of guards for RLIMIT_AS, etc.
> So I DON'T think any fix for this behavior is in the 1.7 branch as you thought (maybe it was just CMR'ed?).
>
> Let me know if there is additional information about the platform or error which I should collect.
>
> -Paul
>
> P.S.
> You may see my email vacation auto-responder message.
> My vacation has started (no *paid* work) but I am still reading email today.
> I plan to re-test tonight's 1.7 tarball on all the systems where I reported issues on Thu night.
>
>
> On Thu, Dec 19, 2013 at 7:19 PM, Ralph Castain <rhc_at_[hidden]> wrote:
> I believe this one has already been fixed and is in the nightly (1.7.4rc2) - for now, you can just set "--bind-to none" on the cmd line to get past it
>
>
> On Dec 19, 2013, at 6:42 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>
>> Testing with Solaris 10 on SPARC, I was expecting to encounter the bus error reported previously by Siegmar Gross. Instead I see the following hwloc-related abort:
>>
>> $ env PATH=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin:$PATH LD_LIBRARY_PATH_64=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/lib:$LD_LIBRARY_PATH_64 OMPI_MCA_shmem_mmap_enable_nfs_warning=0 /home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c
>> --------------------------------------------------------------------------
>> Open MPI tried to bind a new process, but something went wrong. The
>> process was killed without launching the target application. Your job
>> will now abort.
>>
>> Local host: niagara1
>> Application name: examples/ring_c
>> Error message: hwloc indicates cpu binding cannot be enforced
>> Location: /home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/openmpi-1.7.4rc1/orte/mca/odls/default/odls_default_module.c:478
>> --------------------------------------------------------------------------
>> 2 total processes failed to start
>>
>>
>> I am assuming I just need some magic pixie dust to disable cpu binding.
>> I'd appreciate some corresponding instructions.
>>
>> However, if this is NOT an expected/desired/known behavior please let me know what I can/should do to help determine the root cause.
>>
>>
>> -Paul
>>
>> --
>> Paul H. Hargrove PHHargrove_at_[hidden]
>> Future Technologies Group
>> Computer and Data Sciences Department Tel: +1-510-495-2352
>> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>
> --
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900