Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] 1.7.4rc1 run failure on Solaris 10 / SPARC (not SIGBUS)
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2013-12-20 14:40:08


Ralph,

I see the same behavior w/ last night's 1.7 tarball
(openmpi-1.7.4rc2r30002).
The very next commit, r30003, is your addition (on trunk) of guards for
RLIMIT_AS, etc.
So, I DON'T think any fix for this behavior is in the 1.7 branch as you
thought (maybe just CMR'ed?)
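
For readers unfamiliar with the term, "guards" here presumably means wrapping each use of a resource-limit constant in a preprocessor check, so the code still compiles on platforms that do not define it. The sketch below only illustrates that general pattern and is not the content of r30003: the function name count_guarded_limits is hypothetical, and RLIMIT_NPROC is an assumed stand-in for the "etc." (only RLIMIT_AS is named above).

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not from Open MPI): queries each resource limit
 * only if the platform's headers define its constant, and returns how
 * many limits were read successfully. */
int count_guarded_limits(void)
{
    struct rlimit rl;
    int found = 0;

#ifdef RLIMIT_AS
    /* Compiled only where RLIMIT_AS exists. */
    if (getrlimit(RLIMIT_AS, &rl) == 0) {
        printf("RLIMIT_AS soft limit: %llu\n",
               (unsigned long long) rl.rlim_cur);
        found++;
    }
#endif

#ifdef RLIMIT_NPROC
    /* Assumed example of a second guarded limit; not every platform
     * defines this constant. */
    if (getrlimit(RLIMIT_NPROC, &rl) == 0) {
        printf("RLIMIT_NPROC soft limit: %llu\n",
               (unsigned long long) rl.rlim_cur);
        found++;
    }
#endif

    return found;
}
```

Without such guards, a reference to an undefined RLIMIT_* constant is a compile-time error on the affected platform.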

Let me know if there is additional information about the platform or error
which I should collect.

-Paul

P.S.
You may see my email vacation auto-responder message.
My vacation has started (no *paid* work) but I am still reading email today.
I plan to re-test tonight's 1.7 tarball on all the systems where I reported
issues on Thu night.

On Thu, Dec 19, 2013 at 7:19 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> I believe this one has already been fixed and is in the nightly (1.7.4rc2)
> - for now, you can just set "--bind-to none" on the cmd line to get past it
>
>
> On Dec 19, 2013, at 6:42 PM, Paul Hargrove <phhargrove_at_[hidden]> wrote:
>
> Testing with Solaris 10 on SPARC, I was expecting to encounter the bus
> error reported previously by Siegmar Gross. Instead I see the following
> hwloc-related abort:
>
> $ env \
>   PATH=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin:$PATH \
>   LD_LIBRARY_PATH_64=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/lib:$LD_LIBRARY_PATH_64 \
>   OMPI_MCA_shmem_mmap_enable_nfs_warning=0 \
>   /home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin/mpirun \
>   -mca btl sm,self -np 2 examples/ring_c
> --------------------------------------------------------------------------
> Open MPI tried to bind a new process, but something went wrong. The
> process was killed without launching the target application. Your job
> will now abort.
>
> Local host: niagara1
> Application name: examples/ring_c
> Error message: hwloc indicates cpu binding cannot be enforced
> Location:
> /home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/openmpi-1.7.4rc1/orte/mca/odls/default/odls_default_module.c:478
> --------------------------------------------------------------------------
> 2 total processes failed to start
>
>
> I am assuming I just need some magic pixie dust to disable cpu binding.
> I'd appreciate some corresponding instructions.
>
> However, if this is NOT an expected/desired/known behavior please let me
> know what I can/should do to help determine the root cause.
>
>
> -Paul
>
> --
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900