Hi Paul

The binding stuff was in there, but the limit protection code just went in today. Jeff has since regenerated the tarball for the web site, so the one up there should have most (if not all) of these problems fixed.

Have a great holiday!
Ralph


On Dec 20, 2013, at 11:40 AM, Paul Hargrove <phhargrove@lbl.gov> wrote:

Ralph,

I see the same behavior w/ last night's 1.7 tarball (openmpi-1.7.4rc2r30002).
The very next commit, r30003, is your addition (on trunk) of guards for RLIMIT_AS, etc.
So, I DON'T think any fix for this behavior is in the 1.7 branch as you thought (maybe just CMR'ed?).
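
For context on the RLIMIT_AS guards mentioned above: not every platform defines every RLIMIT_* constant, so portable code usually wraps each use in a preprocessor check. The following is only a minimal sketch of that pattern, not the actual r30003 change. The names report_limit and report_known_limits are hypothetical, and the particular set of limits queried is an assumption made for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper (not Open MPI code): query one resource limit
 * and print its soft value. Returns 0 on success, -1 on failure. */
static int report_limit(const char *name, int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0) {
        perror(name);
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        printf("%s: unlimited\n", name);
    } else {
        printf("%s: %llu\n", name, (unsigned long long) rl.rlim_cur);
    }
    return 0;
}

/* Hypothetical driver: each limit is referenced only if the platform's
 * headers define it, so the file still compiles where one is missing.
 * Returns the number of limits that were queried successfully. */
int report_known_limits(void)
{
    int count = 0;

#ifdef RLIMIT_AS
    if (report_limit("RLIMIT_AS", RLIMIT_AS) == 0) {
        count++;
    }
#endif
#ifdef RLIMIT_NPROC
    if (report_limit("RLIMIT_NPROC", RLIMIT_NPROC) == 0) {
        count++;
    }
#endif
#ifdef RLIMIT_NOFILE
    if (report_limit("RLIMIT_NOFILE", RLIMIT_NOFILE) == 0) {
        count++;
    }
#endif
    return count;
}
```

Calling report_known_limits() from a small main() prints whichever of these limits the build platform exposes. A limit whose constant is not defined is skipped at compile time rather than causing a build failure.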

Let me know if there is additional information about the platform or error which I should collect.

-Paul

P.S.
You may see my email vacation auto-responder message.
My vacation has started (no *paid* work) but I am still reading email today.
I plan to re-test tonight's 1.7 tarball on all the systems where I reported issues on Thu night.


On Thu, Dec 19, 2013 at 7:19 PM, Ralph Castain <rhc@open-mpi.org> wrote:
I believe this one has already been fixed and is in the nightly (1.7.4rc2). For now, you can just set "--bind-to none" on the cmd line to get past it.


On Dec 19, 2013, at 6:42 PM, Paul Hargrove <phhargrove@lbl.gov> wrote:

Testing with Solaris 10 on SPARC, I was expecting to encounter the bus error reported previously by Siegman Gross.  Instead I see the following hwloc-related abort:

$ env \
    PATH=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin:$PATH \
    LD_LIBRARY_PATH_64=/home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/lib:$LD_LIBRARY_PATH_64 \
    OMPI_MCA_shmem_mmap_enable_nfs_warning=0 \
    /home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/INST/bin/mpirun -mca btl sm,self -np 2 examples/ring_c
--------------------------------------------------------------------------
Open MPI tried to bind a new process, but something went wrong.  The
process was killed without launching the target application.  Your job
will now abort.

  Local host:        niagara1
  Application name:  examples/ring_c
  Error message:     hwloc indicates cpu binding cannot be enforced
  Location:          /home/hargrove/OMPI/openmpi-1.7.4rc1-solaris10-sparcT2-ss12u3-v9/openmpi-1.7.4rc1/orte/mca/odls/default/odls_default_module.c:478
--------------------------------------------------------------------------
2 total processes failed to start


I am assuming I just need some magic pixie dust to disable cpu binding.
I'd appreciate some corresponding instructions.

However, if this is NOT an expected/desired/known behavior please let me know what I can/should do to help determine the root cause.


-Paul 

--
Paul H. Hargrove                          PHHargrove@lbl.gov
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
_______________________________________________
devel mailing list
devel@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel




