Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] r27078 and OMPI build
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2012-08-23 11:58:05


Eugene,

Did you have a chance to make progress on issue #2? I'm wondering how we want to proceed from here.

Pavel (Pasha) Shamis

---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Aug 21, 2012, at 2:19 PM, Eugene Loh wrote:

On 8/21/2012 9:31 AM, Ralph Castain wrote:

Looks to me like you just need to add a couple of includes and correct a typo - yes?

Right.  This part is under control.

The library issue sounds like something isn't right in the Makefile.am - perhaps the syntax has a typo there as well?

I don't know.  This is the part where I could use help.  I took a quick
peek at some Makefile.am files.  I can't see what the essential
difference is between, say, coll/ml/Makefile.am and, say,
coll/sm/Makefile.am (which behaves all right).  Nor do I see why there
would be a difference in coll/ml between one system (happens to be
SPARC, though I don't know that's significant) and another.

On Aug 21, 2012, at 11:36 AM, Eugene Loh wrote:

r27078 (ML collective component) broke some Solaris OMPI builds.
1)  In ompi/mca/coll/ml/coll_ml_lmngr.c
   199 #ifdef HAVE_POSIX_MEMALIGN
   200     if((errno = posix_memalign(&lmngr->base_addr,
   201                     lmngr->list_alignment,
   202                     lmngr->list_size * lmngr->list_block_size)) != 0) {
   203         ML_ERROR(("Failed to allocate memory: %s [%d]", errno, strerror(errno)));
   204         return OMPI_ERROR;
   205     }
   206 #else
   207     lmngr->base_addr =
   208         malloc(lmngr->list_size * lmngr->list_block_size + lmngr->list_alignment);
   209     if(NULL == lmngr->base_addr) {
   210         ML_ERROR(("Failed to allocate memory: %s [%d]", errno, strerror(errno)));
   211         return OMPI_ERROR;
   212     }
   213
   214     lmngr->base_addr = (void*)OPAL_ALIGN((uintptr_t)lmngr->base_addr,
   215             lmngr->list_align, uintptr_t);
   216 #endif
  The "#else" code path has multiple problems -- specifically at the
statement on lines 214-215:
  - OPAL_ALIGN needs to be defined (e.g., #include "opal/align.h")
  - uintptr_t needs to be defined (e.g., #include "opal_stdint.h")
  - list_align should be list_alignment
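
For illustration, here is a minimal standalone sketch of what the "#else" path is trying to do once the three items above are addressed. It is not the r27078 code: ALIGN_UP is a hypothetical stand-in for OPAL_ALIGN, struct lmngr_sketch is a cut-down hypothetical version of the list manager, and the alloc_base field (kept so the block can later be passed to free()) is an assumption added here. The alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for OPAL_ALIGN: round x up to the next multiple
 * of a, where a is assumed to be a power of two; t is the integer type. */
#define ALIGN_UP(x, a, t) (((x) + ((t)(a) - 1)) & ~((t)(a) - 1))

/* Hypothetical, reduced version of the list manager in the thread. */
struct lmngr_sketch {
    void  *alloc_base;      /* pointer returned by malloc (assumption: kept for free) */
    void  *base_addr;       /* aligned address actually used */
    size_t list_size;       /* number of blocks */
    size_t list_block_size; /* bytes per block */
    size_t list_alignment;  /* required alignment, power of two */
};

/* Fallback allocation when posix_memalign is unavailable.
 * Returns 0 on success, -1 on failure. */
int lmngr_alloc_fallback(struct lmngr_sketch *lmngr)
{
    /* The mask trick in ALIGN_UP only works for powers of two. */
    assert(lmngr->list_alignment != 0 &&
           (lmngr->list_alignment & (lmngr->list_alignment - 1)) == 0);

    /* Over-allocate by the alignment so an aligned address always fits. */
    lmngr->alloc_base = malloc(lmngr->list_size * lmngr->list_block_size +
                               lmngr->list_alignment);
    if (NULL == lmngr->alloc_base) {
        return -1;
    }

    /* Note list_alignment here, not list_align. */
    lmngr->base_addr = (void *)ALIGN_UP((uintptr_t)lmngr->alloc_base,
                                        lmngr->list_alignment, uintptr_t);
    return 0;
}

/* Release the block through the original malloc pointer. */
void lmngr_free_fallback(struct lmngr_sketch *lmngr)
{
    free(lmngr->alloc_base);
    lmngr->alloc_base = NULL;
    lmngr->base_addr = NULL;
}
```

A caller would fill in list_size, list_block_size and list_alignment, call lmngr_alloc_fallback(), use base_addr, and finish with lmngr_free_fallback(). Keeping the unaligned pointer is a design choice of this sketch only; the quoted code overwrites base_addr with the aligned value.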
I could fix, but need help with...
2)  http://www.open-mpi.org/mtt/index.php?do_redir=2089  Somehow,
coll_ml is getting pulled into libmpi.so.  E.g., this doesn't look right:
  % nm ompi/.libs/libmpi.so | grep mca_coll_ml
  [13161] |   2556704|       172|FUNC |LOCL |0    |11     |mca_coll_ml_alloc_op_prog_single_frag_dag
  [13171] |   2555488|       344|FUNC |LOCL |0    |11     |mca_coll_ml_buffer_recycling
  [13173] |   2555392|        92|FUNC |LOCL |0    |11     |mca_coll_ml_err
  [23992] |         0|         0|FUNC |GLOB |0    |UNDEF  |mca_coll_ml_memsync_intra
The UNDEF is causing a problem, but I'm guessing all that mca_coll_ml_
stuff shouldn't be in there at all in the first place.  This is on one
Solaris system, while another doesn't see the problem and builds fine.