Subject: Re: [OMPI devel] r27078 and OMPI build
From: Shamis, Pavel (shamisp_at_[hidden])
Date: 2012-08-21 14:53:48

On 8/21/2012 9:31 AM, Ralph Castain wrote:
Looks to me like you just need to add a couple of includes and correct a typo - yes?
Right. This part is under control.

I hope r27100 resolves issue #1.

The library issue sounds like something isn't right in the - perhaps the syntax has a typo there as well?
I don't know. This is the part where I could use help. I took a quick
peek at some files. I can't see what the essential
difference is between, say, coll/ml/ and, say,
coll/sm/ (which behaves all right). Nor do I see why there
would be a difference in coll/ml between one system (happens to be
SPARC, though I don't know that's significant) and another.

I can't reproduce the problem on Mac and Linux systems.

On Aug 21, 2012, at 11:36 AM, Eugene Loh wrote:

r27078 (ML collective component) broke some Solaris OMPI builds.

1) In ompi/mca/coll/ml/coll_ml_lmngr.c
   200 if((errno = posix_memalign(&lmngr->base_addr,
   201 lmngr->list_alignment,
   202 lmngr->list_size * lmngr->list_block_size)) != 0) {
   203 ML_ERROR(("Failed to allocate memory: %s [%d]", errno,
   204 return OMPI_ERROR;
   205 }
   206 #else
   207 lmngr->base_addr =
   208 malloc(lmngr->list_size * lmngr->list_block_size +
   209 if(NULL == lmngr->base_addr) {
   210 ML_ERROR(("Failed to allocate memory: %s [%d]", errno,
   211 return OMPI_ERROR;
   212 }
   214 lmngr->base_addr =
   215 lmngr->list_align, uintptr_t);
   216 #endif
  The "#else" code path has multiple problems -- specifically at the
statement on lines 214-215:
  - OPAL_ALIGN needs to be defined (e.g., #include "opal/align.h")
  - uintptr_t needs to be defined (e.g., #include "opal_stdint.h")
  - list_align should be list_alignment
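
For reference, the general pattern the "#else" path is after (allocate with malloc, then round the address up to the requested alignment) can be sketched as below. This is only a sketch under assumptions, not the coll/ml code: align_up and alloc_aligned_fallback are hypothetical names, align_up merely stands in for the rounding that OPAL_ALIGN is used for, the plain <stdint.h> header is used in place of "opal_stdint.h", and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not the OPAL_ALIGN macro): round addr up to the
 * next multiple of alignment. Assumes alignment is a power of two. */
static uintptr_t align_up(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~((uintptr_t)alignment - 1);
}

/* Hypothetical fallback for systems without posix_memalign.
 * Returns an aligned pointer to at least size usable bytes, or NULL.
 * *raw_out receives the pointer that must later be passed to free(). */
static void *alloc_aligned_fallback(size_t size, size_t alignment,
                                    void **raw_out)
{
    /* Over-allocate so that an aligned address always fits in the block. */
    void *raw = malloc(size + alignment);
    if (NULL == raw) {
        *raw_out = NULL;
        return NULL;
    }
    *raw_out = raw;
    return (void *)align_up((uintptr_t)raw, alignment);
}
```

The sketch returns the raw pointer separately because the aligned address generally differs from what malloc returned, and only the latter may be handed to free().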

I could fix, but need help with...

2) Somehow,
coll_ml is getting pulled into E.g., this doesn't look right:

  % nm ompi/.libs/ | grep mca_coll_ml
  [13161] | 2556704| 172|FUNC |LOCL |0 |11
  [13171] | 2555488| 344|FUNC |LOCL |0 |11
  [13173] | 2555392| 92|FUNC |LOCL |0 |11 |mca_coll_ml_err
  [23992] | 0| 0|FUNC |GLOB |0 |UNDEF

The UNDEF is causing a problem, but I'm guessing all that mca_coll_ml_
stuff shouldn't be in there at all in the first place. This is on one
Solaris system, while another doesn't see the problem and builds fine.
