Open MPI logo

Open MPI Development Mailing List Archives

  |   Home   |   Support   |   FAQ   |   all Development mailing list

Subject: [OMPI devel] openshmem comments
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2013-03-05 10:18:58


Feedback from call last week:
-----------------------------

- make yoda work with all btls
- pragmas in MPI_Init/Finalize seem sketchy
- possible conflict between opal and oshmem ptmalloc?
- be careful of BTL move to OPAL

Jeff's questions:
-----------------

- All the CMake and .windows stuff can now disappear.

- Why 2 README-SHMEM files? Seems like they should be part of the main README file.

- contrib/tau/readme: should be named README, and readable/non-verbatim text should be wrapped to 76 cols

- has the TAU patch been submitted upstream?

- The distr/ directory looks like Mellanox-specific release stuff. Should that be in the community tree? And if so, why isn't it under contrib/dist/, e.g., contrib/dist/mellanox?

- Has the knem patch been submitted upstream?

- If you need to maintain the knem patch in this tree, it should be under the oshmem directory, not the top level

- I would strongly discourage using "f77" anywhere in openshmem -- there have not been Fortran 77 compilers **in over 25 years**. MPI has *never* been Fortran-77 compliant. Use "fortran", and your wrapper compiler should be "shmemfort", not "shmemf77" (which parallels the move to "mpifort").

- Be aware that Nathan's change to the MCA param system is coming very, very soon. You will likely need to update to match it.

- I see Mellanox/coverity-specific paths in trunk/Makefile.am. Don't do that. IF you want to add a "cov" target, then put a proper configure check looking for coverity in the path, and make the "cov" target do something sane when coverity is not found in your path.

- configure.ac: We already AC_CHECK_SIZEOF(long); don't put in a 2nd one.

- what is the additional memset() for in ompi_free_list.c? I see it also in a few places in the SM BTL, grdma mpool module, etc. Why is this necessary?

*** This is probably my biggest problem: I really, really dislike the insertion of open shmem flags and functionality in the Open MPI layer. There are very, Very, VERY few abstraction violations between OPAL, ORTE, and OMPI -- and we've worked very, very hard to keep it that way. You're adding a TON of abstraction violations (i.e., making the MPI layer aware of the openshmem layer above it).

- btl_openib.c: you have

#if OSHMEM_ENABLED
#else

Temporarily ignoring the fact that this is an abstraction violation, why not use #if !OSHMEM_ENABLED? Having an empty #if block seems just weird.

Later, don't use "#if OSHMEM_ENABLED == 0" -- use "#if !OSHMEM_ENABLED" (that's just an OMPI convention).

- This same block in btl_openib.c, though (and many others like it in the openib BTL) -- why are you removing that code? Is an openshmem-compiled code base not usable as an MPI code base?

- What is this random permutation of qp's that you put in connect_oob.c?

- Do not use // style comments in C code.
  --> Actually this might warrant a larger discussion. We officially force a C99-compliant compiler now, right? So do we want to start allowing //-style comments?

- I see tabs used in btl_sm.c. Spaces. Not tabs. 4 space indents. Always.

- btl_sm.c: when testing with ==, please put the constants on the left (this is defensive programming):

    if (BTL_SM_HDR_TYPE_GET_AS_SEND == frag->hdr->tag)

- btl_sm_component.c: what is the MCA param component_use_knem_value for? The help message string is NULL -- please put in a real message.

- I can't tell what is an improvement for the KNEM support in the SM BTL vs. what is oshem specific. If some stuff is just improvements to KNEM SM BTL support, that should come in separately and not be in #if blocks.

- Warning/Info messages should be output with opal_show_help(), not opal_output().

- The only changes to trunk/ompi/mca/sbgp/ibnet/Makefile.am are commented out things; they should not be added.

- Please explain the pragmas in the MPI API functions (I seem to recall you explained these last week, but I don't remember why any more).

- I don't understand the changes made to ompi_info. Why wouldn't you create oshmem-info? (like orte-info) Or perhaps this is a good opprotunity to re-factor the ompi_info code so that its core can be down in opal, and top-level tools like opal-info, orte-info, ompi_info, and oshmem-info can just call some common subroutines.

--> Nathan may have done this in his MCA param base revamp, BTW... Please check with him.

- Why was trunk/opal/etc/openmpi-mca-params.conf modified? It should not include any parameters -- users can put whatever they want in there.

- orte/etc/Makefile.am -- don't create a file from Makefile.am. Have the file already created, and Makefile just installs/uninstalls it. If you need to substitute paths in there, then use the same .in template system that the rest of the tree uses.

- orte/etc/Makefile.am: is this the Open MPI implementation of Open Shmem, or Mellanox ScaleableSHMEM?

- Is there an oshmemrun, similar to orterun/mpirun?

- How will this code be distributed? Will openmpi tarballs include the openshmem code? Or will there be separate openshmem tarballs that include all of Open MPI + the openshmem project?

- What happens when you autogen/configure/make now; does that include openshmem?

- Related by slightly different: How does one configure openshem vs. openmpi? It seems like they are two different things, now (since you change a bunch of BTL functionality based on whether openshmem is enabled or not).

- are there any openshmem tests that should be added to MTT?

SUMMARY

- This is much more invasive to the OMPI layer than expected:
  1. Some of the changes seem to be improvements to existing code in OPAL and OMPI. Please just bring these in separately, outside of OSHMEM. Such improvements benefit everyone.
  2. Other changes seem to augmentations for OSHMEM support.

- In general, it seems like Open Shmem was grafted *in to* the MPI layer, rather than *on top of* the MPI layer. I have a large problem with this. I'd rather see a *much* cleaner separation between the two. I'm sure that some level of integration is inevitable -- but the sheer number of #if's all over the place is quite worrysome. Can you separate these better?

- If it would help, perhaps the BTL's functionality should be extended to support what openshmem needs, rather than putting in a bunch of #if's in the BTLs that openshmem supports. This might be significantly cleaner, and also provide better long-term support for openshmem.

- I don't understand the branding/configury between openshmem and OMPI. It seems like if you're using openshmem, you should be using openshmem-branded tools (e.g., openshmemrun). Just like we do with ORTE -- there are many ORTE-only applications out there that use 100% ORTE-branded tools, and never do anything with MPI. The same should be true with openshmem. Otherwise, it's very confusing for end users.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/