Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
From: Barrett, Brian W (bwbarre_at_[hidden])
Date: 2013-08-14 17:03:35


Josh -

In general, I don't have a strong opinion on whether OpenSHMEM is on by
default or not. It might cause unexpected behavior for some users (like
on Crays, where one should really use Cray's SHMEM), but maybe it's better
on other platforms.

I also would have no objection to the RFC, provided the segfaults I found
get resolved.

Brian

On 8/14/13 2:08 PM, "Joshua Ladd" <joshual_at_[hidden]> wrote:

>Ralph and Brian,
>
>Thanks a bunch for taking the time to review this. It is extremely
>helpful. Let me comment on the building of OSHMEM and solicit some
>feedback from you guys (along with the rest of the community.)
>Originally we had planned to enable OSHMEM to build only if
>'--with-oshmem' flag was passed at configure time. However, (unbeknownst
>to me) this behavior was changed and now OSHMEM is built by default, i.e.
>yes, Ralph this is the intended behavior now. I am wondering if this is
>such a good idea. Do folks have a strong opinion on this one way or the
>other? From my perspective I can see arguments for both sides of the
>coin.
>
>Other than cleaning up warnings and resolving the segfault that Brian
>observed, are we on a good course to getting this upstream? Is it
>reasonable to file an RFC for three weeks out?
>
>Josh
>
>-----Original Message-----
>From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Barrett,
>Brian W
>Sent: Sunday, August 11, 2013 1:42 PM
>To: Open MPI Developers
>Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>
>Ralph -
>
>I think those warnings are just because of when they last synced with the
>trunk; it looks like they haven't updated in the last week, when those
>(and some usnic fixes) went in.
>
>More concerning is the --enable-picky stuff and the disabling of SHMEM in
>the right places.
>
>Brian
>
>On 8/11/13 11:24 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>
>>Turning off the enable_picky, I get it to compile with the following
>>warnings:
>>
>>pget_elements_x_f.c:70: warning: no previous prototype for
>>'ompi_get_elements_x_f'
>>pstatus_set_elements_x_f.c:70: warning: no previous prototype for
>>'ompi_status_set_elements_x_f'
>>ptype_get_extent_x_f.c:69: warning: no previous prototype for
>>'ompi_type_get_extent_x_f'
>>ptype_get_true_extent_x_f.c:69: warning: no previous prototype for
>>'ompi_type_get_true_extent_x_f'
>>ptype_size_x_f.c:69: warning: no previous prototype for
>>'ompi_type_size_x_f'
>>
>>I also found that OpenShmem is still building by default. Is that
>>intended? I thought you were only going to build if --with-shmem (or
>>whatever option) was given.
>>
>>Looks like some cleanup is required
>>
>>On Aug 10, 2013, at 8:54 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> FWIW, I couldn't get it to build - this is on a simple Xeon-based
>>>system under CentOS 6.2:
>>>
>>> cc1: warnings being treated as errors
>>> spml_yoda_getreq.c: In function 'mca_spml_yoda_get_completion':
>>> spml_yoda_getreq.c:98: error: pointer targets in passing argument 1
>>>of 'opal_atomic_add_32' differ in signedness
>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected
>>>'volatile int32_t *' but argument is of type 'uint32_t *'
>>> spml_yoda_getreq.c:98: error: signed and unsigned type in conditional
>>>expression
>>> cc1: warnings being treated as errors
>>> spml_yoda_putreq.c: In function 'mca_spml_yoda_put_completion':
>>> spml_yoda_putreq.c:81: error: pointer targets in passing argument 1
>>>of 'opal_atomic_add_32' differ in signedness
>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected
>>>'volatile int32_t *' but argument is of type 'uint32_t *'
>>> spml_yoda_putreq.c:81: error: signed and unsigned type in conditional
>>>expression
>>> make[2]: *** [spml_yoda_getreq.lo] Error 1
>>> make[2]: *** Waiting for unfinished jobs....
>>> make[2]: *** [spml_yoda_putreq.lo] Error 1
>>> cc1: warnings being treated as errors
>>> spml_yoda.c: In function 'mca_spml_yoda_put_internal':
>>> spml_yoda.c:725: error: pointer targets in passing argument 1 of
>>>'opal_atomic_add_32' differ in signedness
>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected
>>>'volatile int32_t *' but argument is of type 'uint32_t *'
>>> spml_yoda.c:725: error: signed and unsigned type in conditional
>>>expression
>>> spml_yoda.c: In function 'mca_spml_yoda_get':
>>> spml_yoda.c:1107: error: pointer targets in passing argument 1 of
>>>'opal_atomic_add_32' differ in signedness
>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected
>>>'volatile int32_t *' but argument is of type 'uint32_t *'
>>> spml_yoda.c:1107: error: signed and unsigned type in conditional
>>>expression
>>> make[2]: *** [spml_yoda.lo] Error 1
>>> make[1]: *** [all-recursive] Error 1
>>>
>>> Only configure arguments:
>>>
>>> enable_picky=yes
>>> enable_debug=yes
>>>
>>>
>>> gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
>>>
>>>
>>>
>>> On Aug 10, 2013, at 7:21 PM, "Barrett, Brian W" <bwbarre_at_[hidden]>
>>>wrote:
>>>
>>>> On 8/6/13 10:30 AM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>>>>
>>>>> Dear OMPI Community,
>>>>>
>>>>> Please find on Bitbucket the latest round of OSHMEM changes based
>>>>> on community feedback. Please git and test at your leisure.
>>>>>
>>>>> https://bitbucket.org/jladd_math/mlnx-oshmem.git
>>>>
>>>> Josh -
>>>>
>>>> In general, I think everything looks ok. However, the "right" thing
>>>> doesn't happen if the CM PML is used (at least, when using the
>>>> Portals 4 MTL). When configured with:
>>>>
>>>> ./configure --enable-mca-no-build=pml-ob1,pml-bfo,pml-v,btl,bml,mpool
>>>>
>>>> The build segfaults trying to run a SHMEM program:
>>>>
>>>> mpirun -np 2 ./bcast
>>>> [shannon:90397] *** Process received signal ***
>>>> [shannon:90397] Signal: Segmentation fault (11)
>>>> [shannon:90397] Signal code: Address not mapped (1)
>>>> [shannon:90397] Failing at address: (nil)
>>>> [shannon:90398] *** Process received signal ***
>>>> [shannon:90398] Signal: Segmentation fault (11)
>>>> [shannon:90398] Signal code: Address not mapped (1)
>>>> [shannon:90398] Failing at address: (nil)
>>>> [shannon:90397] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>> [shannon:90397] *** End of error message ***
>>>> [shannon:90398] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>> [shannon:90398] *** End of error message ***
>>>>
>>>> --------------------------------------------------------------------------
>>>> mpirun noticed that process rank 1 with PID 90398 on node shannon
>>>> exited on signal 11 (Segmentation fault).
>>>> --------------------------------------------------------------------------
>>>>
>>>>
>>>>
>>>> Brian
>>>>
>>>> --
>>>> Brian W. Barrett
>>>> Scalable System Software Group
>>>> Sandia National Laboratories
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>
>
>
>--
> Brian W. Barrett
> Scalable System Software Group
> Sandia National Laboratories
>
>
>
>
>

--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories
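
For readers following the build errors quoted above: the compiler note
says opal_atomic_add_32 expects a 'volatile int32_t *' while the yoda
code passes a 'uint32_t *', and the log shows warnings being treated as
errors. The sketch below illustrates one possible way to resolve that
kind of mismatch, namely declaring the counter with the signed type the
callee expects. It is not the fix that was actually applied.
demo_atomic_add_32, struct demo_request and demo_complete_one are
hypothetical stand-ins; only the parameter type comes from the log, and
the GCC/Clang __sync_add_and_fetch builtin is an assumed substitute for
the real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for opal_atomic_add_32. Only the pointer type
 * (volatile int32_t *) is taken from the compiler note in the log. */
static int32_t demo_atomic_add_32(volatile int32_t *addr, int32_t delta)
{
    /* GCC/Clang builtin: atomically add delta and return the new value. */
    return __sync_add_and_fetch(addr, delta);
}

/* Hypothetical request structure; the real spml_yoda fields are not
 * shown in the thread. */
struct demo_request {
    int32_t pending; /* signed, so &pending matches the callee's type */
};

/* If 'pending' were uint32_t, this call would draw the
 * "pointer targets ... differ in signedness" diagnostic, which fails
 * the build when warnings are treated as errors. */
static int32_t demo_complete_one(struct demo_request *req)
{
    assert(req != NULL);
    return demo_atomic_add_32(&req->pending, -1);
}
```

Casting the pointer at each call site would also silence the
diagnostic, but changing the declared type keeps the signedness
consistent in one place.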

