
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
From: Joshua Ladd (joshual_at_[hidden])
Date: 2013-08-14 16:08:17


Ralph and Brian,

Thanks a bunch for taking the time to review this; it is extremely helpful. Let me comment on the building of OSHMEM and solicit some feedback from you guys (along with the rest of the community). Originally we had planned to build OSHMEM only if the '--with-oshmem' flag was passed at configure time. However, (unbeknownst to me) this behavior was changed and OSHMEM is now built by default; so yes, Ralph, this is now the intended behavior. I am wondering whether this is such a good idea. Do folks have a strong opinion one way or the other? From my perspective, I can see arguments on both sides.

Other than cleaning up the warnings and resolving the segfault that Brian observed, are we on a good course to getting this upstream? Is it reasonable to file an RFC with a deadline three weeks out?

Josh
 
-----Original Message-----
From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Barrett, Brian W
Sent: Sunday, August 11, 2013 1:42 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2

Ralph -

I think those warnings are just an artifact of when they last synced with the trunk; it looks like they haven't updated in the last week, which is when the fixes for those warnings (and some usnic fixes) went in.

More concerning are the --enable-picky failures and getting SHMEM disabled in the right places.

Brian

On 8/11/13 11:24 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:

>With enable_picky turned off, it compiles with the following warnings:
>
>pget_elements_x_f.c:70: warning: no previous prototype for 'ompi_get_elements_x_f'
>pstatus_set_elements_x_f.c:70: warning: no previous prototype for 'ompi_status_set_elements_x_f'
>ptype_get_extent_x_f.c:69: warning: no previous prototype for 'ompi_type_get_extent_x_f'
>ptype_get_true_extent_x_f.c:69: warning: no previous prototype for 'ompi_type_get_true_extent_x_f'
>ptype_size_x_f.c:69: warning: no previous prototype for 'ompi_type_size_x_f'
>
>I also found that OpenShmem is still building by default. Is that
>intended? I thought you were only going to build if --with-shmem (or
>whatever option) was given.
>
>Looks like some cleanup is required.
>
>On Aug 10, 2013, at 8:54 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> FWIW, I couldn't get it to build. This is on a simple Xeon-based system under CentOS 6.2:
>>
>> cc1: warnings being treated as errors
>> spml_yoda_getreq.c: In function 'mca_spml_yoda_get_completion':
>> spml_yoda_getreq.c:98: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>> spml_yoda_getreq.c:98: error: signed and unsigned type in conditional expression
>> cc1: warnings being treated as errors
>> spml_yoda_putreq.c: In function 'mca_spml_yoda_put_completion':
>> spml_yoda_putreq.c:81: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>> spml_yoda_putreq.c:81: error: signed and unsigned type in conditional expression
>> make[2]: *** [spml_yoda_getreq.lo] Error 1
>> make[2]: *** Waiting for unfinished jobs....
>> make[2]: *** [spml_yoda_putreq.lo] Error 1
>> cc1: warnings being treated as errors
>> spml_yoda.c: In function 'mca_spml_yoda_put_internal':
>> spml_yoda.c:725: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>> spml_yoda.c:725: error: signed and unsigned type in conditional expression
>> spml_yoda.c: In function 'mca_spml_yoda_get':
>> spml_yoda.c:1107: error: pointer targets in passing argument 1 of 'opal_atomic_add_32' differ in signedness
>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>> spml_yoda.c:1107: error: signed and unsigned type in conditional expression
>> make[2]: *** [spml_yoda.lo] Error 1
>> make[1]: *** [all-recursive] Error 1
>>
>> The only configure arguments:
>>
>> enable_picky=yes
>> enable_debug=yes
>>
>>
>> gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
>>
>>
>>
>> On Aug 10, 2013, at 7:21 PM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:
>>
>>> On 8/6/13 10:30 AM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>>>
>>>> Dear OMPI Community,
>>>>
>>>> Please find on Bitbucket the latest round of OSHMEM changes based on community feedback. Please clone and test at your leisure.
>>>>
>>>> https://bitbucket.org/jladd_math/mlnx-oshmem.git
>>>
>>> Josh -
>>>
>>> In general, I think everything looks ok. However, the "right" thing doesn't happen if the CM PML is used (at least when using the Portals 4 MTL). When configured with:
>>>
>>> ./configure --enable-mca-no-build=pml-ob1,pml-bfo,pml-v,btl,bml,mpool
>>>
>>> the resulting build segfaults when running a SHMEM program:
>>>
>>> mpirun -np 2 ./bcast
>>> [shannon:90397] *** Process received signal ***
>>> [shannon:90397] Signal: Segmentation fault (11)
>>> [shannon:90397] Signal code: Address not mapped (1)
>>> [shannon:90397] Failing at address: (nil)
>>> [shannon:90398] *** Process received signal ***
>>> [shannon:90398] Signal: Segmentation fault (11)
>>> [shannon:90398] Signal code: Address not mapped (1)
>>> [shannon:90398] Failing at address: (nil)
>>> [shannon:90397] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>> [shannon:90397] *** End of error message ***
>>> [shannon:90398] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>> [shannon:90398] *** End of error message ***
>>>
>>> --------------------------------------------------------------------------
>>> mpirun noticed that process rank 1 with PID 90398 on node shannon exited on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>>
>>> Brian
>>>
>>> --
>>> Brian W. Barrett
>>> Scalable System Software Group
>>> Sandia National Laboratories
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
