
Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-08-14 17:31:44


Can you point me to a test program that would exercise it? I'd like to give it a try first.

I'm okay with it being on by default, since it builds its own separate library, and I'm okay with the RFC.
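
If there isn't a standard one handy, I'll try something minimal along these
lines - just a sketch against the plain OpenSHMEM 1.0 calls (start_pes,
shmem_int_put, shmem_barrier_all), so treat the file name as hypothetical:

  /* put.c: PE 0 writes one int into PE 1's symmetric memory */
  #include <stdio.h>
  #include <shmem.h>

  int main(void)
  {
      static int src = 42;   /* static => symmetric, remotely accessible */
      static int dst = -1;
      int me, npes;

      start_pes(0);          /* OpenSHMEM 1.0-era initialization */
      me   = _my_pe();
      npes = _num_pes();

      if (me == 0 && npes > 1)
          shmem_int_put(&dst, &src, 1, 1);   /* one int to PE 1 */

      shmem_barrier_all();   /* completes the put before we read dst */
      printf("PE %d of %d: dst = %d\n", me, npes, dst);
      return 0;
  }

Launched the same way as Brian's bcast example, e.g. "mpirun -np 2 ./put".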

On Aug 14, 2013, at 2:03 PM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:

> Josh -
>
> In general, I don't have a strong opinion of whether OpenSHMEM is on by
> default or not. It might cause unexpected behavior for some users (like
> on Crays, where one should really use Cray's SHMEM), but maybe it's better
> on other platforms.
>
> I also would have no objection to the RFC, provided the segfaults I found
> get resolved.
>
> Brian
>
> On 8/14/13 2:08 PM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>
>> Ralph, and Brian
>>
>> Thanks a bunch for taking the time to review this. It is extremely
>> helpful. Let me comment on the building of OSHMEM and solicit some
>> feedback from you guys (along with the rest of the community).
>> Originally we had planned to enable OSHMEM to build only if the
>> '--with-oshmem' flag was passed at configure time. However, (unbeknownst
>> to me) this behavior was changed and now OSHMEM is built by default, i.e.,
>> yes, Ralph, this is the intended behavior now. I am wondering whether this
>> is such a good idea. Do folks have a strong opinion on this one way or the
>> other? From my perspective, I can see arguments for both sides of the
>> coin.
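>>
>> To make the two options concrete (illustrative command lines only - the
>> exact flag spelling, per Ralph's note below, is still up in the air):
>>
>>   # original plan: opt-in at configure time
>>   ./configure --with-oshmem ...
>>
>>   # current behavior: on by default; opting out would presumably be
>>   # the usual autoconf negation
>>   ./configure --without-oshmem ...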
>>
>> Other than cleaning up the warnings and resolving the segfault that Brian
>> observed, are we on a good course to getting this upstream? Is it
>> reasonable to file an RFC for three weeks out?
>>
>> Josh
>>
>> -----Original Message-----
>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Barrett,
>> Brian W
>> Sent: Sunday, August 11, 2013 1:42 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>>
>> Ralph -
>>
>> I think those warnings are just because of when they last synced with the
>> trunk; it looks like they haven't updated in the last week, when those
>> (and some usnic fixes) went in.
>>
>> More concerning are the --enable-picky build failures and the disabling
>> of SHMEM in the right places.
>>
>> Brian
>>
>> On 8/11/13 11:24 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>
>>> Turning off the enable_picky, I get it to compile with the following
>>> warnings:
>>>
>>> pget_elements_x_f.c:70: warning: no previous prototype for
>>> 'ompi_get_elements_x_f'
>>> pstatus_set_elements_x_f.c:70: warning: no previous prototype for
>>> 'ompi_status_set_elements_x_f'
>>> ptype_get_extent_x_f.c:69: warning: no previous prototype for
>>> 'ompi_type_get_extent_x_f'
>>> ptype_get_true_extent_x_f.c:69: warning: no previous prototype for
>>> 'ompi_type_get_true_extent_x_f'
>>> ptype_size_x_f.c:69: warning: no previous prototype for
>>> 'ompi_type_size_x_f'
>>>
>>> I also found that OpenSHMEM is still building by default. Is that
>>> intended? I thought you were only going to build it if --with-shmem (or
>>> whatever option) was given.
>>>
>>> Looks like some cleanup is required.
>>>
>>> On Aug 10, 2013, at 8:54 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>
>>>> FWIW, I couldn't get it to build - this is on a simple Xeon-based
>>>> system under CentOS 6.2:
>>>>
>>>> cc1: warnings being treated as errors
>>>> spml_yoda_getreq.c: In function 'mca_spml_yoda_get_completion':
>>>> spml_yoda_getreq.c:98: error: pointer targets in passing argument 1
>>>> of 'opal_atomic_add_32' differ in signedness
>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected
>>>> 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>> spml_yoda_getreq.c:98: error: signed and unsigned type in conditional
>>>> expression
>>>> cc1: warnings being treated as errors
>>>> spml_yoda_putreq.c: In function 'mca_spml_yoda_put_completion':
>>>> spml_yoda_putreq.c:81: error: pointer targets in passing argument 1
>>>> of 'opal_atomic_add_32' differ in signedness
>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected
>>>> 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>> spml_yoda_putreq.c:81: error: signed and unsigned type in conditional
>>>> expression
>>>> make[2]: *** [spml_yoda_getreq.lo] Error 1
>>>> make[2]: *** Waiting for unfinished jobs....
>>>> make[2]: *** [spml_yoda_putreq.lo] Error 1
>>>> cc1: warnings being treated as errors
>>>> spml_yoda.c: In function 'mca_spml_yoda_put_internal':
>>>> spml_yoda.c:725: error: pointer targets in passing argument 1 of
>>>> 'opal_atomic_add_32' differ in signedness
>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected
>>>> 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>> spml_yoda.c:725: error: signed and unsigned type in conditional
>>>> expression
>>>> spml_yoda.c: In function 'mca_spml_yoda_get':
>>>> spml_yoda.c:1107: error: pointer targets in passing argument 1 of
>>>> 'opal_atomic_add_32' differ in signedness
>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note: expected
>>>> 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>> spml_yoda.c:1107: error: signed and unsigned type in conditional
>>>> expression
>>>> make[2]: *** [spml_yoda.lo] Error 1
>>>> make[1]: *** [all-recursive] Error 1
>>>>
>>>> Only configure arguments:
>>>>
>>>> enable_picky=yes
>>>> enable_debug=yes
>>>>
>>>>
>>>> gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
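>>>>
>>>> The fix looks mechanical: those completion counters need to be signed
>>>> so they match the atomic API. A sketch of what I mean - the struct and
>>>> field names here are illustrative, not the actual spml_yoda ones:
>>>>
>>>>   #include <stdint.h>
>>>>   #include "opal/sys/atomic.h"
>>>>
>>>>   /* stand-in for the yoda request bookkeeping */
>>>>   struct yoda_req {
>>>>       volatile int32_t active_ops;   /* was uint32_t; the atomic API
>>>>                                         takes volatile int32_t * */
>>>>   };
>>>>
>>>>   static void completion(struct yoda_req *req)
>>>>   {
>>>>       /* no signedness mismatch in argument 1 now */
>>>>       opal_atomic_add_32(&req->active_ops, -1);
>>>>   }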
>>>>
>>>>
>>>>
>>>> On Aug 10, 2013, at 7:21 PM, "Barrett, Brian W" <bwbarre_at_[hidden]>
>>>> wrote:
>>>>
>>>>> On 8/6/13 10:30 AM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>>>>>
>>>>>> Dear OMPI Community,
>>>>>>
>>>>>> Please find on Bitbucket the latest round of OSHMEM changes based
>>>>>> on community feedback. Please git and test at your leisure.
>>>>>>
>>>>>> https://bitbucket.org/jladd_math/mlnx-oshmem.git
>>>>>
>>>>> Josh -
>>>>>
>>>>> In general, I think everything looks ok. However, the "right" thing
>>>>> doesn't happen if the CM PML is used (at least, when using the
>>>>> Portals 4 MTL). When configured with:
>>>>>
>>>>> ./configure
>>>>> --enable-mca-no-build=pml-ob1,pml-bfo,pml-v,btl,bml,mpool
>>>>>
>>>>> The build segfaults trying to run a SHMEM program:
>>>>>
>>>>> mpirun -np 2 ./bcast
>>>>> [shannon:90397] *** Process received signal ***
>>>>> [shannon:90397] Signal: Segmentation fault (11)
>>>>> [shannon:90397] Signal code: Address not mapped (1)
>>>>> [shannon:90397] Failing at address: (nil)
>>>>> [shannon:90398] *** Process received signal ***
>>>>> [shannon:90398] Signal: Segmentation fault (11)
>>>>> [shannon:90398] Signal code: Address not mapped (1)
>>>>> [shannon:90398] Failing at address: (nil)
>>>>> [shannon:90397] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>>> [shannon:90397] *** End of error message ***
>>>>> [shannon:90398] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>>> [shannon:90398] *** End of error message ***
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> mpirun noticed that process rank 1 with PID 90398 on node shannon
>>>>> exited on signal 11 (Segmentation fault).
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>>
>>>>>
>>>>> Brian
>>>>>
>>>>> --
>>>>> Brian W. Barrett
>>>>> Scalable System Software Group
>>>>> Sandia National Laboratories
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>
>>
>> --
>> Brian W. Barrett
>> Scalable System Software Group
>> Sandia National Laboratories
>>
>>
>>
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
> --
> Brian W. Barrett
> Scalable System Software Group
> Sandia National Laboratories
>
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel