
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
From: George Bosilca (bosilca_at_[hidden])
Date: 2013-08-15 12:30:33


On Aug 15, 2013, at 18:06 , Joshua Ladd <joshual_at_[hidden]> wrote:

> Maybe this is a stupid question, but in this case (I believe this goes all the way back to our initial discussion on OSHMEM), how does one fall back onto send/recv semantics when the call is made at the SHMEM level to do a put?

The same way our current OSC (one-sided component) falls back on the pt2pt component when no underlying BTL supports RDMA operations.

> If a BTL doesn't support RDMA, then it doesn't seem reasonable to expect OSHMEM to support it through YODA. It seems more reasonable to check whether or not the bml_get is NULL and if this is the case, then one must disqualify YODA and hence SHMEM. How can you support put /get SHMEM semantics without an RDMA equipped BTL? Does it even make sense to try to emulate that behavior? I know the SHMEM developers have been going round in circles on this, so any insight you could provide would be greatly appreciated.

If you want to provide SHMEM for single-machine runs (for development purposes, for example), you will have to provide SHMEM on top of BTLs without RMA support. Our current SM BTL doesn't support RMA operations if KNEM or CMA are not available. Otherwise you will disqualify all machines without CMA/KNEM support as development machines for SHMEM-based applications (including all Mac OS X laptops).

  George.
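
The fallback discussed in this thread (use RDMA when the transport offers it, otherwise emulate the get with messages instead of calling through a NULL function pointer) could be sketched in C roughly as follows. This is only an illustration under stated assumptions: `transport_t`, `checked_get`, `msg_get`, `fake_rdma_get` and the two `make_*` helpers are hypothetical names, not the actual Open MPI BML/BTL or YODA interfaces, and the message-based path is simulated with a local `memcpy` where a real implementation would exchange a request and a reply with the target PE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a transport endpoint.
 * NOT the real Open MPI BML/BTL data structures. */
typedef struct transport transport_t;
typedef int (*rdma_get_fn)(transport_t *t, void *dst, const void *src, size_t len);

struct transport {
    rdma_get_fn rdma_get; /* NULL when the transport has no RDMA get */
    int rdma_calls;       /* bookkeeping for the illustration only */
    int msg_calls;
};

/* Simulated RDMA get: a real transport would read remote memory directly. */
static int fake_rdma_get(transport_t *t, void *dst, const void *src, size_t len)
{
    t->rdma_calls++;
    memcpy(dst, src, len);
    return 0;
}

/* Simulated message-based get: a real implementation would send a request
 * to the target and copy the payload of the reply into dst. */
static int msg_get(transport_t *t, void *dst, const void *src, size_t len)
{
    t->msg_calls++;
    memcpy(dst, src, len);
    return 0;
}

/* Check for RDMA support before using it; never call a NULL pointer. */
int checked_get(transport_t *t, void *dst, const void *src, size_t len)
{
    assert(t != NULL);
    if (t->rdma_get != NULL) {
        return t->rdma_get(t, dst, src, len);
    }
    return msg_get(t, dst, src, len);
}

/* Hypothetical constructors: one transport with RDMA, one without. */
transport_t make_rdma_transport(void)
{
    transport_t t = { fake_rdma_get, 0, 0 };
    return t;
}

transport_t make_msg_only_transport(void)
{
    transport_t t = { NULL, 0, 0 };
    return t;
}
```

With a transport built by `make_msg_only_transport()`, `checked_get` takes the message path rather than dereferencing a NULL `rdma_get`, which is the kind of crash shown in the backtrace quoted below.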

>
> Josh
>
>
> -----Original Message-----
> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Ralph Castain
> Sent: Thursday, August 15, 2013 11:55 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>
> I see the problem. Yoda is directly calling bml_get without first checking to see if the bml_btl supports rdma operations. If you only have the tcp btl, then rdma isn't supported, the bml_get function is NULL, and you segfault.
>
> What you need to do is check for rdma, and then fall back to message-based transfers if rdma isn't available. I believe that's what our current PMLs do - you can't just assume rdma (or any other support) is present.
>
>
> On Aug 14, 2013, at 4:02 PM, Joshua Ladd <joshual_at_[hidden]> wrote:
>
>> Thanks, Ralph. We'll have a look. Admittedly, we've done little testing with the tcp BTL - I was under the impression that the yoda interface was capable of working with all BTLs; it seems we need more testing. For sure it works with SM and OpenIB BTLs.
>>
>> Josh
>>
>> -----Original Message-----
>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Ralph
>> Castain
>> Sent: Wednesday, August 14, 2013 6:13 PM
>> To: Open MPI Developers
>> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>>
>> Here's the backtrace:
>>
>> (gdb) where
>> #0 0x0000000000000000 in ?? ()
>> #1 0x00007fac6b8d8921 in mca_bml_base_get (bml_btl=0x239a130,
>> des=0x220e880) at ../../../../ompi/mca/bml/bml.h:326
>> #2 0x00007fac6b8db767 in mca_spml_yoda_get (src_addr=0x601500,
>> size=4, dst_addr=0x7fff3b00b370, src=1) at spml_yoda.c:1091
>> #3 0x00007fac6f1ea56d in shmem_int_g (addr=0x601500, pe=1) at
>> shmem_g.c:47
>> #4 0x0000000000400bc7 in main ()
>>
>> On Aug 14, 2013, at 3:12 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>>> Hmmm...well, it works fine as long as the procs are on the same node. However, if they are on different nodes, it segfaults:
>>>
>>> [rhc_at_bend002 shmem]$ shmemrun -npernode 1 ./test_shmem
>>> running on bend001
>>> running on bend002
>>> [bend001:06590] *** Process received signal ***
>>> [bend001:06590] Signal: Segmentation fault (11)
>>> [bend001:06590] Signal code: Address not mapped (1)
>>> [bend001:06590] Failing at address: (nil)
>>> [bend001:06590] [ 0] /lib64/libpthread.so.0() [0x307d40f500]
>>> [bend001:06590] *** End of error message ***
>>> [bend002][[62090,1],1][btl_tcp_frag.c:219:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
>>> --------------------------------------------------------------------------
>>> shmemrun noticed that process rank 0 with PID 6590 on node bend001 exited on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>>
>>> I would have thought it should work in that situation - yes?
>>>
>>>
>>> On Aug 14, 2013, at 2:52 PM, Joshua Ladd <joshual_at_[hidden]> wrote:
>>>
>>>> The following simple test code will exercise the following:
>>>>
>>>> start_pes()
>>>>
>>>> shmalloc()
>>>>
>>>> shmem_int_get()
>>>>
>>>> shmem_int_put()
>>>>
>>>> shmem_barrier_all()
>>>>
>>>> To compile:
>>>>
>>>> shmemcc test_shmem.c -o test_shmem
>>>>
>>>> To launch:
>>>>
>>>> shmemrun -np 2 test_shmem
>>>>
>>>> or for those who prefer to launch with SLURM
>>>>
>>>> srun -n 2 test_shmem
>>>>
>>>> Josh
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Ralph
>>>> Castain
>>>> Sent: Wednesday, August 14, 2013 5:32 PM
>>>> To: Open MPI Developers
>>>> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>>>>
>>>> Can you point me to a test program that would exercise it? I'd like to give it a try first.
>>>>
>>>> I'm okay with on by default as it builds its own separate library,
>>>> and with the RFC
>>>>
>>>> On Aug 14, 2013, at 2:03 PM, "Barrett, Brian W" <bwbarre_at_[hidden]> wrote:
>>>>
>>>>> Josh -
>>>>>
>>>>> In general, I don't have a strong opinion of whether OpenSHMEM is
>>>>> on by default or not. It might cause unexpected behavior for some
>>>>> users (like on Crays, where one should really use Cray's SHMEM),
>>>>> but maybe it's better on other platforms.
>>>>>
>>>>> I also would have no objection to the RFC, provided the segfaults I
>>>>> found get resolved.
>>>>>
>>>>> Brian
>>>>>
>>>>> On 8/14/13 2:08 PM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>>>>>
>>>>>> Ralph, and Brian
>>>>>>
>>>>>> Thanks a bunch for taking the time to review this. It is extremely
>>>>>> helpful. Let me comment on the building of OSHMEM and solicit some
>>>>>> feedback from you guys (along with the rest of the community.)
>>>>>> Originally we had planned to enable OSHMEM to build only if
>>>>>> '--with-oshmem' flag was passed at configure time. However,
>>>>>> (unbeknownst to me) this behavior was changed and now OSHMEM is built by default, i.e.
>>>>>> yes, Ralph this is the intended behavior now. I am wondering if
>>>>>> this is such a good idea. Do folks have a strong opinion on this
>>>>>> one way or the other? From my perspective I can see arguments for
>>>>>> both sides of the coin.
>>>>>>
>>>>>> Other than cleaning up warnings and resolving the segfault that
>>>>>> Brian observed, are we on a good course to getting this upstream?
>>>>>> Is it reasonable to file an RFC for three weeks out?
>>>>>>
>>>>>> Josh
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of
>>>>>> Barrett, Brian W
>>>>>> Sent: Sunday, August 11, 2013 1:42 PM
>>>>>> To: Open MPI Developers
>>>>>> Subject: Re: [OMPI devel] [EXTERNAL] OpenSHMEM round 2
>>>>>>
>>>>>> Ralph -
>>>>>>
>>>>>> I think those warnings are just because of when they last synced
>>>>>> with the trunk; it looks like they haven't updated in the last
>>>>>> week, when those (and some usnic fixes) went in.
>>>>>>
>>>>>> More concerning is the --enable-picky stuff and the disabling of
>>>>>> SHMEM in the right places.
>>>>>>
>>>>>> Brian
>>>>>>
>>>>>> On 8/11/13 11:24 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>>>>>
>>>>>>> Turning off the enable_picky, I get it to compile with the
>>>>>>> following
>>>>>>> warnings:
>>>>>>>
>>>>>>> pget_elements_x_f.c:70: warning: no previous prototype for
>>>>>>> 'ompi_get_elements_x_f'
>>>>>>> pstatus_set_elements_x_f.c:70: warning: no previous prototype for
>>>>>>> 'ompi_status_set_elements_x_f'
>>>>>>> ptype_get_extent_x_f.c:69: warning: no previous prototype for
>>>>>>> 'ompi_type_get_extent_x_f'
>>>>>>> ptype_get_true_extent_x_f.c:69: warning: no previous prototype
>>>>>>> for 'ompi_type_get_true_extent_x_f'
>>>>>>> ptype_size_x_f.c:69: warning: no previous prototype for
>>>>>>> 'ompi_type_size_x_f'
>>>>>>>
>>>>>>> I also found that OpenShmem is still building by default. Is that
>>>>>>> intended? I thought you were only going to build if --with-shmem
>>>>>>> (or whatever option) was given.
>>>>>>>
>>>>>>> Looks like some cleanup is required
>>>>>>>
>>>>>>> On Aug 10, 2013, at 8:54 PM, Ralph Castain <rhc_at_[hidden]> wrote:
>>>>>>>
>>>>>>>> FWIW, I couldn't get it to build - this is on a simple
>>>>>>>> Xeon-based system under CentOS 6.2:
>>>>>>>>
>>>>>>>> cc1: warnings being treated as errors
>>>>>>>> spml_yoda_getreq.c: In function 'mca_spml_yoda_get_completion':
>>>>>>>> spml_yoda_getreq.c:98: error: pointer targets in passing
>>>>>>>> argument 1 of 'opal_atomic_add_32' differ in signedness
>>>>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note:
>>>>>>>> expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>>>>> spml_yoda_getreq.c:98: error: signed and unsigned type in
>>>>>>>> conditional expression
>>>>>>>> cc1: warnings being treated as errors
>>>>>>>> spml_yoda_putreq.c: In function 'mca_spml_yoda_put_completion':
>>>>>>>> spml_yoda_putreq.c:81: error: pointer targets in passing
>>>>>>>> argument 1 of 'opal_atomic_add_32' differ in signedness
>>>>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note:
>>>>>>>> expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>>>>> spml_yoda_putreq.c:81: error: signed and unsigned type in
>>>>>>>> conditional expression
>>>>>>>> make[2]: *** [spml_yoda_getreq.lo] Error 1
>>>>>>>> make[2]: *** Waiting for unfinished jobs....
>>>>>>>> make[2]: *** [spml_yoda_putreq.lo] Error 1
>>>>>>>> cc1: warnings being treated as errors
>>>>>>>> spml_yoda.c: In function 'mca_spml_yoda_put_internal':
>>>>>>>> spml_yoda.c:725: error: pointer targets in passing argument 1 of
>>>>>>>> 'opal_atomic_add_32' differ in signedness
>>>>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note:
>>>>>>>> expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>>>>> spml_yoda.c:725: error: signed and unsigned type in conditional
>>>>>>>> expression
>>>>>>>> spml_yoda.c: In function 'mca_spml_yoda_get':
>>>>>>>> spml_yoda.c:1107: error: pointer targets in passing argument 1
>>>>>>>> of 'opal_atomic_add_32' differ in signedness
>>>>>>>> ../../../../opal/include/opal/sys/amd64/atomic.h:174: note:
>>>>>>>> expected 'volatile int32_t *' but argument is of type 'uint32_t *'
>>>>>>>> spml_yoda.c:1107: error: signed and unsigned type in conditional
>>>>>>>> expression
>>>>>>>> make[2]: *** [spml_yoda.lo] Error 1
>>>>>>>> make[1]: *** [all-recursive] Error 1
>>>>>>>>
>>>>>>>> Only configure arguments:
>>>>>>>>
>>>>>>>> enable_picky=yes
>>>>>>>> enable_debug=yes
>>>>>>>>
>>>>>>>>
>>>>>>>> gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-3)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Aug 10, 2013, at 7:21 PM, "Barrett, Brian W"
>>>>>>>> <bwbarre_at_[hidden]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 8/6/13 10:30 AM, "Joshua Ladd" <joshual_at_[hidden]> wrote:
>>>>>>>>>
>>>>>>>>>> Dear OMPI Community,
>>>>>>>>>>
>>>>>>>>>> Please find on Bitbucket the latest round of OSHMEM changes
>>>>>>>>>> based on community feedback. Please git and test at your leisure.
>>>>>>>>>>
>>>>>>>>>> https://bitbucket.org/jladd_math/mlnx-oshmem.git
>>>>>>>>>
>>>>>>>>> Josh -
>>>>>>>>>
>>>>>>>>> In general, I think everything looks ok. However, the "right"
>>>>>>>>> thing doesn't happen if the CM PML is used (at least, when
>>>>>>>>> using the Portals 4 MTL). When configured with:
>>>>>>>>>
>>>>>>>>> ./configure --enable-mca-no-build=pml-ob1,pml-bfo,pml-v,btl,bml,mpool
>>>>>>>>>
>>>>>>>>> The build segfaults trying to run a SHMEM program:
>>>>>>>>>
>>>>>>>>> mpirun -np 2 ./bcast
>>>>>>>>> [shannon:90397] *** Process received signal ***
>>>>>>>>> [shannon:90397] Signal: Segmentation fault (11)
>>>>>>>>> [shannon:90397] Signal code: Address not mapped (1)
>>>>>>>>> [shannon:90397] Failing at address: (nil)
>>>>>>>>> [shannon:90398] *** Process received signal ***
>>>>>>>>> [shannon:90398] Signal: Segmentation fault (11)
>>>>>>>>> [shannon:90398] Signal code: Address not mapped (1)
>>>>>>>>> [shannon:90398] Failing at address: (nil)
>>>>>>>>> [shannon:90397] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>>>>>>> [shannon:90397] *** End of error message ***
>>>>>>>>> [shannon:90398] [ 0] /lib64/libpthread.so.0() [0x38b7a0f4a0]
>>>>>>>>> [shannon:90398] *** End of error message ***
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> mpirun noticed that process rank 1 with PID 90398 on node shannon exited on signal 11 (Segmentation fault).
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Brian
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Brian W. Barrett
>>>>>>>>> Scalable System Software Group
>>>>>>>>> Sandia National Laboratories
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> devel mailing list
>>>>>>>>> devel_at_[hidden]
>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Brian W. Barrett
>>>>>> Scalable System Software Group
>>>>>> Sandia National Laboratories
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Brian W. Barrett
>>>>> Scalable System Software Group
>>>>> Sandia National Laboratories
>>>>>
>>>>>
>>>>>
>>>>
>>>> <test_shmem.c>
>>>
>>
>