Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] CUDA support not working?
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-11-24 12:08:13


On Nov 24, 2013, at 8:30 AM, Jörg Bornschein <jb_at_[hidden]> wrote:

> On 24.11.2013, at 10:22, Ralph Castain <rhc_at_[hidden]> wrote:
>
>> The cuda support in the 1.7 series has been evolving - a number of patches have been applied since 1.7.3 was released, and I see another (for optimization) scheduled.
>>
>> You might try the 1.7.4 nightly tarball and see if the problem has been fixed.
>
>
> Same problem with 1.7.4-nightly.
>
> But I compiled and started my little test program on a machine with actual InfiniBand hardware
> and the problem disappeared! I guess on machines with InfiniBand hardware OB1 is not
> selected at runtime? Is this correct?

Sounds like a bug to me - if cuda is being used, we need to select ob1 regardless. I'll have to let Rolf figure that one out.

>
>
> I still believe that ompi/mca/pml/ob1/* is not linked to common_cuda.*, although it
> should be. I’m slightly overwhelmed by automake, so I don’t know how to add this
> reference and try it myself.

Try the attached - should fix the problem.
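
To confirm the diagnosis before and after patching, one can look at how the missing symbol appears in the component's dynamic symbol table. The following is only a minimal sketch, assuming binutils `nm` and a POSIX `awk` are available. The helper name `symbol_status` is hypothetical and not part of Open MPI; the library path and symbol name are the ones from the error message quoted further down.

```shell
#!/bin/sh
# symbol_status SYMBOL  (hypothetical helper, not part of Open MPI)
# Reads `nm -D` output on stdin and reports how SYMBOL is listed:
#   undefined - listed with type U (referenced, must be resolved elsewhere)
#   defined   - listed with any other type
#   absent    - not listed at all
symbol_status() {
    awk -v sym="$1" '
        $NF == sym { status = ($(NF-1) == "U") ? "undefined" : "defined" }
        END { print (status == "" ? "absent" : status) }
    '
}

# Usage (adjust the path to your install prefix):
#   nm -D /usr/local/lib/openmpi/mca_pml_ob1.so | symbol_status progress_one_cuda_htod_event
#   ldd /usr/local/lib/openmpi/mca_pml_ob1.so
```

If the first command reports "undefined" and none of the libraries listed by `ldd` provides the symbol, that would be consistent with the symbol lookup error at run time.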

>
> j
>
>
>
>
>>
>> On Nov 24, 2013, at 7:11 AM, Jörg Bornschein <jb_at_[hidden]> wrote:
>>
>>> On 23.11.2013, at 22:56, Dmitry N. Mikushin <maemarcus_at_[hidden]> wrote:
>>>
>>>> VT is getting out of sync with CUDA from time to time, this already
>>>> happened before.
>>>
>>> Yes, that’s what I thought and that’s why I didn’t mention it as my main issue.
>>>
>>>
>>>
>>> I’m rather stuck because cuda support and ob1 don’t seem to fit together — at least on my systems.
>>>
>>>
>>> j
>>>
>>>
>>>
>>>> - D.
>>>>
>>>>
>>>> 2013/11/24 Jörg Bornschein <jb_at_[hidden]>:
>>>>> On 23.11.2013, at 21:42, Jörg Bornschein <jb_at_[hidden]> wrote:
>>>>>
>>>>> Sorry,
>>>>>
>>>>>> I’m typically compiling with
>>>>>>
>>>>>> ./configure --with-cuda
>>>>>
>>>>>
>>>>> I’m actually compiling with
>>>>>
>>>>> ./configure --with-cuda --disable-vt
>>>>>
>>>>> because otherwise I get a compile time error:
>>>>>
>>>>> make[5]: Entering directory `/u/bornj/software-old/src/openmpi-1.7.3/ompi/contrib/vt/vt/vtlib'
>>>>> CC libvt_la-vt_cudart.lo
>>>>> CC libvt_mpi_la-vt_pform_linux.lo
>>>>> CC libvt_mpi_la-vt_thrd.lo
>>>>> CC libvt_mpi_la-vt_trc.lo
>>>>> CC libvt_mpi_la-vt_user_comment.lo
>>>>> CC libvt_mpi_la-vt_user_control.lo
>>>>> CC libvt_mpi_la-vt_user_count.lo
>>>>> CC libvt_mpi_la-vt_user_marker.lo
>>>>> vt_cudart.c: In function 'cudaLaunch':
>>>>> vt_cudart.c:2725:15: error: 'vt_cupti_events_enabled' undeclared (first use in this function)
>>>>> vt_cudart.c:2725:15: note: each undeclared identifier is reported only once for each function it appears in
>>>>>
>>>>>
>>>>>
>>>>> j
>>>>>
>>>>>
>>>>>
>>>>>> but I tried combining it with various other options. OMPI builds fine, but when I try to run programs compiled against it I always get:
>>>>>>
>>>>>> /a.out: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
>>>>>>
>>>>>> That error even seems to make sense, because the code in ompi/mca/pml/ob1/ refers to common_cuda.[ch], but it does not
>>>>>> seem to link against its dynamic binary.
>>>>>>
>>>>>> Am I missing something?
>>>>>>
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>> jb
>>>>>>
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel



  • application/octet-stream attachment: pml.diff