Open MPI Development Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] CUDA support not working?
From: Jörg Bornschein (jb_at_[hidden])
Date: 2013-11-24 11:30:22


On 24.11.2013, at 10:22, Ralph Castain <rhc_at_[hidden]> wrote:

> The CUDA support in the 1.7 series has been evolving - a number of patches have been applied since 1.7.3 was released, and I see another (for optimization) scheduled.
>
> You might try the 1.7.4 nightly tarball and see if the problem has been fixed.

Same problem with 1.7.4-nightly.

But when I compiled and ran my little test program on a machine with actual InfiniBand
hardware, the problem disappeared! I guess on machines with InfiniBand hardware ob1 is not
selected at runtime? Is this correct?
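
One way to test that guess would be to request the PML explicitly instead of relying on runtime selection. The following is a minimal Python sketch, assuming `mpirun` is on the PATH and that the `pml` and `pml_base_verbose` MCA parameters are available in this build. The helper name `mpirun_with_pml`, the process count, and the verbosity level are illustrative choices and do not come from this thread.

```python
import shlex


def mpirun_with_pml(program, pml="ob1", verbose=10, nprocs=2):
    """Build an mpirun argv that requests one PML and logs component selection.

    The defaults (ob1, verbosity 10, 2 processes) are assumptions made for
    illustration only.
    """
    return [
        "mpirun", "-np", str(nprocs),
        "--mca", "pml", pml,                        # request this PML explicitly
        "--mca", "pml_base_verbose", str(verbose),  # log the selection process
        program,
    ]


if __name__ == "__main__":
    # Print the command so it can be copied to each machine and run there.
    print(shlex.join(mpirun_with_pml("./a.out")))
```

Running the printed command on both machines should show whether ob1 is actually selected on the InfiniBand host and whether the symbol lookup error follows it.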

I still believe that ompi/mca/pml/ob1/* is not linked against common_cuda.*, although it
should be. I’m slightly overwhelmed by automake, so I don’t know how to add this
reference and try it myself.
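
The linking suspicion can be checked without touching automake by inspecting the installed component directly. The sketch below is a hedged illustration in Python: it parses the dynamic loader's "symbol lookup error" line quoted further down in this thread and builds a GNU `nm` command that lists the undefined dynamic symbols of the library named in it. The function names `parse_lookup_error` and `nm_command` are hypothetical, and the message format is assumed from that single quoted line.

```python
import re

# Format assumed from the one error line quoted in this thread.
LOOKUP_ERROR = re.compile(
    r"^(?P<program>\S+): symbol lookup error: (?P<library>\S+): "
    r"undefined symbol: (?P<symbol>\S+)$"
)


def parse_lookup_error(line):
    """Return (library, symbol) from a loader error line, or None if it does not match."""
    match = LOOKUP_ERROR.match(line.strip())
    if match is None:
        return None
    return match.group("library"), match.group("symbol")


def nm_command(library):
    """Build a GNU nm command listing the undefined dynamic symbols of `library`."""
    return ["nm", "-D", "--undefined-only", library]


if __name__ == "__main__":
    error = ("/a.out: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: "
             "undefined symbol: progress_one_cuda_htod_event")
    library, symbol = parse_lookup_error(error)
    print("look for", symbol, "in the output of:", " ".join(nm_command(library)))
```

If the symbol shows up as undefined in the component and `ldd` on the same file lists no library that provides it, that would be consistent with the missing link dependency described above.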

    j

>
> On Nov 24, 2013, at 7:11 AM, Jörg Bornschein <jb_at_[hidden]> wrote:
>
>> On 23.11.2013, at 22:56, Dmitry N. Mikushin <maemarcus_at_[hidden]> wrote:
>>
>>> VT is getting out of sync with CUDA from time to time, this already
>>> happened before.
>>
>> Yes, that’s what I thought, and that’s why I didn’t mention it as my main issue.
>>
>>
>>
>> I’m rather stuck because CUDA support and ob1 don’t seem to fit together, at least on my systems.
>>
>>
>> j
>>
>>
>>
>>> - D.
>>>
>>>
>>> 2013/11/24 Jörg Bornschein <jb_at_[hidden]>:
>>>> On 23.11.2013, at 21:42, Jörg Bornschein <jb_at_[hidden]> wrote:
>>>>
>>>> Sorry,
>>>>
>>>>> I’m typically compiling with
>>>>>
>>>>> ./configure --with-cuda
>>>>
>>>>
>>>> I’m actually compiling with
>>>>
>>>> ./configure --with-cuda --disable-vt
>>>>
>>>> because otherwise I get a compile-time error:
>>>>
>>>> make[5]: Entering directory `/u/bornj/software-old/src/openmpi-1.7.3/ompi/contrib/vt/vt/vtlib'
>>>> CC libvt_la-vt_cudart.lo
>>>> CC libvt_mpi_la-vt_pform_linux.lo
>>>> CC libvt_mpi_la-vt_thrd.lo
>>>> CC libvt_mpi_la-vt_trc.lo
>>>> CC libvt_mpi_la-vt_user_comment.lo
>>>> CC libvt_mpi_la-vt_user_control.lo
>>>> CC libvt_mpi_la-vt_user_count.lo
>>>> CC libvt_mpi_la-vt_user_marker.lo
>>>> vt_cudart.c: In function 'cudaLaunch':
>>>> vt_cudart.c:2725:15: error: 'vt_cupti_events_enabled' undeclared (first use in this function)
>>>> vt_cudart.c:2725:15: note: each undeclared identifier is reported only once for each function it appears in
>>>>
>>>>
>>>>
>>>> j
>>>>
>>>>
>>>>
>>>>> but I tried combining it with various other options. OMPI builds fine, but when I try to run programs compiled against it I always get:
>>>>>
>>>>> /a.out: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
>>>>>
>>>>> That error even seems to make sense, because the code in ompi/mca/pml/ob1/ refers to common_cuda.[ch], but it does not
>>>>> seem to link against its shared library.
>>>>>
>>>>> Am I missing something?
>>>>>
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>> jb
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel