Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] CUDA support doesn't work starting from 1.9a1r27862
From: Alessandro Fanfarillo (fanfarillo.openmpi_at_[hidden])
Date: 2013-01-24 08:31:38


I usually run "mpirun -np 2 ./test", always on a single node. The message
appears with either 1 or 2 GPUs on that node.

2013/1/24 Rolf vandeVaart <rvandevaart_at_[hidden]>

> Thanks for this report. I will look into this. Can you tell me what your
> mpirun command looked like and do you know what transport you are running
> over?
>
> Specifically, is this on a single node or multiple nodes?
>
> Rolf
>
> *From:* devel-bounces_at_[hidden] [mailto:devel-bounces_at_[hidden]] *On
> Behalf Of *Alessandro Fanfarillo
> *Sent:* Thursday, January 24, 2013 4:11 AM
> *To:* devel_at_[hidden]
> *Subject:* [OMPI devel] CUDA support doesn't work starting from
> 1.9a1r27862
>
> Dear all,
>
> I would like to report a bug in the CUDA support in the last 5 trunk
> versions.
>
> The attached code is a simple send/receive test case that works correctly
> with version 1.9a1r27844.
>
> Starting from version 1.9a1r27862 up to 1.9a1r27897 I get the following
> message:
>
> ./test: symbol lookup error:
> /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: undefined symbol:
> progress_one_cuda_htod_event
> ./test: symbol lookup error:
> /usr/local/openmpi/lib/openmpi/mca_pml_ob1.so: undefined symbol:
> progress_one_cuda_htod_event
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 21641 on
> node ip-10-16-24-100 exiting improperly. There are three reasons this
> could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> 3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
> orte_create_session_dirs is set to false. In this case, the run-time cannot
> detect that the abort call was an abnormal termination. Hence, the only
> error message you will receive is this one.
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
> You can avoid this message by specifying -quiet on the mpirun command line.
>
>
>
> -----------------------------------------------------------------------------------------------------
>
> I'm using gcc-4.7.2 and CUDA 4.2. The test also fails with CUDA 4.1.
>
> Thanks in advance.
>
> Best regards.
>
> Alessandro Fanfarillo