Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Cuda Aware MPI Problem
From: Rolf vandeVaart (rvandevaart_at_[hidden])
Date: 2013-12-13 08:21:03


Yes, this was a bug with Open MPI 1.7.3. I could not reproduce it, but it was definitely an issue in certain configurations.
Here is the fix: https://svn.open-mpi.org/trac/ompi/changeset/29762

We fixed it in Open MPI 1.7.4 and the trunk version, so as you have seen, they do not have the problem.
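
For anyone who wants to guard against picking up an older build by accident, here is a minimal sketch of a version check. The helper names ompi_at_least_1_7_4 and report_ompi_version are hypothetical and not part of Open MPI; the only fact taken from this thread is that 1.7.3 showed the problem and 1.7.4 carries the fix. The check says nothing about whether releases older than 1.7.3 are affected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not an Open MPI API): returns 1 when the given
 * major.minor.release version is 1.7.4 or newer, 0 otherwise. */
int ompi_at_least_1_7_4(int major, int minor, int release)
{
    if (major != 1) {
        return major > 1;
    }
    if (minor != 7) {
        return minor > 7;
    }
    return release >= 4;
}

/* Hypothetical helper: prints a one-line verdict and returns the same
 * flag as ompi_at_least_1_7_4(). */
int report_ompi_version(int major, int minor, int release)
{
    int ok;

    /* Version components are never negative. */
    assert(major >= 0 && minor >= 0 && release >= 0);

    ok = ompi_at_least_1_7_4(major, minor, release);
    printf("Open MPI %d.%d.%d: %s\n", major, minor, release,
           ok ? "1.7.4 or newer" : "older than 1.7.4");
    return ok;
}
```

In a real MPI program the arguments would normally come from the OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION and OMPI_RELEASE_VERSION macros that Open MPI's mpi.h defines, for example report_ompi_version(OMPI_MAJOR_VERSION, OMPI_MINOR_VERSION, OMPI_RELEASE_VERSION). Those macros describe the headers used at compile time, so this does not detect a mismatched library picked up at run time.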

Rolf


From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Özgür Pekçağlıyan
Sent: Friday, December 13, 2013 8:03 AM
To: users_at_[hidden]
Subject: Re: [OMPI users] Cuda Aware MPI Problem

Hello again,

I have compiled openmpi-1.9a1r29873 from the nightly trunk build, and so far everything looks all right. But I have not tested the CUDA support yet.

On Fri, Dec 13, 2013 at 2:38 PM, Özgür Pekçağlıyan <ozgur.pekcagliyan_at_[hidden]> wrote:
Hello,

I am having difficulty building Open MPI with CUDA support. I followed this FAQ entry (http://www.open-mpi.org/faq/?category=building#build-cuda), as shown below:

$ cd openmpi-1.7.3/
$ ./configure --with-cuda=/usr/local/cuda-5.5
$ make all install

Everything goes fine during compilation, but when I try to run the simplest MPI hello world application I get the following error:

$ mpicc hello.c -o hello
$ mpirun -np 2 hello

hello: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
hello: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 30329 on
node cudalab1 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

--------------------------------------------------------------------------

$ mpirun -np 1 hello

hello: symbol lookup error: /usr/local/lib/openmpi/mca_pml_ob1.so: undefined symbol: progress_one_cuda_htod_event
--------------------------------------------------------------------------
mpirun has exited due to process rank 0 with PID 30327 on
node cudalab1 exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.

--------------------------------------------------------------------------


Any suggestions?
I have two PCs with Intel i3 CPUs and GeForce GTX 480 GPUs.


And here is the hello.c file:
#include <stdio.h>
#include <mpi.h>


int main (int argc, char **argv)
{
  int rank, size;

  MPI_Init (&argc, &argv); /* starts MPI */
  MPI_Comm_rank (MPI_COMM_WORLD, &rank); /* get current process id */
  MPI_Comm_size (MPI_COMM_WORLD, &size); /* get number of processes */
  printf( "Hello world from process %d of %d\n", rank, size );
  MPI_Finalize();
  return 0;
}



--
Özgür Pekçağlıyan
B.Sc. in Computer Engineering
M.Sc. in Computer Engineering

