Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Comm_accept randomly gives errors
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-10-15 09:41:31


Yeah, we don't support multi-threaded operations very well at this time. I think you'd have better success with the 1.7 series as it is released, but very much doubt the 1.6 series could do this as you describe.

One way to solve the immediate problem would be to funnel all MPI operations into a single thread; that thread can then parcel out any incoming messages to other threads for handling.
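A minimal sketch of that funneling pattern, not a definitive implementation. The names `CommFunnel` and `run_funnel_demo` are hypothetical, and the MPI calls are replaced by a stand-in lambda so the example runs without MPI. In a real program the queued jobs would wrap calls such as MPI_Send or MPI_Comm_accept. Note that under MPI_THREAD_FUNNELED the thread making MPI calls must be the one that initialized MPI.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

// Hypothetical funnel: owns the only thread that is allowed to touch MPI.
class CommFunnel {
public:
    CommFunnel() : worker_([this] { loop(); }) {}

    ~CommFunnel() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join();  // returns once the queue has been drained
    }

    // Called from any thread: queue a communication job for the comm thread.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    std::thread::id comm_thread_id() const { return worker_.get_id(); }

private:
    void loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // shutdown requested, nothing left
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // real code would issue its MPI calls here
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool done_ = false;
    std::thread worker_;  // declared last so the members above exist first
};

// Demo: several worker threads hand "send" requests to the funnel.
// Returns the payloads in the order the comm thread handled them.
std::vector<std::string> run_funnel_demo(int n_workers) {
    std::vector<std::string> handled;  // written only by the comm thread
    {
        CommFunnel funnel;
        const std::thread::id owner = funnel.comm_thread_id();
        std::vector<std::thread> workers;
        for (int i = 0; i < n_workers; ++i) {
            workers.emplace_back([&funnel, &handled, owner, i] {
                std::string payload = "reply-" + std::to_string(i);
                funnel.submit([&handled, owner, payload] {
                    // Stand-in for an MPI call; must run on the comm thread.
                    assert(std::this_thread::get_id() == owner);
                    handled.push_back(payload);
                });
            });
        }
        for (auto& w : workers) w.join();
    }  // ~CommFunnel drains the queue and joins the comm thread
    return handled;
}
```

Calling `run_funnel_demo(4)` from `main` starts four worker threads that never perform communication themselves; each only queues a job, and the single comm thread executes them one at a time. The order in which the payloads are handled depends on thread scheduling.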

On Oct 3, 2012, at 10:36 PM, Valentin Clement <valentin.clement_at_[hidden]> wrote:

> Hi everyone,
>
> I'm currently implementing MPI-based communication in our parallel language middleware, POP-C++. It previously used TCP/IP sockets, but because of a project to port the language to a supercomputer, I have to use OpenMPI for the communication. I have successfully replaced the old communication layer with MPI communication. However, I sometimes get the following error during the execution of my program.
>
> MPI-COMBOX(client): Want to get a connection to 3461939200.0;tcp://172.19.76.219:52876;tcp://172.19.7.128:52876;tcp://172.16.162.1:52876;tcp://192.168.59.1:52876+3461939202.0;tcp://172.19.76.219:52879;tcp://172.19.7.128:52879;tcp://172.16.162.1:52879;tcp://192.168.59.1:52879:300
> [clementon:58465] [[52825,3],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
> [clementon:58465] *** An error occurred in MPI_Comm_accept
> [clementon:58465] *** on communicator MPI_COMM_WORLD
> [clementon:58465] *** MPI_ERR_UNKNOWN: unknown error
> [clementon:58465] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>
> Sometimes it is the MPI_Comm_connect that fails:
>
> MPI-COMBOX(client): Want to get a connection to 1318912000.0;tcp://192.168.59.176:33956+1318912002.0;tcp://192.168.59.176:54394:300
> [ubuntu:19666] [[20125,3],0] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
> [ubuntu:19666] *** An error occurred in MPI_Comm_accept
> [ubuntu:19666] *** on communicator MPI_COMM_WORLD
> [ubuntu:19666] *** MPI_ERR_UNKNOWN: unknown error
> [ubuntu:19666] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort
>
> So basically, I have a process waiting for a connection with MPI_Comm_accept (Comm.Accept, as I use C++), and another process wants to connect to it with MPI_Comm_connect (MPI::COMM_WORLD.Connect(port_name) ... ). It works fine most of the time. I suspect a problem with multiple threads. The process that receives connections has a second thread to serve requests.
>
> * Process 1 connects to process 2
> * Process 2, thread 1 registers the request
> * Process 2, thread 1 waits for a new connection
> * Process 2, thread 2 serves the pending request and might send data
> * Another process might start a new connection to process 2
>
> I'm running this code on Ubuntu 12.04 with OpenMPI 1.6.2 configured with --enable-mpi-thread-multiple. I have attached the ompi_info -all output.
> I'm also running the same code on Mac OS X 10.8.2 with OpenMPI 1.6.2, also configured with --enable-mpi-thread-multiple.
>
> I am not running on multiple nodes for the moment; I use just one node and already see this error. As I said, I suspect a problem with multiple threads, but my configuration should allow multiple threads to make MPI calls.
>
>
>
> Any help much appreciated
>
>
>
> Valentin Clement
>
> --
> Valentin Clement
> Student trainee
> Advanced Institute for Computational Science
> Programming environnement research team
> RIKEN Institute
> Kobe, Japan
>
>
> <ompi-output.tar.bz2>