
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Multi-threading with OpenMPI ?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-09-16 21:53:30


Only the obvious, and not very helpful, one: comm_spawn isn't
thread-safe at this time. You'll need to serialize your requests to
that function.
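
Something along these lines should work (an untested sketch;
spawn_lock and spawn_workers are made-up names) to funnel every
spawn through a single mutex:

    #include <mpi.h>
    #include <pthread.h>

    /* process-wide lock so only one request thread is inside
       MPI_Comm_spawn at a time */
    static pthread_mutex_t spawn_lock = PTHREAD_MUTEX_INITIALIZER;

    MPI_Comm spawn_workers(const char *cmd, int nprocs)
    {
        MPI_Comm intercomm;
        pthread_mutex_lock(&spawn_lock);
        MPI_Comm_spawn(cmd, MPI_ARGV_NULL, nprocs, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &intercomm,
                       MPI_ERRCODES_IGNORE);
        pthread_mutex_unlock(&spawn_lock);
        return intercomm;
    }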

I believe the thread-safety constraints within OMPI are discussed to
some extent on the FAQ site. At the least, they have been discussed in
some depth on this mailing list several times; there might be some
further nuggets of advice on workarounds in there.

On Sep 16, 2009, at 7:37 PM, Ashika Umanga Umagiliya wrote:

> Any tips? Anyone? :(
>
>
> Ashika Umanga Umagiliya wrote:
>> One more modification: I do not call MPI_Finalize() from the
>> "libParallel.so" library.
>>
>> Ashika Umanga Umagiliya wrote:
>>> Greetings all,
>>>
>>> After some reading, I found out that I have to build Open MPI
>>> using "--enable-mpi-threads".
>>> After that, I changed the MPI_Init() code in my "libParallel.so" and
>>> in "parallel-svr" (please refer to
>>> http://i27.tinypic.com/mtqurp.jpg) to:
>>>
>>> int sup;
>>> MPI_Init_thread(NULL,NULL,MPI_THREAD_MULTIPLE,&sup);
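>>>
>>> plus a check of the level actually granted (just a sketch; if the
>>> build lacks full thread support, "sup" can come back lower):
>>>
>>> if (sup < MPI_THREAD_MULTIPLE)
>>>     fprintf(stderr, "only got thread level %d\n", sup);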
>>>
>>> Now when multiple requests come in (multiple threads), MPI gives
>>> the following two errors:
>>>
>>> "<stddiag rank="0">[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG:
>>> Data unpack would read past end of buffer in file dpm_orte.c at
>>> line 299</stddiag>
>>> [umanga:6127] *** An error occurred in MPI_Comm_spawn
>>> [umanga:6127] *** on communicator MPI_COMM_SELF
>>> [umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
>>> [umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>> [umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv:
>>> readv failed: Connection reset by peer (104)
>>> --------------------------------------------------------------------------
>>> mpirun has exited due to process rank 0 with PID 6127 on
>>> node umanga exiting without calling "finalize". This may
>>> have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>> "
>>>
>>> or sometimes:
>>>
>>> "[umanga:5477] *** An error occurred in MPI_Comm_spawn
>>> [umanga:5477] *** on communicator MPI_COMM_SELF
>>> [umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
>>> [umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>> <stddiag rank="0">[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data
>>> unpack would read past end of buffer in file dpm_orte.c at line
>>> 299</stddiag>
>>> --------------------------------------------------------------------------
>>> mpirun has exited due to process rank 0 with PID 5477 on
>>> node umanga exiting without calling "finalize". This may
>>> have caused other processes in the application to be
>>> terminated by signals sent by mpirun (as reported here).
>>> --------------------------------------------------------------------------"
>>>
>>>
>>> Any tips?
>>>
>>> Thank you
>>>
>>> Ashika Umanga Umagiliya wrote:
>>>> Greetings all,
>>>>
>>>> Please refer to image at:
>>>> http://i27.tinypic.com/mtqurp.jpg
>>>>
>>>> Here is the process illustrated in the image:
>>>>
>>>> 1) The C++ webservice loads "libParallel.so" when it starts up
>>>> (via dlopen).
>>>> 2) When a new request comes from a client, a *new thread* is
>>>> created, the SOAP data is bound to C++ objects, and the calcRisk()
>>>> method of the webservice is invoked. Inside this method,
>>>> "calcRisk()" of "libParallel" is invoked (using dlsym, etc.).
>>>> 3) Inside "calcRisk()" of "libParallel", it spawns the
>>>> "parallel-svr" MPI application (sketched below).
>>>> (I am using Boost.MPI and Boost serialization to send custom data
>>>> types across the spawned processes.)
>>>> 4) "parallel-svr" (the MPI application in the image) executes the
>>>> parallel logic and sends the result back to "libParallel.so" using
>>>> Boost.MPI send, etc.
>>>> 5) "libParallel.so" sends the result to the webservice, which binds
>>>> it into SOAP and sends it to the client, and the thread ends.
>>>>
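>>>> The spawn in step 3 looks roughly like this (a simplified sketch,
>>>> not the exact code; the worker count and variable names are
>>>> illustrative):
>>>>
>>>> #include <mpi.h>
>>>> #include <boost/mpi.hpp>
>>>>
>>>> MPI_Comm raw;
>>>> MPI_Comm_spawn("parallel-svr", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
>>>>                0, MPI_COMM_SELF, &raw, MPI_ERRCODES_IGNORE);
>>>> /* wrap the raw intercommunicator so Boost.MPI can use it */
>>>> boost::mpi::intercommunicator comm(raw, boost::mpi::comm_attach);
>>>> /* "job" is whatever serializable object carries the request */
>>>> comm.send(0, 0, job);
>>>>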
>>>> My problem is:
>>>>
>>>> Everything works fine for the first request from the client.
>>>> For the second request it throws an error (I assume from
>>>> "libParallel.so") saying:
>>>>
>>>> "--------------------------------------------------------------------------
>>>> Calling any MPI-function after calling MPI_Finalize is erroneous.
>>>> The only exceptions are MPI_Initialized, MPI_Finalized and
>>>> MPI_Get_version.
>>>> --------------------------------------------------------------------------
>>>> *** An error occurred in MPI_Init
>>>> *** after MPI was finalized
>>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>>> [umanga:19390] Abort after MPI_FINALIZE completed successfully;
>>>> not able to guarantee that all other processes were killed!"
>>>>
>>>>
>>>> Is this because of multithreading? Any idea how to fix this?
>>>>
>>>> Thanks in advance,
>>>> umanga
>>>>
>>>
>>
>