Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Multi-threading with OpenMPI ?
From: Ashika Umanga Umagiliya (aumanga_at_[hidden])
Date: 2009-09-16 21:37:39


Any tips? Anyone? :(

Ashika Umanga Umagiliya wrote:
> One more note: I do not call MPI_Finalize() from the
> "libParallel.so" library.
>
> Ashika Umanga Umagiliya wrote:
>> Greetings all,
>>
>> After some reading, I found out that I have to build Open MPI using
>> "--enable-mpi-threads".
>> After that, I changed the MPI_Init() code in my "libParallel.so" and in
>> "parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg) to:
>>
>> int sup;
>> MPI_Init_thread(NULL,NULL,MPI_THREAD_MULTIPLE,&sup);
>>
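[Editor's note: MPI_Init_thread only *requests* MPI_THREAD_MULTIPLE; the level actually granted comes back through the last argument and should be checked, since a build without full thread support may silently grant a lower level. A minimal sketch of that check (illustrative, not the poster's code):]

```cpp
#include <mpi.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv) {
    int provided = 0;
    // Request full multi-threaded support; MPI may grant less.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        std::fprintf(stderr,
                     "MPI_THREAD_MULTIPLE not available (got level %d)\n",
                     provided);
        MPI_Abort(MPI_COMM_WORLD, EXIT_FAILURE);
    }
    // ... multi-threaded MPI work ...
    MPI_Finalize();
    return 0;
}
```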
>> Now, when multiple requests come in (multiple threads), MPI gives the
>> following two errors:
>>
>> "<stddiag rank="0">[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG: Data
>> unpack would read past end of buffer in file dpm_orte.c at line
>> 299</stddiag>
>> [umanga:6127] *** An error occurred in MPI_Comm_spawn
>> [umanga:6127] *** on communicator MPI_COMM_SELF
>> [umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
>> [umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv: readv
>> failed: Connection reset by peer (104)
>> --------------------------------------------------------------------------
>>
>> mpirun has exited due to process rank 0 with PID 6127 on
>> node umanga exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> "
>>
>> or sometimes:
>>
>> "[umanga:5477] *** An error occurred in MPI_Comm_spawn
>> [umanga:5477] *** on communicator MPI_COMM_SELF
>> [umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
>> [umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> <stddiag rank="0">[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data
>> unpack would read past end of buffer in file dpm_orte.c at line
>> 299</stddiag>
>> --------------------------------------------------------------------------
>>
>> mpirun has exited due to process rank 0 with PID 5477 on
>> node umanga exiting without calling "finalize". This may
>> have caused other processes in the application to be
>> terminated by signals sent by mpirun (as reported here).
>> --------------------------------------------------------------------------"
>>
>>
>>
>> Any tips?
>>
>> Thank you
>>
>> Ashika Umanga Umagiliya wrote:
>>> Greetings all,
>>>
>>> Please refer to image at:
>>> http://i27.tinypic.com/mtqurp.jpg
>>>
>>> Here is the process illustrated in the image:
>>>
>>> 1) The C++ webservice loads "libParallel.so" when it starts up
>>> (dlopen).
>>> 2) When a new request comes from a client, a *new thread* is created,
>>> the SOAP data is bound to C++ objects, and the calcRisk() method of
>>> the webservice is invoked. Inside this method, "calcRisk()" of
>>> "libParallel" is invoked (using dlsym, etc.).
>>> 3) Inside "calcRisk()" of "libParallel", it spawns the "parallel-svr"
>>> MPI application.
>>> (I am using Boost.MPI and Boost.Serialization to send
>>> custom data types across the spawned processes.)
>>> 4) "parallel-svr" (the MPI application in the image) executes the
>>> parallel logic and sends the result back to "libParallel.so" using
>>> Boost.MPI send, etc.
>>> 5) "libParallel.so" sends the result to the webservice, which binds it
>>> into SOAP, sends the result to the client, and the thread ends.
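[Editor's note: step 3 presumably comes down to a call like the following sketch of MPI_Comm_spawn; the use of MPI_COMM_SELF matches the error messages quoted later in the thread, and the argument values are illustrative. If several request threads issue this call concurrently, each spawn goes through the runtime's connect/accept machinery, so serializing the calls behind a mutex is one possible workaround to try.]

```cpp
#include <mpi.h>

// Sketch of step 3: spawn one "parallel-svr" worker from the library.
// MPI_COMM_SELF means each calling process spawns its own worker group.
MPI_Comm spawn_worker() {
    MPI_Comm intercomm = MPI_COMM_NULL;
    int errcode = MPI_ERR_UNKNOWN;
    MPI_Comm_spawn("parallel-svr",   // path to the worker binary
                   MPI_ARGV_NULL,    // no extra command-line arguments
                   1,                // number of workers to start
                   MPI_INFO_NULL,
                   0,                // root rank within MPI_COMM_SELF
                   MPI_COMM_SELF,
                   &intercomm,       // intercommunicator to the worker
                   &errcode);
    return intercomm;                // can be wrapped by boost::mpi::communicator
}
```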
>>>
>>> My problem is:
>>>
>>> Everything works fine for the first request from the client.
>>> For the second request it throws an error (I assume from
>>> "libParallel.so") saying:
>>>
>>> "--------------------------------------------------------------------------
>>>
>>> Calling any MPI-function after calling MPI_Finalize is erroneous.
>>> The only exceptions are MPI_Initialized, MPI_Finalized and
>>> MPI_Get_version.
>>> --------------------------------------------------------------------------
>>>
>>> *** An error occurred in MPI_Init
>>> *** after MPI was finalized
>>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>>> [umanga:19390] Abort after MPI_FINALIZE completed successfully; not
>>> able to guarantee that all other processes were killed!"
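[Editor's note: this error indicates that MPI_Init runs more than once in the same process. MPI may be initialized and finalized at most once per process lifetime, so a per-request init/finalize inside calcRisk() necessarily fails on the second request. A conventional guard (a sketch; the function names are illustrative) initializes lazily and defers MPI_Finalize until the library is unloaded:]

```cpp
#include <mpi.h>

// Initialize MPI at most once per process; safe to call on every request.
void ensure_mpi_initialized() {
    int initialized = 0, finalized = 0;
    MPI_Initialized(&initialized);
    MPI_Finalized(&finalized);
    if (!initialized && !finalized) {
        int provided = 0;
        MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
    }
    // If 'finalized' is already set, MPI cannot be restarted in this process.
}

// Call once at library unload (e.g. from a destructor or atexit handler),
// never from a per-request thread.
void shutdown_mpi() {
    int finalized = 0;
    MPI_Finalized(&finalized);
    if (!finalized) MPI_Finalize();
}
```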
>>>
>>>
>>> Is this because of multithreading? Any idea how to fix this?
>>>
>>> Thanks in advance,
>>> umanga
>>>
>>
>