Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Multi-threading with OpenMPI ?
From: Ashika Umanga Umagiliya (aumanga_at_[hidden])
Date: 2009-09-13 22:54:28


One more modification: I do not call MPI_Finalize() from the
"libParallel.so" library.

Ashika Umanga Umagiliya wrote:
> Greetings all,
>
> After some reading, I found out that I have to build Open MPI using
> "--enable-mpi-threads".
> After that, I changed the MPI_Init() code in my "libParallel.so" and in
> "parallel-svr" (please refer to http://i27.tinypic.com/mtqurp.jpg) to:
>
> int sup;
> MPI_Init_thread(NULL,NULL,MPI_THREAD_MULTIPLE,&sup);
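>
> (Worth noting: MPI_Init_thread only *requests* MPI_THREAD_MULTIPLE; the
> level actually granted comes back in "sup" and can be lower if the Open MPI
> build lacks full thread support. A minimal check, roughly:)
>
> int sup = 0;
> MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &sup);
> if (sup < MPI_THREAD_MULTIPLE) {
>     // Open MPI granted a lower level; calling MPI concurrently from the
>     // request threads is then not safe, so report/handle the error here.
> }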
>
> Now, when multiple requests come in (multiple threads), MPI gives the
> following two errors:
>
> "<stddiag rank="0">[umanga:06127] [[8004,1],0] ORTE_ERROR_LOG: Data
> unpack would read past end of buffer in file dpm_orte.c at line
> 299</stddiag>
> [umanga:6127] *** An error occurred in MPI_Comm_spawn
> [umanga:6127] *** on communicator MPI_COMM_SELF
> [umanga:6127] *** MPI_ERR_UNKNOWN: unknown error
> [umanga:6127] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [umanga:06126] [[8004,0],0]-[[8004,1],0] mca_oob_tcp_msg_recv: readv
> failed: Connection reset by peer (104)
> --------------------------------------------------------------------------
>
> mpirun has exited due to process rank 0 with PID 6127 on
> node umanga exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> "
>
> or sometimes:
>
> "[umanga:5477] *** An error occurred in MPI_Comm_spawn
> [umanga:5477] *** on communicator MPI_COMM_SELF
> [umanga:5477] *** MPI_ERR_UNKNOWN: unknown error
> [umanga:5477] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> <stddiag rank="0">[umanga:05477] [[7630,1],0] ORTE_ERROR_LOG: Data
> unpack would read past end of buffer in file dpm_orte.c at line
> 299</stddiag>
> --------------------------------------------------------------------------
>
> mpirun has exited due to process rank 0 with PID 5477 on
> node umanga exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------"
>
>
>
> Any tips?
>
> Thank you
>
> Ashika Umanga Umagiliya wrote:
>> Greetings all,
>>
>> Please refer to image at:
>> http://i27.tinypic.com/mtqurp.jpg
>>
>> Here is the process illustrated in the image:
>>
>> 1) C++ Webservice loads the "libParallel.so" when it starts up. (dlopen)
>> 2) When a new request comes from a client, a *new thread* is created,
>> SOAP data is bound to C++ objects, and the calcRisk() method of the
>> webservice is invoked. Inside this method, "calcRisk()" of "libParallel"
>> is invoked (using dlsym, etc.).
>> 3) Inside "calcRisk()" of "libParallel", it spawns the "parallel-svr" MPI
>> application (a rough sketch is shown after this list).
>> (I am using Boost.MPI and Boost.Serialization to send
>> custom data types across the spawned processes.)
>> 4) "parallel-svr" (the MPI application in the image) executes the parallel
>> logic and sends the result back to "libParallel.so" using Boost.MPI
>> send, etc.
>> 5) "libParallel.so" sends the result to the webservice, which binds it
>> into SOAP and sends it to the client, and the thread ends.
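>>
>> (Step 3, in a simplified form; the worker path and process count here are
>> placeholders, not my exact code:)
>>
>> MPI_Comm worker;                    // intercommunicator to the spawned job
>> MPI_Comm_spawn("parallel-svr",      // worker executable (placeholder path)
>>                MPI_ARGV_NULL, 1,    // no extra argv, one worker process
>>                MPI_INFO_NULL, 0,
>>                MPI_COMM_SELF,       // spawn from this single rank
>>                &worker, MPI_ERRCODES_IGNORE);
>> // Boost.MPI / Boost.Serialization are then used on top of "worker" to
>> // exchange the custom data types.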
>>
>> My problem is:
>>
>> Everything works fine for the first request from the client.
>> For the second request, it throws an error (I assume from
>> "libParallel.so") saying:
>>
>> "--------------------------------------------------------------------------
>>
>> Calling any MPI-function after calling MPI_Finalize is erroneous.
>> The only exceptions are MPI_Initialized, MPI_Finalized and
>> MPI_Get_version.
>> --------------------------------------------------------------------------
>>
>> *** An error occurred in MPI_Init
>> *** after MPI was finalized
>> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>> [umanga:19390] Abort after MPI_FINALIZE completed successfully; not
>> able to guarantee that all other processes were killed!"
>>
>>
>> Is this because of multithreading? Any idea how to fix this?
>>
>> Thanks in advance,
>> umanga
>>
>