Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Programming help required in Interleaving computation+communication !
From: Atle Rudshaug (atle_at_[hidden])
Date: 2009-11-25 06:47:58


souvik bhattacherjee wrote:
> Hi all,
>
> I'm trying to interleave computation with communication. As a result,
> I have resorted to using MPI with POSIX threads. Primarily, I am
> trying to communicate a partial vector v3 while computing an inner
> product v1*v2 (mod q). To give you an idea of the platform and the
> libraries:
> 1. Intel dual-socket quadcore m/c (total 8 cores/machine)
> 2. openmpi 1.3.3 (separate installations on ict6 and ict4 machines)
> 3. lib64gmp3 4.3.1
> 4. gcc 4.3.2
> 5. interconnect: Gigabit ethernet
>
> I have used a single thread for most of the communication and the
> remaining 7 threads for computation. Perhaps, this portion of the code
> has gone wrong somewhere and the program terminates with the following
> error message.
>
> $ mpicc test-vecvecmul.c -lgmp -pthread -Wall -o tvmul
>
> $ mpirun --prefix /usr/local/openmpi-1.3.3/ -np 2 --host ict6,ict4 ./tvmul
>
> [err] event_queue_remove: 0xc1d6b0(fd 10) not on queue 8
> [err] event_queue_remove: 0xc1d6b0(fd 10) not on queue 8
> [ict6][[21545,1],0][../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 17154 on
> node ict4 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
>
> The code is attached. Please suggest where in the code I have gone
> wrong. I am also interested in a more efficient way of interleaving,
> if one exists.
>
> Can anyone also suggest a good tutorial on programming in MPI with
> POSIX threads/OpenMP?
>
> Regards,
> --
> Souvik
>
I got a similar error when using non-blocking communication on large
datasets, and I eventually had to switch to blocking communication. Try
to make the code work with blocking communication first and see if that
removes your error, then re-implement the non-blocking version from
there. Also, doesn't MPI already perform reasonably well when the
processes are located on the same node? Could you perhaps drop the
threads and use MPI only?

- Atle