Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] weird problem with passing data between nodes
From: Mattijs Janssens (m.janssens_at_[hidden])
Date: 2008-06-13 05:27:10


Sounds like a typical deadlock situation. All processors are waiting for one
another.

I'm not a specialist, but from what I know, if the messages are small enough
they are buffered by the implementation/hardware (sent "eagerly") and there is
no deadlock. That's why it might work for small messages and/or with certain
MPI implementations.

Solutions:
- come up with a global communication schedule so that whenever one processor
sends, its partner is posting the matching receive.
- use MPI_Bsend. Might be slower.
- use MPI_Isend/MPI_Irecv (but then you'll have to make sure the buffers stay
valid until the communication completes; see the sketch below).
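
For the third option, a minimal sketch of the same exchange done with
non-blocking calls (again, the names are just illustrative):

  #include <mpi.h>

  void exchange_columns_nb(double *send_col, double *recv_col, int nrows,
                           int left, int right, MPI_Comm comm)
  {
      MPI_Request reqs[2];

      /* Post the receive and the send without blocking; neither buffer
         may be reused until MPI_Waitall reports both requests complete. */
      MPI_Irecv(recv_col, nrows, MPI_DOUBLE, left,  0, comm, &reqs[0]);
      MPI_Isend(send_col, nrows, MPI_DOUBLE, right, 0, comm, &reqs[1]);
      MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
  }

Posting the Irecv before the Isend isn't strictly required here, but it gives
MPI somewhere to put the incoming data straight away.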

On Friday 13 June 2008 01:55, zach wrote:
> I have a weird problem that shows up when I use LAM or Open MPI but not
> MPICH.
>
> I have a parallelized code working on a really large matrix. It
> partitions the matrix column-wise and ships the pieces off to processors,
> so any given processor is working on a matrix with the same number of
> rows as the original but a reduced number of columns. As part of the
> algorithm, each processor needs to send a single column vector entry
> from its own matrix to the adjacent processor and vice versa.
>
> I have found that, depending on the number of rows of the matrix (i.e.
> the size of the vector being sent with MPI_Send/MPI_Recv), the
> simulation will hang.
> Only when I reduce this dimension below a certain maximum will the sim
> run properly. I have also found that this magic number differs
> depending on the system I am using, e.g. my home quad-core box or the
> remote cluster.
>
> As I mentioned, I have not had this issue with MPICH. I would like to
> understand why it is happening rather than just defect over to MPICH
> to get by.
>
> Any help would be appreciated!
> zach