
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Configuration problem or network problem?
From: Doug Reeder (dlr_at_[hidden])
Date: 2009-07-06 22:48:31


Lin,

Try -np 16, and do not run any processes on the head node.
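A minimal sketch of that suggestion, assuming hypothetical host names ps3-01 through ps3-16 and a hypothetical executable ./your_app. The hostfile lists only the 16 PS3s with one slot each, so the 16 ranks should be placed on the consoles and none on the PowerPC head node (mpirun itself still starts there).

```shell
#!/bin/sh
# Hypothetical host names ps3-01 .. ps3-16; substitute the real PS3 names.
# "slots=1" asks Open MPI to place one rank per PS3.
# The PowerPC head node is deliberately left out of the file.
i=1
while [ "$i" -le 16 ]; do
    printf 'ps3-%02d slots=1\n' "$i"
    i=$((i + 1))
done > ps3_hosts

# Then launch 16 ranks (not 17), all on the PS3s
# (./your_app is a placeholder for the real program):
#   mpirun --mca btl tcp,self -np 16 --hostfile ps3_hosts ./your_app
```

With no head-node entry and exactly 16 slots, -np 16 fills every PS3 once and leaves nothing for the head node.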

Doug Reeder
On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant) wrote:

> Hi all,
> The system I use is a PS3 cluster: 16 PS3s plus a PowerPC head
> node, all connected by a high-speed switch.
> The program uses point-to-point communication (MPI_Send and
> MPI_Recv) with messages of about 40 KB, plus a lot of computation
> (about 1 s per iteration) inside a loop. The co-processor in the
> PS3 handles the computation and the main processor handles the
> point-to-point communication, so computation and communication
> overlap. The communication functions should therefore return much
> faster than the compute function.
> My problem is that after some iterations, the time spent in the
> communication functions on a PS3 increases sharply, and the whole
> cluster falls out of sync. When I reduce the computing time, the
> problem disappears. I am very confused about this.
> I suspect some mechanism in Open MPI causes this. Has anyone seen
> this behavior before?
> I use "mpirun --mca btl tcp, self -np 17 --hostfile ...". Is there
> something I should add?
> Lin
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
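To put the quoted expectation that communication should be much faster than computation into numbers, here is a back-of-the-envelope estimate. It assumes the switch runs at gigabit Ethernet speed (the message only says "high-speed"), reads 40 KB as 40 x 1024 bytes, and ignores TCP/IP overhead and latency:

```latex
t_{\text{msg}} \approx \frac{40 \times 1024 \times 8~\text{bit}}{10^{9}~\text{bit/s}}
  = 3.28 \times 10^{-4}~\text{s} \approx 0.33~\text{ms},
\qquad
\frac{t_{\text{compute}}}{t_{\text{msg}}}
  \approx \frac{1~\text{s}}{3.28 \times 10^{-4}~\text{s}} \approx 3 \times 10^{3}
```

Under those assumptions, raw wire time alone is about three orders of magnitude below the ~1 s compute step. It should not account for communication calls that grow to a noticeable fraction of a second.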
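The command in the quoted message has a space after the comma in "tcp, self". The shell splits words on whitespace, so mpirun would receive "tcp," as the BTL list and "self" as a separate argument instead of the single value "tcp,self". The small sketch below only counts the words the shell produces. It does not run mpirun, and how mpirun interprets the stray word is not shown here.

```shell
#!/bin/sh
# Print how many separate arguments the shell passes along.
count_args() { echo $#; }

# With a space after the comma, "self" becomes its own argument.
with_space=$(count_args mpirun --mca btl tcp, self -np 17)
# Without the space, "tcp,self" stays one MCA parameter value.
without_space=$(count_args mpirun --mca btl tcp,self -np 17)

echo "with space: $with_space words, without space: $without_space words"
```

This prints "with space: 7 words, without space: 6 words". The comma-separated BTL list must be written without spaces, as in "--mca btl tcp,self".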