The system I use is a PS3 cluster with 16 PS3s and a PowerPC machine as
the head node, all connected by a high-speed switch.
The program runs a loop containing point-to-point communication calls
(MPI_Send and MPI_Recv), with a data size of about 40 KB, and a large
amount of computation that takes a long time (about 1 sec). The
co-processor in the PS3 takes care of the computation and the main
processor takes care of the point-to-point communication, so computation
and communication can overlap. The communication functions should return
much faster than
My problem is that after some iterations, the time spent in the
communication functions on a PS3 increases sharply, and the whole
cluster falls out of sync. When I decrease the computing time, the
problem disappears. I am very confused by this.
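One way to pin down which iterations become slow would be to time every
MPI_Send/MPI_Recv call with MPI_Wtime() and compare it against the
running mean on that rank. Below is a minimal sketch in C of the
bookkeeping only. The names comm_stats, comm_stats_init,
comm_stats_mean and comm_stats_add are hypothetical helpers, not part of
Open MPI, and the threshold factor in the usage comment is an arbitrary
assumption.

```c
#include <stddef.h>

/* Hypothetical helper (not part of Open MPI): running statistics for the
 * wall-clock time spent in communication calls on one rank. */
typedef struct {
    size_t count;   /* number of samples recorded so far */
    double sum_s;   /* sum of all samples, in seconds */
    double max_s;   /* largest sample seen, in seconds */
} comm_stats;

void comm_stats_init(comm_stats *s)
{
    s->count = 0;
    s->sum_s = 0.0;
    s->max_s = 0.0;
}

/* Mean of the recorded samples, or 0.0 if none have been recorded. */
double comm_stats_mean(const comm_stats *s)
{
    return s->count ? s->sum_s / (double)s->count : 0.0;
}

/* Record one sample. Returns 1 if the sample is more than `factor` times
 * the mean of the samples recorded before it, 0 otherwise. The first
 * sample is never flagged because there is no baseline yet. */
int comm_stats_add(comm_stats *s, double elapsed_s, double factor)
{
    int spike = s->count > 0 && elapsed_s > factor * comm_stats_mean(s);

    s->count++;
    s->sum_s += elapsed_s;
    if (elapsed_s > s->max_s)
        s->max_s = elapsed_s;
    return spike;
}

/* Intended use inside the loop on each rank (needs <mpi.h> and <stdio.h>;
 * buf, count, src, tag, rank and iter are placeholders, and 10.0 is an
 * arbitrary threshold):
 *
 *     double t0 = MPI_Wtime();
 *     MPI_Recv(buf, count, MPI_BYTE, src, tag, MPI_COMM_WORLD,
 *              MPI_STATUS_IGNORE);
 *     if (comm_stats_add(&stats, MPI_Wtime() - t0, 10.0))
 *         printf("rank %d iter %d: slow MPI_Recv\n", rank, iter);
 */
```

Logging the flagged iterations per rank should show whether the slowdown
starts on one PS3 and spreads, or hits all ranks at about the same
iteration.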
I think some mechanism in Open MPI is causing this. Has anyone seen
this behavior before?
I use "mpirun --mca btl tcp,self -np 17 --hostfile ...". Is there
anything I should add?