Have you tried the IMB benchmark with Bcast?
I think the problem is in the app.
All ranks in the communicator should enter Bcast, and
since you have an if (rank == 0) / else statement,
not all of them follow the same flow:
    if (iRank == 0)
    {
        iLength = sizeof (acMessage);
        MPI_Bcast (&iLength, 1, MPI_INT, 0, MPI_COMM_WORLD);
        MPI_Bcast (acMessage, iLength, MPI_CHAR, 0, MPI_COMM_WORLD);
        printf ("Process 0: Message sent\n");
    }
    else
    {
        MPI_Bcast (&iLength, 1, MPI_INT, 0, MPI_COMM_WORLD);
        pMessage = (char *) malloc (iLength);
        MPI_Bcast (pMessage, iLength, MPI_CHAR, 0, MPI_COMM_WORLD);
        printf ("Process %d: %s\n", iRank, pMessage);
    }
Lenny.
On Mon, Jan 4, 2010 at 8:23 AM, Eugene Loh
<Eugene.Loh@sun.com> wrote:
If you're willing to try some stuff:
1) What about "-mca coll_sync_barrier_before 100"? (The default may be
1000. So, you can try various values less than 1000. I'm suggesting
100.) Note that broadcast has somewhat one-way traffic flow, which can
have some undesirable flow control issues.
2) What about "-mca btl_sm_num_fifos 16"? Default is 1. If the
problem is trac ticket 2043, then this suggestion can help.
P.S. There's a memory leak, right? The receive buffer is being
allocated over and over again. Might not be that closely related to
the problem you see here, but at a minimum it's bad style.
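One way to avoid the repeated allocation is to keep a single receive buffer alive across iterations and grow it only when a longer message arrives. The following is a minimal sketch, not the code from the attached demo: RecvBuffer, recv_buffer_reserve and recv_buffer_free are hypothetical names introduced here for illustration, and the MPI calls are left out so the helper stands on its own.

```c
#include <stdlib.h>

/* Hypothetical helper type (not part of the original demo):
   a receive buffer that is reused across broadcast iterations. */
typedef struct {
    char  *data;      /* heap storage, NULL until first use */
    size_t capacity;  /* number of bytes currently allocated */
} RecvBuffer;

/* Make sure the buffer can hold at least `needed` bytes.
   Returns 0 on success, -1 if realloc fails (the old storage stays valid). */
int recv_buffer_reserve(RecvBuffer *buf, size_t needed)
{
    if (needed <= buf->capacity)
        return 0;                         /* already large enough: reuse */

    char *grown = realloc(buf->data, needed);
    if (grown == NULL)
        return -1;

    buf->data = grown;
    buf->capacity = needed;
    return 0;
}

/* Release the storage once, after the last iteration. */
void recv_buffer_free(RecvBuffer *buf)
{
    free(buf->data);
    buf->data = NULL;
    buf->capacity = 0;
}
```

In the demo loop, the non-root ranks would presumably call recv_buffer_reserve (&buf, (size_t) iLength) after the first MPI_Bcast, pass buf.data to the second MPI_Bcast, and call recv_buffer_free (&buf) once after the loop ends.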
Louis Rossi wrote:
I am having a problem with Bcast hanging on a dual quad-core Opteron (2382,
2.6GHz, Quad Core, 4 x 512KB L2, 6MB L3 Cache) system running FC11 with
openmpi-1.4. The LD_LIBRARY_PATH and PATH variables are correctly
set. I have used the FC11 rpm distribution of openmpi and built
openmpi-1.4 locally with the same results. The problem was first
observed in a larger reliable CFD code, but I can create the problem
with a simple demo code (attached). The code attempts to execute 2000
pairs of broadcasts.
The hostfile contains a single line:

    <machinename> slots=8

If I run it with 4 cores or fewer, the code will run fine.
If I run it with 5 cores or more, it will hang some of the time after
successfully executing several hundred broadcasts. The number varies
from run to run. The code usually finishes with 5 cores. The
probability of hanging seems to increase with the number of processes. The
syntax I use is simple:

    mpiexec -machinefile hostfile -np 5 bcast_example

There was some discussion of a similar problem on the user list, but I
could not find a resolution. I have tried setting the processor
affinity (--mca mpi_paffinity_alone 1). I have tried varying the
broadcast algorithm (--mca coll_tuned_bcast_algorithm 1-6). I have
also tried excluding (-mca oob_tcp_if_exclude) my eth1 interface (see
ifconfig.txt attached) which is not connected to anything. None of
these changed the outcome.
Any thoughts or suggestions would be appreciated.