
Open MPI User's Mailing List Archives


From: James Conway (jxc100+_at_[hidden])
Date: 2006-02-10 12:18:42


Brian et al,

Original thread was "[O-MPI users] Firewall ports and Mac OS X 10.4.4"

On Feb 9, 2006, at 11:26 PM, Brian Barrett wrote:

> Open MPI uses random port numbers for all its communication.
> (etc)

Thanks for the explanation. I will live with the open Firewall, and
look at the ipfw docs for writing a script.

Now I have a more "core" Open MPI problem, which may be just
unfamiliarity on my part. I seem to have the environment variables
set up alright though - the code runs, but doesn't complete.

I have copied the "MPI Tutorial: The cannonical ring program" from
<http://www.lam-mpi.org/tutorials/>. It compiles and runs fine on the
localhost (one CPU, one or more MPI processes). If I copy it to a
remote host, it does one round of passing the message (the integer
'num') then stalls. I modified the print statements a bit to see where
in the code it stalls, but the loop hasn't changed. This is what I see
happening:
1. Process 0 successfully kicks off the pass-around by sending the
message to the next process (1), and then enters the loop where it
waits for the message to come back.
2. Process 1 enters the loop, receives the message and passes it on
(back to process 0 since this is a ring of 2 players only).
3. Process 0 successfully receives the message, decrements it, and
calls the next send (MPI_Send) but it doesn't return from this. I have
a print statement right after (with fflush) but there is no output. The
CPU is maxed out on both the local and remote hosts, I assume from some
kind of polling.
4. Needless to say, Process 1 never reports receipt of the message.

Output (with a little re-ordering to make sense) is:
    mpirun --hostfile my_mpi_hosts --np 2 mpi_test1
    Process rank 0: size = 2
    Process rank 1: size = 2
    Enter the number of times around the ring: 5

    Process 0 doing first send of '4' to 1
    Process 0 finished sending, now entering loop

    Process 0 waiting to receive from 1

    Process 1 waiting to receive from 0
    Process 1 received '4' from 0
    Process 1 sending '4' to 0
    Process 1 finished sending
    Process 1 waiting to receive from 0

    Process 0 received '4' from 1
>>Process 0 decremented num
    Process 0 sending '3' to 1
    !---- nothing more - hangs at 100% cpu until ctrl-
    !---- should see "Process 0 finished sending"

Since process 0 succeeds in calling MPI_Send before the loop, and in
calling MPI_Recv at the start of the loop, the communications appear
to be working. Likewise, process 1 succeeds in receiving and sending
within the loop. However, if it's significant, these calls work one
time for each process - the second time MPI_Send is called by process
0, there is a hang.

I am using Mac OS X 10.4.4 and gcc 4.0.1 on both systems, with Open MPI
1.0.1 installed (compiled from sources). The small tutorial code is
below (I hope it's OK to include here), with the few printf mods that
I made.

Any pointers appreciated!

James Conway

----------------------------------------------------------------------
James Conway, PhD.,
Department of Structural Biology
University of Pittsburgh School of Medicine
Biomedical Science Tower 3, Room 2047
3501 5th Ave
Pittsburgh, PA 15260
U.S.A.
Phone: +1-412-383-9847
Fax: +1-412-648-8998
Email: jxc100_at_[hidden]
Web: <http://www.pitt.edu/~jxc100/> (under construction)
----------------------------------------------------------------------

/*
  * Open Systems Lab
  * http://www.lam-mpi.org/tutorials/
  * Indiana University
  *
  * MPI Tutorial
  * The cannonical ring program
  *
  * Mail questions regarding tutorial material to mpi_at_[hidden]
  */

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[]);

int main(int argc, char *argv[])
{
   MPI_Status status;
   int num, rank, size;

   /* Start up MPI */

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);

/*
Arbitrarily choose 201 to be our tag. Calculate the
rank of the next process in the ring. Use the modulus
operator so that the last process "wraps around" to rank
zero.
*/

   const int tag = 201;
   const int next = (rank + 1) % size;
   const int from = (rank + size - 1) % size;

   printf("Process rank %d: size = %d\n", rank, size);

/*
If we are the "console" process, get an integer from the user
to specify how many times we want to go around the ring
*/

   if (rank == 0) {
     printf("Enter the number of times around the ring: ");
     scanf("%d", &num);
     --num;

     printf("Process %d doing first send of '%d' to %d\n", rank, num,
            next);
     MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
     printf("Process %d finished sending, now entering loop\n", rank);
     fflush(stdout);
   }

/*
Pass the message around the ring. The exit mechanism works
as follows: the message (a positive integer) is passed
around the ring. Each time it passes rank 0, it is decremented.
When each process receives the 0 message, it passes it on
to the next process and then quits. By passing the 0 first,
every process gets the 0 message and can quit normally.
*/

   while (1) {

     printf("Process %d waiting to receive from %d\n", rank, from);
     MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
     printf("Process %d received '%d' from %d\n", rank, num, from);
     fflush(stdout);

     if (rank == 0) {
       num--;
       printf(">>Process 0 decremented num\n");
       fflush(stdout);
     }

     printf("Process %d sending '%d' to %d\n", rank, num, next);
     MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
     printf("Process %d finished sending\n", rank);
     fflush(stdout);

     if (num == 0) {
       printf("Process %d exiting\n", rank);
       fflush(stdout);
       break;
     }
   }

// The last process does one extra send to process 0, which needs
// to be received before the program can exit

   printf("Process %d after loop\n", rank);
   fflush(stdout);

   if (rank == 0)
     MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);

// Quit

   MPI_Finalize();
   return 0;
}