Subject: [OMPI users] Question about collective messages implementation
From: Jerome Reybert (jreybert_at_[hidden])
Date: 2010-11-02 06:21:22


Hello,

I am using Open MPI 1.4.2 and 1.5. I am working on a very large piece of
scientific software. The source code is huge and I don't have much freedom in
this code; I can't even force the user to define a topology with mpirun.

At the moment, the software uses MPI in the classical way: one MPI task per
core, so on a cluster of 4 machines with 8 cores each, we run 32 MPI tasks. A
hybrid OpenMP + MPI version is currently in development, but we are not
considering it for now.

At some points in the application, each task must call a LAPACK function. Every
task calls the same function, on the same data, at the same time, producing the
same result. The idea here is:

  - on each machine, only one task calls the LAPACK function, using an
efficient multi-threaded or GPU version.
  - the other tasks wait.
  - each machine is used at 100%, and the LAPACK call should be ~8 times
faster.
  - the computing task then broadcasts the result only to the tasks on the
local machine. In my cluster example, that means 4 local broadcasts, without
using the network at all.

For the moment, here is my implementation:

#include <mpi.h>

/* The real computation: a multi-threaded or GPU dpotrf (LAPACK). */
extern void dpotrf_(char *uplo, int *n, double *a, int *lda, int *info);
/* Maps a hostname to a non-negative color; see the sketch below. */
extern int my_hash(const char *s, int len);

/* Fortran-callable wrapper: on each machine, only the local rank 0 runs
 * the factorization, then broadcasts the result to the other local tasks. */
void my_dpotrf_(char *uplo, int *len_uplo, double *a, int *lda, int *info) {
  MPI_Comm host_comm;
  int myrank, host_rank, host_id_len, color;
  char host_id[MPI_MAX_PROCESSOR_NAME];
  int n2 = *len_uplo * *len_uplo;  /* number of matrix elements to broadcast */

  MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  MPI_Get_processor_name(host_id, &host_id_len);

  /* Tasks on the same machine get the same color, hence the same communicator. */
  color = my_hash(host_id, host_id_len);
  MPI_Comm_split(MPI_COMM_WORLD, color, myrank, &host_comm);
  MPI_Comm_rank(host_comm, &host_rank);

  if (host_rank == 0) {
    /* efficient parallel LAPACK function */
    dpotrf_(uplo, len_uplo, a, lda, info);
  }

  /* Local broadcasts: root 0 is the task that did the computation. */
  MPI_Bcast(a, n2, MPI_DOUBLE, 0, host_comm);
  MPI_Bcast(info, 1, MPI_INT, 0, host_comm);

  MPI_Comm_free(&host_comm);
}
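
(For reference, my_hash can be any function that maps equal hostnames to the
same non-negative int; here is a minimal djb2-style sketch, keeping in mind
that two different hostnames colliding into the same color would wrongly merge
two machines into one communicator:)

int my_hash(const char *s, int len) {
  unsigned long h = 5381;
  int i;
  for (i = 0; i < len; i++)
    h = h * 33 + (unsigned char)s[i];  /* djb2 hash step */
  return (int)(h & 0x7fffffff);        /* MPI_Comm_split needs color >= 0 */
}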

Each host_comm communicator groups the tasks running on one machine. I ran this
version, but performance is worse than with the current version (each task
performing its own LAPACK call). I have several questions:

  - in my implementation, is MPI_Bcast aware that it should use shared-memory
communication? Or does the data go through the network? Judging from the first
results, it seems it does.
  - is there another way to group tasks by machine, such that Open MPI knows
each group communicates over shared memory?
  - is it possible to assign a policy (in this case, a shared-memory policy) to
a Bcast or a Barrier call (see the note after this list)?
  - do you have any better idea for this problem? :)
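
As an aside on the policy question: I wondered whether the shared-memory
collective component can simply be given a higher priority via an MCA
parameter, something like

  mpirun --mca coll_sm_priority 100 -np 32 ./my_app

(assuming the coll sm component and its priority parameter are available in
these versions, which I have not verified), but I don't know whether that is
the intended mechanism.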

Regards,

-- 
Jerome Reybert