
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] problem with groups and communicators in openmpi-1.6.4rc2
From: George Bosilca (bosilca_at_[hidden])
Date: 2013-01-19 10:19:02


On Jan 19, 2013, at 15:44, Ralph Castain <rhc_at_[hidden]> wrote:

> I used your test code to confirm it also fails on our trunk - it looks like someone got the reference count wrong when creating/destructing groups.

No, the code is not MPI compliant.

The culprit is line 254 in the test code, where Siegmar manually copies group_comm_world into group_worker. That assignment is fine only as long as you remember that group_worker is then not an MPI-generated group in its own right, and as a result you are not allowed to free it.

Now if you replace the:

group_worker = group_comm_world

by an MPI operation that creates a copy of the original group, such as

MPI_Comm_group (MPI_COMM_WORLD, &group_worker);

your code becomes MPI-valid and works without any issue in Open MPI.
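
Spelled out, here is a minimal sketch of the compliant pattern (reusing
the variable names from the test code):

MPI_Comm_group (MPI_COMM_WORLD, &group_comm_world);
MPI_Comm_group (MPI_COMM_WORLD, &group_worker);  /* independent handle, not an alias */
/* ... use the groups ... */
MPI_Group_free (&group_comm_world);
MPI_Group_free (&group_worker);                  /* each handle is freed exactly once */

With the plain assignment there is only one group object behind both
handles, so the two MPI_Group_free calls at the end of the program
release the same object twice. Every handle obtained from
MPI_Comm_group gets its own MPI_Group_free, so with the call above the
frees are balanced.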

  George.

>
> Afraid I'll have to defer to the authors of that code area...
>
>
> On Jan 19, 2013, at 1:27 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
>> Hi
>>
>> I have installed openmpi-1.6.4rc2 and have the following problem.
>>
>> tyr strided_vector 110 ompi_info | grep "Open MPI:"
>> Open MPI: 1.6.4rc2r27861
>> tyr strided_vector 111 mpicc -showme
>> gcc -I/usr/local/openmpi-1.6.4_64_gcc/include -fexceptions -pthread -m64
>> -L/usr/local/openmpi-1.6.4_64_gcc/lib64 -lmpi -lm -lkstat -llgrp -lsocket -lnsl
>> -lrt -lm
>>
>>
>> tyr strided_vector 112 mpiexec -np 4 data_type_4
>> Process 2 of 4 running on tyr.informatik.hs-fulda.de
>> Process 0 of 4 running on tyr.informatik.hs-fulda.de
>> Process 3 of 4 running on tyr.informatik.hs-fulda.de
>> Process 1 of 4 running on tyr.informatik.hs-fulda.de
>>
>> original matrix:
>>
>> 1 2 3 4 5 6 7 8 9 10
>> 11 12 13 14 15 16 17 18 19 20
>> 21 22 23 24 25 26 27 28 29 30
>> 31 32 33 34 35 36 37 38 39 40
>> 41 42 43 44 45 46 47 48 49 50
>> 51 52 53 54 55 56 57 58 59 60
>>
>> result matrix:
>> elements are squared in columns:
>> 0 1 2 6 7
>> elements are multiplied with 2 in columns:
>> 3 4 5 8 9
>>
>> 1 4 9 8 10 12 49 64 18 20
>> 121 144 169 28 30 32 289 324 38 40
>> 441 484 529 48 50 52 729 784 58 60
>> 961 1024 1089 68 70 72 1369 1444 78 80
>> 1681 1764 1849 88 90 92 2209 2304 98 100
>> 2601 2704 2809 108 110 112 3249 3364 118 120
>>
>> Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (comm->c_remote_group))->obj_magic_id, file ../../openmpi-1.6.4rc2r27861/ompi/communicator/comm_init.c, line 412
>> [tyr:18578] *** Process received signal ***
>> [tyr:18578] Signal: Abort (6)
>> [tyr:18578] Signal code: (-1)
>> Assertion failed: OPAL_OBJ_MAGIC_ID == ((opal_object_t *) (comm->c_remote_group))->obj_magic_id, file ../../openmpi-1.6.4rc2r27861/ompi/communicator/comm_init.c, line 412
>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:opal_backtrace_print+0x20
>> [tyr:18580] *** Process received signal ***
>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0x2c1bc4
>> [tyr:18580] Signal: Abort (6)
>> [tyr:18580] Signal code: (-1)
>> /lib/sparcv9/libc.so.1:0xd88a4
>> /lib/sparcv9/libc.so.1:0xcc418
>> /lib/sparcv9/libc.so.1:0xcc624
>> /lib/sparcv9/libc.so.1:__lwp_kill+0x8 [ Signal 6 (ABRT)]
>> /lib/sparcv9/libc.so.1:abort+0xd0
>> /lib/sparcv9/libc.so.1:_assert+0x74
>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0xa4c58
>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:0xa2430
>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:ompi_comm_finalize+0x168
>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:ompi_mpi_finalize+0xa60
>> /export2/prog/SunOS_sparc/openmpi-1.6.4_64_gcc/lib64/libmpi.so.1.0.7:MPI_Finalize+0x90
>> /home/fd1026/SunOS/sparc/bin/data_type_4:main+0x588
>> /home/fd1026/SunOS/sparc/bin/data_type_4:_start+0x7c
>> [tyr:18578] *** End of error message ***
>> ...
>>
>>
>>
>> Everything works fine with LAM-MPI (even in a heterogeneous environment
>> with little-endian and big-endian machines), so it is probably an
>> error in Open MPI (but you never know).
>>
>>
>> tyr strided_vector 125 mpicc -showme
>> gcc -I/usr/local/lam-6.5.9_64_gcc/include -L/usr/local/lam-6.5.9_64_gcc/lib
>> -llamf77mpi -lmpi -llam -lsocket -lnsl
>> tyr strided_vector 126 lamboot -v hosts.lam-mpi
>>
>> LAM 6.5.9/MPI 2 C++ - Indiana University
>>
>> Executing hboot on n0 (tyr.informatik.hs-fulda.de - 2 CPUs)...
>> Executing hboot on n1 (sunpc1.informatik.hs-fulda.de - 4 CPUs)...
>> topology done
>>
>> tyr strided_vector 127 mpirun -v app_data_type_4.lam-mpi
>> 22894 data_type_4 running on local
>> 22895 data_type_4 running on n0 (o)
>> 21998 data_type_4 running on n1
>> 22896 data_type_4 running on n0 (o)
>> Process 1 of 4 running on tyr.informatik.hs-fulda.de
>> Process 3 of 4 running on tyr.informatik.hs-fulda.de
>> Process 2 of 4 running on sunpc1
>> Process 0 of 4 running on tyr.informatik.hs-fulda.de
>>
>> original matrix:
>>
>> 1 2 3 4 5 6 7 8 9 10
>> 11 12 13 14 15 16 17 18 19 20
>> 21 22 23 24 25 26 27 28 29 30
>> 31 32 33 34 35 36 37 38 39 40
>> 41 42 43 44 45 46 47 48 49 50
>> 51 52 53 54 55 56 57 58 59 60
>>
>> result matrix:
>> elements are squared in columns:
>> 0 1 2 6 7
>> elements are multiplied with 2 in columns:
>> 3 4 5 8 9
>>
>> 1 4 9 8 10 12 49 64 18 20
>> 121 144 169 28 30 32 289 324 38 40
>> 441 484 529 48 50 52 729 784 58 60
>> 961 1024 1089 68 70 72 1369 1444 78 80
>> 1681 1764 1849 88 90 92 2209 2304 98 100
>> 2601 2704 2809 108 110 112 3249 3364 118 120
>>
>> tyr strided_vector 128 lamhalt
>>
>> LAM 6.5.9/MPI 2 C++ - Indiana University
>>
>>
>>
>> I would be grateful if somebody could fix the problem. Thank you
>> very much for any help in advance.
>>
>>
>> Kind regards
>>
>> Siegmar
>> /* The program demonstrates how to set up and use a strided vector.
>> * The process with rank 0 creates a matrix. The columns of the
>> * matrix will then be distributed with a collective communication
>> * operation to all processes. Each process performs an operation on
>> * all column elements. Afterwards the results are collected in the
>> * source matrix overwriting the original column elements.
>> *
>> * The program uses between one and n processes to change the values
>> * of the column elements if the matrix has n columns. If you start
>> * the program with one process it has to work on all n columns alone
>> * and if you start it with n processes each process modifies the
>> * values of one column. Every process must know how many columns it
>> * has to modify so that it can allocate enough buffer space for its
>> * column block. Therefore the process with rank 0 computes the
>> * number of columns for each process in the array "num_columns" and
>> * distributes this array with MPI_Bcast to all processes. Each
>> * process can now allocate memory for its column block. There is
>> * still one task to do before the columns of the matrix can be
>> * distributed with MPI_Scatterv: The size of every column block and
>> * the offset of every column block must be computed and stored in
>> * the arrays "sr_counts" and "sr_disps".
>> *
>> * An MPI data type is defined by its size, its contents, and its
>> * extent. When multiple elements of the same size are used in a
>> * contiguous manner (e.g. in a "scatter" operation or an operation
>> * with "count" greater than one) the extent is used to compute where
>> * the next element will start. The default extent of a derived data
>> * type spans from its first to its last component, so the first
>> * element of the second structure would start right after the last
>> * element of the first structure, i.e., you have to "resize" the new data
>> * type if you want to send it multiple times (count > 1) or to
>> * scatter/gather it to many processes. Restrict the extent of the
>> * derived data type for a strided vector in such a way that it looks
>> * like just one element if it is used with "count > 1" or in a
>> * scatter/gather operation.
>> *
>> * This version constructs a new column type (strided vector) with
>> * "MPI_Type_vector" and uses collective communication. The new
>> * data type knows the number of elements within one column and the
>> * spacing between two column elements. The program uses at most
>> * n processes if the matrix has n columns, i.e. depending on the
>> * number of processes each process receives between 1 and n columns.
>> * You can execute this program with an arbitrary number of processes
>> * because it creates its own group with "num_worker" (<= n) processes
>> * to perform the work if the matrix has n columns and the basic group
>> * contains too many processes.
>> *
>> *
>> * Compiling:
>> * Store executable(s) into local directory.
>> * mpicc -o <program name> <source code file name>
>> *
>> * Store executable(s) into predefined directories.
>> * make
>> *
>> * Make program(s) automatically on all specified hosts. You must
>> * edit the file "make_compile" and specify your host names before
>> * you execute it.
>> * make_compile
>> *
>> * Running:
>> * LAM-MPI:
>> * mpiexec -boot -np <number of processes> <program name>
>> * or
>> * mpiexec -boot \
>> * -host <hostname> -np <number of processes> <program name> : \
>> * -host <hostname> -np <number of processes> <program name>
>> * or
>> * mpiexec -boot [-v] -configfile <application file>
>> * or
>> * lamboot [-v] [<host file>]
>> * mpiexec -np <number of processes> <program name>
>> * or
>> * mpiexec [-v] -configfile <application file>
>> * lamhalt
>> *
>> * OpenMPI:
>> * "host1", "host2", and so on can all have the same name,
>> * if you want to start a virtual computer with some virtual
>> * cpu's on the local host. The name "localhost" is allowed
>> * as well.
>> *
>> * mpiexec -np <number of processes> <program name>
>> * or
>> * mpiexec --host <host1,host2,...> \
>> * -np <number of processes> <program name>
>> * or
>> * mpiexec -hostfile <hostfile name> \
>> * -np <number of processes> <program name>
>> * or
>> * mpiexec -app <application file>
>> *
>> * Cleaning:
>> * local computer:
>> * rm <program name>
>> * or
>> * make clean_all
>> * on all specified computers (you must edit the file "make_clean_all"
>> * and specify your host names before you execute it).
>> * make_clean_all
>> *
>> *
>> * File: data_type_4.c Author: S. Gross
>> * Date: 30.08.2012
>> *
>> */
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include "mpi.h"
>>
>> #define P 6 /* # of rows */
>> #define Q 10 /* # of columns */
>> #define FACTOR 2 /* multiplicator for col. elem. */
>> #define DEF_NUM_WORKER Q /* # of workers, must be <= Q */
>>
>> /* define macro to test the result of a "malloc" operation */
>> #define TestEqualsNULL(val) \
>> if (val == NULL) \
>> { \
>> fprintf (stderr, "file: %s line %d: Couldn't allocate memory.\n", \
>> __FILE__, __LINE__); \
>> exit (EXIT_FAILURE); \
>> }
>>
>> /* define macro to determine the minimum of two values */
>> #define MIN(a,b) ((a) < (b) ? (a) : (b))
>>
>>
>> static void print_matrix (int p, int q, double **mat);
>>
>>
>> int main (int argc, char *argv[])
>> {
>> int ntasks, /* number of parallel tasks */
>> mytid, /* my task id */
>> namelen, /* length of processor name */
>> i, j, /* loop variables */
>> *num_columns, /* # of columns in column block */
>> *sr_counts, /* send/receive counts */
>> *sr_disps, /* send/receive displacements */
>> tmp, tmp1; /* temporary values */
>> double matrix[P][Q],
>> **col_block; /* column block of matrix */
>> char processor_name[MPI_MAX_PROCESSOR_NAME];
>> MPI_Datatype column_t, /* column type (strided vector) */
>> col_block_t,
>> tmp_column_t; /* needed to resize the extent */
>> MPI_Group group_comm_world, /* processes in "basic group" */
>> group_worker, /* processes in new groups */
>> group_other;
>> MPI_Comm COMM_WORKER, /* communicators for new groups */
>> COMM_OTHER;
>> int num_worker, /* # of worker in "group_worker"*/
>> *group_w_mem, /* array of worker members */
>> group_w_ntasks, /* # of tasks in "group_worker" */
>> group_o_ntasks, /* # of tasks in "group_other" */
>> group_w_mytid, /* my task id in "group_worker" */
>> group_o_mytid, /* my task id in "group_other" */
>> *universe_size_ptr, /* ptr to # of "virtual cpu's" */
>> universe_size_flag; /* true if available */
>>
>> MPI_Init (&argc, &argv);
>> MPI_Comm_rank (MPI_COMM_WORLD, &mytid);
>> MPI_Comm_size (MPI_COMM_WORLD, &ntasks);
>> /* Determine the correct number of processes for this program. If
>> * there are more than Q processes (i.e., more processes than
>> * columns) available, we split the "basic group" into two groups.
>> * This program uses a group "group_worker" to do the real work
>> * and a group "group_other" for the remaining processes of the
>> * "basic group". The latter have nothing to do and can terminate
>> * immediately. If there are less than or equal to Q processes
>> * available all processes belong to group "group_worker" and group
>> * "group_other" is empty. At first we find out which processes
>> * belong to the "basic group".
>> */
>> MPI_Comm_group (MPI_COMM_WORLD, &group_comm_world);
>> if (ntasks > Q)
>> {
>> /* There are too many processes, so that we must build a new group
>> * with "num_worker" processes. "num_worker" will be the minimum of
>> * DEF_NUM_WORKER and the "universe size" if it is supported by the
>> * MPI implementation. At first we must check if DEF_NUM_WORKER has
>> * a suitable value.
>> */
>> if (DEF_NUM_WORKER > Q)
>> {
>> if (mytid == 0)
>> {
>> fprintf (stderr, "\nError:\tInternal program error.\n"
>> "\tConstant DEF_NUM_WORKER has value %d but must be\n"
>> "\tlower than or equal to %d. Please change source\n"
>> "\tcode and compile the program again.\n\n",
>> DEF_NUM_WORKER, Q);
>> }
>> MPI_Group_free (&group_comm_world);
>> MPI_Finalize ();
>> exit (EXIT_FAILURE);
>> }
>> /* determine the universe size, set "num_worker" in an
>> * appropriate way, and allocate memory for the array containing
>> * the ranks of the members of the new group
>> */
>> MPI_Comm_get_attr (MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
>> &universe_size_ptr, &universe_size_flag);
>> if ((universe_size_flag != 0) && (*universe_size_ptr > 0))
>> {
>> num_worker = MIN (DEF_NUM_WORKER, *universe_size_ptr);
>> }
>> else
>> {
>> num_worker = DEF_NUM_WORKER;
>> }
>> group_w_mem = (int *) malloc (num_worker * sizeof (int));
>> TestEqualsNULL (group_w_mem); /* test if memory was available */
>> if (mytid == 0)
>> {
>> printf ("\nYou have started %d processes but I need at most "
>> "%d processes.\n"
>> "The universe contains %d \"virtual cpu's\" (\"0\" means "
>> "not supported).\n"
>> "I build a new worker group with %d processes. The "
>> "processes with\n"
>> "the following ranks in the basic group belong to "
>> "the new group:\n ",
>> ntasks, Q, *universe_size_ptr, num_worker);
>> }
>> for (i = 0; i < num_worker; ++i)
>> {
>> /* fetch some ranks from the basic group for the new worker
>> * group, e.g. the last num_worker ranks to demonstrate that
>> * a process may have different ranks in different groups
>> */
>> group_w_mem[i] = (ntasks - num_worker) + i;
>> if (mytid == 0)
>> {
>> printf ("%d ", group_w_mem[i]);
>> }
>> }
>> if (mytid == 0)
>> {
>> printf ("\n\n");
>> }
>> /* Create group "group_worker" */
>> MPI_Group_incl (group_comm_world, num_worker, group_w_mem,
>> &group_worker);
>> free (group_w_mem);
>> }
>> else
>> {
>> /* there are at most as many processes as columns in our matrix,
>> * i.e., we can use the "basic group"
>> */
>> group_worker = group_comm_world;
>> }
>> /* Create group "group_other" which demonstrates only how to use
>> * another group operation and which has nothing to do in this
>> * program.
>> */
>> MPI_Group_difference (group_comm_world, group_worker,
>> &group_other);
>> MPI_Group_free (&group_comm_world);
>> /* Create communicators for both groups. The communicator is only
>> * defined for all processes of the group and it is undefined
>> * (MPI_COMM_NULL) for all other processes.
>> */
>> MPI_Comm_create (MPI_COMM_WORLD, group_worker, &COMM_WORKER);
>> MPI_Comm_create (MPI_COMM_WORLD, group_other, &COMM_OTHER);
>>
>>
>> /* =========================================================
>> * ====== ======
>> * ====== Supply work for all different groups. ======
>> * ====== ======
>> * ====== ======
>> * ====== At first you must find out if a process ======
>> * ====== belongs to a special group. You can use ======
>> * ====== MPI_Group_rank for this purpose. It returns ======
>> * ====== the rank of the calling process in the ======
>> * ====== specified group or MPI_UNDEFINED if the ======
>> * ====== calling process is not a member of the ======
>> * ====== group. ======
>> * ====== ======
>> * =========================================================
>> */
>>
>>
>> /* =========================================================
>> * ====== This is the group "group_worker". ======
>> * =========================================================
>> */
>> MPI_Group_rank (group_worker, &group_w_mytid);
>> if (group_w_mytid != MPI_UNDEFINED)
>> {
>> MPI_Comm_size (COMM_WORKER, &group_w_ntasks); /* # of processes */
>> /* Now let's start with the real work */
>> MPI_Get_processor_name (processor_name, &namelen);
>> /* With the next statement every process executing this code will
>> * print one line on the display. It may happen that the lines will
>> * get mixed up because the display is a critical section. In general
>> * only one process (mostly the process with rank 0) will print on
>> * the display and all other processes will send their messages to
>> * this process. Nevertheless for debugging purposes (or to
>> * demonstrate that it is possible) it may be useful if every
>> * process prints itself.
>> */
>> fprintf (stdout, "Process %d of %d running on %s\n",
>> group_w_mytid, group_w_ntasks, processor_name);
>> fflush (stdout);
>> MPI_Barrier (COMM_WORKER); /* wait for all other processes */
>>
>> /* Build the new type for a strided vector and resize the extent
>> * of the new datatype in such a way that the extent of the whole
>> * column looks like just one element so that the next column
>> * starts in matrix[0][i] in MPI_Scatterv/MPI_Gatherv.
>> */
>> MPI_Type_vector (P, 1, Q, MPI_DOUBLE, &tmp_column_t);
>> MPI_Type_create_resized (tmp_column_t, 0, sizeof (double),
>> &column_t);
>> MPI_Type_commit (&column_t);
>> MPI_Type_free (&tmp_column_t);
>> if (group_w_mytid == 0)
>> {
>> tmp = 1;
>> for (i = 0; i < P; ++i) /* initialize matrix */
>> {
>> for (j = 0; j < Q; ++j)
>> {
>> matrix[i][j] = tmp++;
>> }
>> }
>> printf ("\n\noriginal matrix:\n\n");
>> print_matrix (P, Q, (double **) matrix);
>> }
>> /* allocate memory for array containing the number of columns of a
>> * column block for each process
>> */
>> num_columns = (int *) malloc (group_w_ntasks * sizeof (int));
>> TestEqualsNULL (num_columns); /* test if memory was available */
>>
>> /* do an unnecessary initialization to make the GNU compiler happy
>> * so that you won't get a warning about the use of a possibly
>> * uninitialized variable
>> */
>> sr_counts = NULL;
>> sr_disps = NULL;
>> if (group_w_mytid == 0)
>> {
>> /* allocate memory for arrays containing the size and
>> * displacement of each column block
>> */
>> sr_counts = (int *) malloc (group_w_ntasks * sizeof (int));
>> TestEqualsNULL (sr_counts);
>> sr_disps = (int *) malloc (group_w_ntasks * sizeof (int));
>> TestEqualsNULL (sr_disps);
>> /* compute number of columns in column block for each process */
>> tmp = Q / group_w_ntasks;
>> for (i = 0; i < group_w_ntasks; ++i)
>> {
>> num_columns[i] = tmp; /* number of columns */
>> }
>> for (i = 0; i < (Q % group_w_ntasks); ++i) /* adjust size */
>> {
>> num_columns[i]++;
>> }
>> for (i = 0; i < group_w_ntasks; ++i)
>> {
>> /* nothing to do because "column_t" contains already all
>> * elements of a column, i.e., the "size" is equal to the
>> * number of columns in the block
>> */
>> sr_counts[i] = num_columns[i]; /* "size" of column-block */
>> }
>> sr_disps[0] = 0; /* start of i-th column-block */
>> for (i = 1; i < group_w_ntasks; ++i)
>> {
>> sr_disps[i] = sr_disps[i - 1] + sr_counts[i - 1];
>> }
>> }
>> /* inform all processes about their column block sizes */
>> MPI_Bcast (num_columns, group_w_ntasks, MPI_INT, 0, COMM_WORKER);
>> /* allocate memory for a column block and define a new derived
>> * data type for the column block. This data type is possibly
>> * different for different processes if the number of processes
>> * isn't a factor of the row size of the original matrix. Don't
>> * forget to resize the extent of the new data type in such a
>> * way that the extent of the whole column looks like just one
>> * element so that the next column starts in col_block[0][i]
>> * in MPI_Scatterv/MPI_Gatherv.
>> */
>> col_block = (double **) malloc (P * num_columns[group_w_mytid] *
>> sizeof (double));
>> TestEqualsNULL (col_block);
>> MPI_Type_vector (P, 1, num_columns[group_w_mytid], MPI_DOUBLE,
>> &tmp_column_t);
>> MPI_Type_create_resized (tmp_column_t, 0, sizeof (double),
>> &col_block_t);
>> MPI_Type_commit (&col_block_t);
>> MPI_Type_free (&tmp_column_t);
>> /* send column block i of "matrix" to process i */
>> MPI_Scatterv (matrix, sr_counts, sr_disps, column_t,
>> col_block, num_columns[group_w_mytid],
>> col_block_t, 0, COMM_WORKER);
>> /* Modify column elements. The compiler doesn't know the structure
>> * of the column block matrix so that you have to do the index
>> * calculations for mat[i][j] yourself. In C a matrix is stored
>> * row-by-row so that the i-th row starts at location "i * q" if
>> * the matrix has "q" columns. Therefore the address of mat[i][j]
>> * can be expressed as "(double *) mat + i * q + j" and mat[i][j]
>> * itself as "*((double *) mat + i * q + j)".
>> */
>> for (i = 0; i < P; ++i)
>> {
>> for (j = 0; j < num_columns[group_w_mytid]; ++j)
>> {
>> if ((group_w_mytid % 2) == 0)
>> {
>> /* col_block[i][j] *= col_block[i][j] */
>>
>> *((double *) col_block + i * num_columns[group_w_mytid] + j) *=
>> *((double *) col_block + i * num_columns[group_w_mytid] + j);
>> }
>> else
>> {
>> /* col_block[i][j] *= FACTOR */
>>
>> *((double *) col_block + i * num_columns[group_w_mytid] + j) *=
>> FACTOR;
>> }
>> }
>> }
>> /* receive column-block i of "matrix" from process i */
>> MPI_Gatherv (col_block, num_columns[group_w_mytid], col_block_t,
>> matrix, sr_counts, sr_disps, column_t,
>> 0, COMM_WORKER);
>> if (group_w_mytid == 0)
>> {
>> printf ("\n\nresult matrix:\n"
>> " elements are sqared in columns:\n ");
>> tmp = 0;
>> tmp1 = 0;
>> for (i = 0; i < group_w_ntasks; ++i)
>> {
>> tmp1 = tmp1 + num_columns[i];
>> if ((i % 2) == 0)
>> {
>> for (j = tmp; j < tmp1; ++j)
>> {
>> printf ("%4d", j);
>> }
>> }
>> tmp = tmp1;
>> }
>> printf ("\n elements are multiplied with %d in columns:\n ",
>> FACTOR);
>> tmp = 0;
>> tmp1 = 0;
>> for (i = 0; i < group_w_ntasks; ++i)
>> {
>> tmp1 = tmp1 + num_columns[i];
>> if ((i % 2) != 0)
>> {
>> for (j = tmp; j < tmp1; ++j)
>> {
>> printf ("%4d", j);
>> }
>> }
>> tmp = tmp1;
>> }
>> printf ("\n\n\n");
>> print_matrix (P, Q, (double **) matrix);
>> free (sr_counts);
>> free (sr_disps);
>> }
>> free (num_columns);
>> free (col_block);
>> MPI_Type_free (&column_t);
>> MPI_Type_free (&col_block_t);
>> MPI_Comm_free (&COMM_WORKER);
>> }
>>
>>
>> /* =========================================================
>> * ====== This is the group "group_other". ======
>> * =========================================================
>> */
>> MPI_Group_rank (group_other, &group_o_mytid);
>> if (group_o_mytid != MPI_UNDEFINED)
>> {
>> /* Nothing to do (only to demonstrate how to divide work for
>> * different groups).
>> */
>> MPI_Comm_size (COMM_OTHER, &group_o_ntasks);
>> if (group_o_mytid == 0)
>> {
>> if (group_o_ntasks == 1)
>> {
>> printf ("\nGroup \"group_other\" contains %d process "
>> "which has\n"
>> "nothing to do.\n\n", group_o_ntasks);
>> }
>> else
>> {
>> printf ("\nGroup \"group_other\" contains %d processes "
>> "which have\n"
>> "nothing to do.\n\n", group_o_ntasks);
>> }
>> }
>> MPI_Comm_free (&COMM_OTHER);
>> }
>>
>>
>> /* =========================================================
>> * ====== all groups will reach this point ======
>> * =========================================================
>> */
>> MPI_Group_free (&group_worker);
>> MPI_Group_free (&group_other);
>> MPI_Finalize ();
>> return EXIT_SUCCESS;
>> }
>>
>>
>> /* Print the values of an arbitrary 2D-matrix of "double" values. The
>> * compiler doesn't know the structure of the matrix so that you have
>> * to do the index calculations for mat[i][j] yourself. In C a matrix
>> * is stored row-by-row so that the i-th row starts at location "i * q"
>> * if the matrix has "q" columns. Therefore the address of mat[i][j]
>> * can be expressed as "(double *) mat + i * q + j" and mat[i][j]
>> * itself as "*((double *) mat + i * q + j)".
>> *
>> * input parameters: p number of rows
>> * q number of columns
>> * mat 2D-matrix of "double" values
>> * output parameters: none
>> * return value: none
>> * side effects: none
>> *
>> */
>> void print_matrix (int p, int q, double **mat)
>> {
>> int i, j; /* loop variables */
>>
>> for (i = 0; i < p; ++i)
>> {
>> for (j = 0; j < q; ++j)
>> {
>> printf ("%6g", *((double *) mat + i * q + j));
>> }
>> printf ("\n");
>> }
>> printf ("\n");
>> }