Open MPI User's Mailing List Archives

From: Karsten Bolding (karsten_at_[hidden])
Date: 2007-10-31 12:00:28


Hello

I've just introduced the possibility to use Open MPI instead of MPICH in
an ocean model. The code is quite well tested and has been run in
various parallel setups by various groups.

I've compiled the program using mpif90 (instead of ifort). When I run it,
I get the error shown at the end of this mail.

As you can see, all 13 processes are started, but then ...

One load-balancing problem with ocean models that use domain
decomposition is that equally sized sub-domains do not carry the same
computational burden (the different sub-domains have different
land fractions). To overcome this, a MATLAB tool has been developed that
allows assigning more than one sub-domain to a processor/core based on
the sum of water points in the sub-domains. Attached is a figure showing
the actual setup in this case. The neighbor relation is read from a file
produced by said MATLAB tool. Non-existent neighbors are set to -1,
which is the value of MPI_PROC_NULL in MPICH.
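
To make the neighbor handling concrete, here is a minimal sketch of how the
list read from the file could be translated before it is passed to the send
and receive calls. It is not the model's actual code: the function name
translate_neighbors and the constant FILE_NO_NEIGHBOR are hypothetical, and
proc_null stands for whatever value MPI_PROC_NULL has in the MPI library in
use. The MPI standard does not fix that value, so a hard-coded -1 is only
safe where the library happens to use -1.

```python
from typing import Iterable, List

# Sentinel the decomposition tool writes for "no neighbor" (from the post).
FILE_NO_NEIGHBOR = -1


def translate_neighbors(neighbors: Iterable[int], nprocs: int,
                        proc_null: int) -> List[int]:
    """Replace the file sentinel by proc_null and validate all other ranks.

    proc_null is an assumption of this sketch: it should be the
    MPI_PROC_NULL constant taken from the MPI library actually in use.
    """
    translated = []
    for rank in neighbors:
        if rank == FILE_NO_NEIGHBOR:
            # No neighbor on this side: use the library's null process.
            translated.append(proc_null)
        elif 0 <= rank < nprocs:
            # A real neighbor inside the communicator.
            translated.append(rank)
        else:
            # Anything else would be an invalid rank for a send or receive.
            raise ValueError(
                f"neighbor rank {rank} is outside 0..{nprocs - 1}")
    return translated


if __name__ == "__main__":
    # -2 is only a placeholder value for proc_null in this demonstration.
    print(translate_neighbors([3, -1, 0, -1], nprocs=13, proc_null=-2))
```

Passing the library's own constant in, rather than writing its number into
the neighbor file, keeps the file independent of the MPI implementation.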

The setup is run on a quad-core machine for testing purposes only.

Any ideas what goes wrong?

==== error ======
kb_at_gate:~/DK/setups/north_sea_fine$ mpirun -np 13 bin/getm_prod_IFORT.96x96
 Process 0 of 13 is alive on gate
[gate:18564] *** An error occurred in MPI_Isend
[gate:18564] *** on communicator MPI_COMM_WORLD
[gate:18564] *** MPI_ERR_RANK: invalid rank
[gate:18564] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 1 of 13 is alive on gate
[gate:18565] *** An error occurred in MPI_Isend
[gate:18565] *** on communicator MPI_COMM_WORLD
[gate:18565] *** MPI_ERR_RANK: invalid rank
[gate:18565] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 2 of 13 is alive on gate
 Process 3 of 13 is alive on gate
[gate:18567] *** An error occurred in MPI_Isend
[gate:18567] *** on communicator MPI_COMM_WORLD
[gate:18567] *** MPI_ERR_RANK: invalid rank
[gate:18567] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 4 of 13 is alive on gate
[gate:18568] *** An error occurred in MPI_Isend
[gate:18568] *** on communicator MPI_COMM_WORLD
[gate:18568] *** MPI_ERR_RANK: invalid rank
[gate:18568] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 5 of 13 is alive on gate
[gate:18569] *** An error occurred in MPI_Isend
[gate:18569] *** on communicator MPI_COMM_WORLD
[gate:18569] *** MPI_ERR_RANK: invalid rank
[gate:18569] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 7 of 13 is alive on gate
[gate:18571] *** An error occurred in MPI_Isend
[gate:18571] *** on communicator MPI_COMM_WORLD
[gate:18571] *** MPI_ERR_RANK: invalid rank
[gate:18571] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 8 of 13 is alive on gate
 Process 9 of 13 is alive on gate
[gate:18573] *** An error occurred in MPI_Isend
[gate:18573] *** on communicator MPI_COMM_WORLD
[gate:18573] *** MPI_ERR_RANK: invalid rank
[gate:18573] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 10 of 13 is alive on gate
[gate:18574] *** An error occurred in MPI_Isend
[gate:18574] *** on communicator MPI_COMM_WORLD
[gate:18574] *** MPI_ERR_RANK: invalid rank
[gate:18574] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 11 of 13 is alive on gate
 Process 12 of 13 is alive on gate
[gate:18576] *** An error occurred in MPI_Isend
[gate:18576] *** on communicator MPI_COMM_WORLD
[gate:18576] *** MPI_ERR_RANK: invalid rank
[gate:18576] *** MPI_ERRORS_ARE_FATAL (goodbye)
[gate:18566] *** An error occurred in MPI_Isend
[gate:18566] *** on communicator MPI_COMM_WORLD
[gate:18566] *** MPI_ERR_RANK: invalid rank
[gate:18566] *** MPI_ERRORS_ARE_FATAL (goodbye)
[gate:18572] *** An error occurred in MPI_Isend
[gate:18572] *** on communicator MPI_COMM_WORLD
[gate:18572] *** MPI_ERR_RANK: invalid rank
[gate:18572] *** MPI_ERRORS_ARE_FATAL (goodbye)
[gate:18575] *** An error occurred in MPI_Isend
[gate:18575] *** on communicator MPI_COMM_WORLD
[gate:18575] *** MPI_ERR_RANK: invalid rank
[gate:18575] *** MPI_ERRORS_ARE_FATAL (goodbye)
 Process 6 of 13 is alive on gate
[gate:18570] *** An error occurred in MPI_Isend
[gate:18570] *** on communicator MPI_COMM_WORLD
[gate:18570] *** MPI_ERR_RANK: invalid rank
[gate:18570] *** MPI_ERRORS_ARE_FATAL (goodbye)
[gate:18561] [0,0,0] ORTE_ERROR_LOG: Timeout in file base/pls_base_orted_cmds.c at line 275
[gate:18561] [0,0,0] ORTE_ERROR_LOG: Timeout in file pls_rsh_module.c at line 1166

-- 
----------------------------------------------------------------------
Karsten Bolding                    Bolding & Burchard Hydrodynamics
Strandgyden 25                     Phone: +45 64422058
DK-5466 Asperup                    Fax:   +45 64422068
Denmark                            Email: karsten_at_[hidden]
http://www.findvej.dk/Strandgyden25,5466,11,3
----------------------------------------------------------------------


mask.fine.size0096x0096_offset-0078x-0022_nodes004.distribution_on_nodes.png