
Open MPI User's Mailing List Archives


From: Borenstein, Bernard S (bernard.s.borenstein_at_[hidden])
Date: 2005-10-06 10:57:44


I built the NASA Overflow 1.8ab code yesterday with openmpi-1.0a1r7632.
It runs fine with 4 or 8 Opteron processors on a Myrinet Linux cluster,
but if I increase the number of processors to 20, I get errors like this:

[e053:01260] *** An error occurred in MPI_Free_mem
[e030:15585] *** An error occurred in MPI_Free_mem
[e013:27621] *** An error occurred in MPI_Free_mem
[e030:15585] *** on communicator MPI_COMM_WORLD
[e032:14179] *** An error occurred in MPI_Free_mem
[e053:01260] *** on communicator MPI_COMM_WORLD
[e030:15585] *** MPI_ERR_NO_MEM: out of memory
[e053:01260] *** MPI_ERR_NO_MEM: out of memory
[e013:27621] *** on communicator MPI_COMM_WORLD
[e030:15585] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e032:14179] *** on communicator MPI_COMM_WORLD
[e053:01260] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e013:27621] *** MPI_ERR_NO_MEM: out of memory
[e012:30846] *** An error occurred in MPI_Free_mem
[e012:30846] *** on communicator MPI_COMM_WORLD
[e012:30846] *** MPI_ERR_NO_MEM: out of memory
[e012:30846] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e032:14179] *** MPI_ERR_NO_MEM: out of memory
[e013:27621] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e032:14179] *** MPI_ERRORS_ARE_FATAL (goodbye)
[e032:14178] *** An error occurred in MPI_Free_mem
[e032:14178] *** on communicator MPI_COMM_WORLD
[e032:14178] *** MPI_ERR_NO_MEM: out of memory
[e032:14178] *** MPI_ERRORS_ARE_FATAL (goodbye)
 DIMENSIONS FOR COARSE LEVEL(S), GRID 1:
[e011:12272] spawn: in job_state_callback(jobid = 1, state = 0xa)
[e011:12272] spawn: in job_state_callback(jobid = 1, state = 0x9)
20 processes killed (possibly by Open MPI)
[e011:12272] sess_dir_finalize: found proc session dir empty - deleting
[e011:12272] sess_dir_finalize: job session dir not empty - leaving
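
Because several ranks write to the same output at once, the error lines above are interleaved. The sketch below is an illustrative helper, not part of Open MPI or Overflow. It groups the lines by their "[host:pid]" prefix so it is easier to see which processes reported the MPI_Free_mem / MPI_ERR_NO_MEM failure. It assumes every per-process line starts with that prefix; the function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as "[e030:15585] *** MPI_ERR_NO_MEM: out of memory".
# Assumption: every per-process line begins with "[host:pid]".
PREFIX = re.compile(r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s*(?P<msg>.*)$")


def group_by_process(log_text):
    """Return {(host, pid): [message, ...]}, keeping each process's line order.

    The pid is kept as a string so leading zeros (e.g. "01260") survive.
    Lines without a "[host:pid]" prefix are ignored.
    """
    groups = defaultdict(list)
    for line in log_text.splitlines():
        m = PREFIX.match(line.strip())
        if m:
            groups[(m.group("host"), m.group("pid"))].append(m.group("msg"))
    return dict(groups)


def failing_processes(groups, call="MPI_Free_mem"):
    """Return the sorted (host, pid) pairs that reported an error in `call`."""
    needle = f"An error occurred in {call}"
    return sorted(k for k, msgs in groups.items() if any(needle in m for m in msgs))


if __name__ == "__main__":
    # A few lines copied from the log above.
    sample = """\
[e030:15585] *** An error occurred in MPI_Free_mem
[e053:01260] *** An error occurred in MPI_Free_mem
[e030:15585] *** on communicator MPI_COMM_WORLD
[e030:15585] *** MPI_ERR_NO_MEM: out of memory
[e011:12272] spawn: in job_state_callback(jobid = 1, state = 0xa)
"""
    groups = group_by_process(sample)
    for proc in failing_processes(groups):
        print(proc, groups[proc])
```

Pasting the full log into `sample` lists each failing process once, with its messages in order.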

I am running under PBSPro with the Intel 9 compiler. Any ideas on what I
could be doing wrong? The size of my test problem is very small.

Thanks,

Bernie Borenstein
The Boeing Company