Open MPI Development Mailing List Archives

Subject: [OMPI devel] trunk problem for large-SMP startup?
From: Eugene Loh (Eugene.Loh_at_[hidden])
Date: 2009-03-04 15:49:43


I have a problem starting large SMP jobs (e.g., 64 processes on a single
SMP) that might be related to a recent trunk change. (Guessing.) Does
the following ring any bells?

...
...
...
[burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
[burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
[burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
[burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
[burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
[burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
[burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
[burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
[burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):

  orte_grpcomm_modex failed
  --> Returned "Not found" (-13) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[burl-t5440-0:6756] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
*** An error occurred in MPI_Init
*** before MPI was initialized
*** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
[burl-t5440-0:6757] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
...
...
...