Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] trunk problem for large-SMP startup?
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-03-04 16:22:14


I'll take a look. Offhand, I don't know of anything limiting you to
<= 64 ppn (processes per node).

On Mar 4, 2009, at 1:49 PM, Eugene Loh wrote:

> I have a problem starting large SMP jobs (e.g., 64 processes on a
> single SMP) that might be related to a recent trunk change.
> (Guessing.) Does the following ring any bells?
>
> ...
> ...
> ...
> [burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
> [burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
> [burl-t5440-0:06798] [[57827,1],42] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
> [burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
> [burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
> [burl-t5440-0:06800] [[57827,1],44] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
> [burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file ess_env_module.c at line 299
> [burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 416
> [burl-t5440-0:06797] [[57827,1],41] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 378
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> orte_grpcomm_modex failed
> --> Returned "Not found" (-13) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [burl-t5440-0:6756] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> [burl-t5440-0:6757] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> ...
> ...
> ...
> <trunk-problem.tar.gz>