
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI doesn't recognize multiple cores available on multicore machines
From: Jingcha Joba (pukkimonkey_at_[hidden])
Date: 2012-04-24 18:07:58

Try using slots in a hostfile?
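A minimal sketch of what that looks like (assuming a single 8-core local machine; the hostfile name and slot count are placeholders to adjust for your box):

```shell
# Write a hostfile that tells Open MPI how many slots (processes)
# the local machine can hold -- here, 8 for an 8-core node.
echo "localhost slots=8" > my_hostfile

# Launch with the hostfile so mpirun sees the available slots;
# -nooversubscribe will then succeed as long as -np <= slots.
# mpirun -np 2 --hostfile my_hostfile -nooversubscribe <executable and options here>
```

With slots declared, Open MPI also binds ranks to distinct cores rather than stacking them on one.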

Sent from my iPhone
On Apr 24, 2012, at 2:52 PM, Kyle Boe <boex0040_at_[hidden]> wrote:
> I'm having a problem trying to use OpenMPI on some multicore machines I have. The code I am running was giving me errors which suggested that MPI was assigning multiple processes to the same core (which I do not want). So, I tried launching my job using the -nooversubscribe option, and I get this error:
> bash-3.2$ mpirun -np 2 -nooversubscribe <executable and options here>
> --------------------------------------------------------------------------
> There are not enough slots available in the system to satisfy the 2 slots 
> that were requested by the application:
>   <executable name>
> Either request fewer slots for your application, or make more slots available
> for use.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1  while attempting to
> launch so we are aborting.
> There may be more information reported by the environment (see above).
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> I am just trying to run on the localhost, not on any remote machines. This happens on both my 8-core (2*4) and 24-core (4*6) machines. Relevant info: I am not using any type of scheduler here, although from the searching I've done that doesn't seem to be a requirement. The only thing I can think of is that there must be some configuration or option I'm not setting for use on shared-memory machines (either at compile or run time), but I can't find anyone else who has come across this error. Any thoughts?
> Thanks,
> Kyle
> _______________________________________________
> users mailing list
> users_at_[hidden]