Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list; no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] More newbie question: --hostfile option
From: Gus Correa (gus_at_[hidden])
Date: 2011-01-12 21:49:02

Tena Sakai wrote:
> Hi,
> I can execute the command below:
> $ mpirun -H vixen -np 1 hostname : -H compute-0-0,compute-0-1,compute-0-2 -np 3 hostname
> and I get:
> compute-0-0.local
> compute-0-1.local
> compute-0-2.local
> I have a file myhosts, which looks like:
> compute-0-0 slots=1
> compute-0-1 slots=1
> compute-0-2 slots=1
> but when I execute:
> $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
> I get:
> There are no allocated resources for the application
> hostname
> that match the requested mapping:
> Verify that you have mapped the allocated resources properly using the
> --host or --hostfile specification.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
> launch so we are aborting.
> There may be more information reported by the environment (see above).
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --------------------------------------------------------------------------
> mpirun: clean termination accomplished
> Interestingly, this works:
> $ mpirun --hostfile myhosts -np 3 hostname
> compute-0-0.local
> compute-0-1.local
> compute-0-2.local
> $
> Am I correct in concluding that -H and --hostfile cannot both be used in
> the same mpirun command when it contains a colon (:)? Or is there any
> trick or work-around to use both -H and --hostfile?
> Thank you.
> Tena Sakai
> tsakai_at_[hidden]
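A possible work-around, offered as an untested sketch rather than an official recipe: the first command in the question shows that a comma-separated -H list works inside a colon-separated mpirun command, so the host names can be pulled out of the hostfile and passed with -H instead. The helper name hostfile_to_list is hypothetical (it is not part of Open MPI). It assumes the simple hostfile format shown above (one host per line, an optional slots= field, optional # comments) and it drops the slots= counts, which is harmless here only because every count is 1.

```shell
#!/bin/sh
# hostfile_to_list FILE
# Hypothetical helper (not part of Open MPI): print the host names from a
# simple hostfile as one comma-separated list suitable for mpirun -H.
# Assumes one host per line; "slots=N" fields are discarded, so a host that
# needs more than one slot is NOT expanded by this sketch.
hostfile_to_list() {
    # 1. strip "#" comments
    # 2. skip blank lines and keep only the first field (the host name)
    # 3. join the remaining names with commas
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }' | paste -s -d , -
}
```

With the myhosts file above, the following expands to the same command that already worked:

$ mpirun -H vixen -np 1 hostname : -H "$(hostfile_to_list myhosts)" -np 3 hostname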

Hi Tena

I don't know if this is an option for you, but OpenMPI can be built
with integrated support for a resource manager.
This completely removes the need to specify the host list
on the mpirun command line, to use
a hostfile, or to get involved in all this syntactic nitty-gritty.
OpenMPI will use exactly those resources (nodes, cores, etc.) that are
made available to it by the resource manager upon your request.

We use Torque here, which is simple, effective, and even available
through RPM-type packages on many Linux distributions.
(It is also easy to build from source.)
I think OpenMPI also builds with SGE,
maybe with other resource managers too.
See the FAQ and the README file for more details on how to build
OpenMPI with Torque (or SGE) support.
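A minimal sketch of such a build, assuming Torque is installed under /opt/torque and Open MPI should be installed under /opt/openmpi (both paths are placeholders for your own). The --with-tm option tells configure where Torque lives; afterwards, ompi_info should list tm components (for example under ras and plm) if the support was compiled in.

```shell
# Sketch only: /opt/openmpi and /opt/torque are assumed paths.
./configure --prefix=/opt/openmpi --with-tm=/opt/torque
make all install

# Check that the Torque (tm) components were built into this installation.
/opt/openmpi/bin/ompi_info | grep tm
```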

Resource managers are also a no-nonsense way to manage jobs, whether
they come from one user or from many.
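As an illustration, a minimal Torque job script could look like the sketch below. It assumes an Open MPI build with Torque support; the job name, the walltime, and the request for three nodes with one processor each (mirroring the myhosts file in the question) are placeholders. Note that mpirun gets no -H, --hostfile, or -np: it takes the allocation from Torque and, without -np, starts one process per allocated slot.

```shell
#!/bin/sh
# Minimal Torque/PBS job script (sketch). Assumes Open MPI was built with
# Torque (tm) support; job name, node request and walltime are placeholders.
#PBS -N hostname_test
#PBS -l nodes=3:ppn=1
#PBS -l walltime=00:05:00

# Torque starts the job in $HOME; move to the directory qsub was run from.
cd "$PBS_O_WORKDIR" || exit 1

# No -H, --hostfile or -np: mpirun reads the node list from Torque and
# launches one copy of "hostname" per allocated slot.
mpirun hostname
```

Submit it with 'qsub' followed by the script's file name; the output goes to the job's standard-output file.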

My two cents,
Gus Correa

PS - Looking at your nodes' names, it looks to me like you have a Rocks
cluster, right?
Rocks has an SGE roll and a Torque roll.
You could install one of them (only one!), if it is not there yet, and enjoy!
('rocks list roll' will tell you which rolls you have.)