Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] More newbie question: --hostfile option
From: Tena Sakai (tsakai_at_[hidden])
Date: 2011-01-12 22:26:49


Thank you, Gus. I am encouraged. I will look into Torque
in a day or two or three.

Regards,

Tena Sakai
tsakai_at_[hidden]

On 1/12/11 6:49 PM, "Gus Correa" <gus_at_[hidden]> wrote:

> Tena Sakai wrote:
>> Hi,
>>
>> I can execute the command below:
>> $ mpirun -H vixen -np 1 hostname : -H compute-0-0,compute-0-1,compute-0-2 -np 3 hostname
>> and I get:
>> vixen.egcrc.org
>> compute-0-0.local
>> compute-0-1.local
>> compute-0-2.local
>>
>> I have a file myhosts, which looks like:
>> compute-0-0 slots=1
>> compute-0-1 slots=1
>> compute-0-2 slots=1
>> but when I execute:
>> $ mpirun -H vixen -np 1 hostname : --hostfile myhosts -np 3 hostname
>> I get:
>> There are no allocated resources for the application
>> hostname
>> that match the requested mapping:
>>
>> Verify that you have mapped the allocated resources properly using the
>> --host or --hostfile specification.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> There may be more information reported by the environment (see above).
>>
>> This may be because the daemon was unable to find all the needed shared
>> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> location of the shared libraries on the remote nodes and this will
>> automatically be forwarded to the remote nodes.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that the job aborted, but has no info as to the process
>> that caused that situation.
>> --------------------------------------------------------------------------
>> mpirun: clean termination accomplished
>>
>> Interestingly, this works:
>> $ mpirun --hostfile myhosts -np 3 hostname
>> compute-0-0.local
>> compute-0-1.local
>> compute-0-2.local
>> $
>>
>> Am I correct in concluding that -H and --hostfile cannot be issued in the
>> same mpirun command which contains a colon (:)? Or is there any trick
>> or work-around to have both -H and --hostfile?
>>
>> Thank you.
>>
>> Tena Sakai
>> tsakai_at_[hidden]
>>
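
One thing that might be worth trying for the question above, offered only as an untested sketch: list every node, including vixen, in a single hostfile and let -H select hosts within each app context. The Open MPI FAQ describes -host acting as a filter on a hostfile in the 1.3 series and later, but whether that resolves this particular error is an assumption, not something confirmed in this thread. The helper name make_hostfile and the file name allhosts are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one "host slots=1" line per argument,
# in the same format as the myhosts file quoted above.
make_hostfile() {
  for host in "$@"; do
    printf '%s slots=1\n' "$host"
  done
}

# "allhosts" is an illustrative file name, not one used in the thread.
make_hostfile vixen compute-0-0 compute-0-1 compute-0-2 > allhosts

# Command to try by hand (behavior may differ between Open MPI versions):
#   mpirun --hostfile allhosts -H vixen -np 1 hostname : \
#          -H compute-0-0,compute-0-1,compute-0-2 -np 3 hostname
```

The mpirun line is left as a comment so the snippet only writes the hostfile; run the command manually once allhosts looks right.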
>
> Hi Tena
>
> I don't know if this is an option for you, but OpenMPI can be built
> integrated with a resource manager.
> This completely removes the need to specify the host list
> on the mpirun command line, to use a hostfile, or to get
> involved with all this syntactical nitty-gritty.
> OpenMPI will use exactly those resources (nodes, cores, etc) that are
> made available to it by the resource manager upon your request.
>
> We use Torque here, which is simple, effective, and even available
> through RPM-type packages on many Linux distributions.
> (Although it is also easy to build from source.)
> I think OpenMPI also builds with SGE,
> maybe with other resource managers too.
> See the FAQ and the README file for more details on how to build
> OpenMPI with Torque (or SGE) support.
>
> Resource managers are also a no-nonsense way to manage jobs, either
> from one or from many users.
>
> My two cents,
> Gus Correa
>
> PS - Looking at your node names, it looks to me like you have a Rocks
> cluster, right?
> Rocks has an SGE and a Torque roll.
> You could install one of them (only one!), if not yet there, and enjoy!
> ('rocks list roll' will tell what you have.)
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
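
To make the resource-manager suggestion above concrete, here is a minimal sketch of a Torque job script. It assumes Open MPI was built with Torque (tm) support, which can be checked with `ompi_info | grep tm`. The job name, the file name hostname_job.sh, and the resource request are illustrative and not taken from the thread.

```shell
#!/bin/sh
# Write an illustrative Torque job script; every name in it is an assumption.
cat > hostname_job.sh <<'EOF'
#!/bin/sh
#PBS -N hostname_test
#PBS -l nodes=3:ppn=1
# Torque sets PBS_O_WORKDIR to the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
# No -H or --hostfile: with tm support, mpirun takes its nodes from Torque.
mpirun -np 3 hostname
EOF

# Submit by hand on a cluster that runs Torque:
#   qsub hostname_job.sh
```

The point of the sketch is that the node list lives in the `#PBS -l` request rather than on the mpirun command line.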