Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Xgrid and choosing agents...
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-07-12 07:07:00


On Jul 11, 2009, at 11:13 PM, Klymak Jody wrote:

> Hi Vitorio,
>
> On 11-Jul-09, at 8:40 PM, Luis Vitorio Cargnini wrote:
>
>> did you see that, maybe, just maybe, using:
>> xserve01.local slots=8 max-slots=8
>> xserve02.local slots=8 max-slots=8
>> xserve03.local slots=8 max-slots=8
>> xserve04.local slots=8 max-slots=8
>>
>> it can set the number of processes specifically for each node. The
>> "slots" keyword does this by setting the number of slots for each
>> node. Try it with the old Xgrid configuration and also test with
>> your new Xgrid configuration.
>
> As per Ralph's message, the xgrid launcher ignores --hostfile...
> Further, "max_slots=2" is the same as "slots=2 max_slots=2"
> according to the man page.
>
> Xgrid does have a somewhat convoluted, and poorly documented, method
> of directing jobs to specified machines. It's called Scoreboard and
> it allows the scheduler to query each machine with a script that
> gathers info about the machine and computes a "score". Nodes with
> the highest score get the job. However, how one would implement
> that using openMPI is unclear to me. Does openMPI have the
> capability of passing arbitrary arguments to the resource managers?

Assuming that Scoreboard is appropriately licensed (i.e., is not
licensed under GPL, but preferably something like FreeBSD), and that
it has an accessible API, then we can link against it when in that
environment and interact any way we desire - including asking
Scoreboard for its recommended list of nodes.
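
As a rough illustration of the --bynode suggestion quoted further
down, the Python sketch below models by-slot versus by-node placement.
It is a simplified model only, not the actual Open MPI mapper or the
Xgrid launcher code. The function name map_ranks and the assumption
that every node offers the same number of slots are hypothetical,
chosen just for this example.

```python
# Simplified model of by-slot vs. by-node rank placement.
# Assumption: every node has the same number of slots.
# map_ranks is a hypothetical helper, not part of Open MPI.

NODES = ["xserve01.local", "xserve02.local",
         "xserve03.local", "xserve04.local"]


def map_ranks(num_procs, nodes, slots_per_node, by_node=False):
    """Return {node: [ranks]} for the requested placement policy."""
    if num_procs > len(nodes) * slots_per_node:
        raise ValueError("not enough slots for the requested processes")

    layout = {node: [] for node in nodes}
    for rank in range(num_procs):
        if by_node:
            # Round-robin: consecutive ranks land on consecutive nodes.
            index = rank % len(nodes)
        else:
            # By slot: fill every slot on a node before moving on.
            index = rank // slots_per_node
        layout[nodes[index]].append(rank)
    return layout


if __name__ == "__main__":
    # 16 slots per node is what Xgrid reports for these machines.
    for policy in (False, True):
        label = "by node" if policy else "by slot"
        print(label)
        for node, ranks in map_ranks(16, NODES, 16, by_node=policy).items():
            print("  ", node, ranks)
```

With 16 slots reported per node, the by-slot model puts all 16 ranks
on the first node, while the by-node model puts 4 ranks on each node,
with ranks 0, 4, 8 and 12 sharing a node. That interleaving is the
rank-layout caveat attached to --bynode.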

>
> Thanks, Jody
>
>>
>> Regards.
>> Vitorio.
>>
>>
>> On 09-07-11 at 18:11, Klymak Jody wrote:
>>
>>> If anyone else is using xgrid, there is a mechanism to limit the
>>> processes per machine:
>>>
>>> sudo defaults write /Library/Preferences/com.apple.xgrid.agent
>>> MaximumTaskCount 8
>>>
>>> on each of the nodes and then restarting xgrid tells the
>>> controller to only send 8 processes to that node. For now that is
>>> a fine solution for my needs. I'll try and figure out how to specify
>>> hosts via xgrid and get back to the list...
>>>
>>> Thanks for everyone's help,
>>>
>>> Cheers, Jody
>>>
>>> On 11-Jul-09, at 12:42 PM, Ralph Castain wrote:
>>>
>>>> Looking at the code, you are correct in that the Xgrid launcher
>>>> is ignoring hostfiles. I'll have to look at it to determine how
>>>> to correct that situation - I didn't write that code, nor do I
>>>> have a way to test any changes I might make to it.
>>>>
>>>> For now, though, if you add --bynode to your command line, you
>>>> should get the layout you want. I'm not sure you'll get the rank
>>>> layout you'll want, though...or if that is important to what you
>>>> are doing.
>>>>
>>>> Ralph
>>>>
>>>> On Jul 11, 2009, at 1:18 PM, Klymak Jody wrote:
>>>>
>>>>> Hi Vitorio,
>>>>>
>>>>> Thanks for getting back to me! My hostfile is
>>>>>
>>>>> xserve01.local max-slots=8
>>>>> xserve02.local max-slots=8
>>>>> xserve03.local max-slots=8
>>>>> xserve04.local max-slots=8
>>>>>
>>>>> I've now checked, and this seems to work fine just using ssh,
>>>>> i.e. if I turn off the Xgrid queue manager I can submit jobs
>>>>> manually to the appropriate nodes using --host.
>>>>>
>>>>> However, I'd really like to use Xgrid as my queue manager as it
>>>>> is already set up (though I'll happily take hints on how to set
>>>>> up other queue managers on an OS X cluster).
>>>>>
>>>>>> So you have 4 nodes, each with 2 processors, and each processor
>>>>>> is a 4-core (quad-core).
>>>>>> So you have capacity for 32 processes in parallel.
>>>>>
>>>>> The new Xeon chips designate 2-processes per core, though at a
>>>>> reduced clock rate. This means that Xgrid believes I have 16
>>>>> processors/node. For large jobs I expect that to be useful, but
>>>>> for my more modest jobs I really only want 8 processes/node.
>>>>>
>>>>> It appears that the default way xgrid assigns the jobs is to
>>>>> fill all 16 slots on one node before moving to the next.
>>>>> OpenMPI doesn't appear to look at the hostfile configuration
>>>>> when using Xgrid, so it makes it hard for me to avoid this
>>>>> behaviour.
>>>>>
>>>>> Thanks, Jody
>>>>>
>>>>>
>>>>>
>>>>>> I think that using only the hostfile is enough; that is how I
>>>>>> use it. If you want to specify a specific host or a different
>>>>>> sequence, mpirun will obey the host sequence in your hostfile
>>>>>> when starting the processes. Also, can you show how you
>>>>>> configured your hostfile? I'm asking because you should have
>>>>>> something like:
>>>>>> # This is an example hostfile. Comments begin with #
>>>>>> # The following node is a single processor machine:
>>>>>> foo.example.com
>>>>>> # The following node is a dual-processor machine:
>>>>>> bar.example.com slots=2
>>>>>> # The following node is a quad-processor machine, and we
>>>>>> # absolutely want to disallow over-subscribing it:
>>>>>> yow.example.com slots=4 max-slots=4
>>>>>> so in your case like mine you should have something like:
>>>>>> your.hostname.domain slots=8 max-slots=8 # for each node
>>>>>>
>>>>>> I hope this will help you.
>>>>>> Regards.
>>>>>> Vitorio.
>>>>>>
>>>>>>
>>>>>> On 09-07-11 at 10:56, Klymak Jody wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Sorry in advance if these are naive questions - I'm not
>>>>>>> experienced in running a grid...
>>>>>>>
>>>>>>> I'm using openMPI on 4 dual quad-core Xeon xserves. The 8
>>>>>>> cores mimic 16 cores and show up in xgrid as each agent having
>>>>>>> 16 processors. However, the processing speed goes down once
>>>>>>> more than 8 processors are in use, so if possible I'd prefer
>>>>>>> to not have more than 8 processors working on each machine at
>>>>>>> a time.
>>>>>>>
>>>>>>> Unfortunately, if I submit a 16-processor job to xgrid it all
>>>>>>> goes to "xserve03". Or even worse, it does so if I submit two
>>>>>>> separate 8-processor jobs. Is there any way to steer jobs to
>>>>>>> less-busy agents?
>>>>>>>
>>>>>>> I tried making a hostfile and then specifying the host, but I
>>>>>>> get:
>>>>>>>
>>>>>>> /usr/local/openmpi/bin/mpirun -n 8 --hostfile hostfile --host
>>>>>>> xserve01.local ../build/mitgcmuv
>>>>>>>
>>>>>>> Some of the requested hosts are not included in the current
>>>>>>> allocation for the
>>>>>>> application:
>>>>>>> ../build/mitgcmuv
>>>>>>> The requested hosts were:
>>>>>>> xserve01.local
>>>>>>>
>>>>>>> so I assume --host doesn't work with xgrid?
>>>>>>>
>>>>>>> Is a reasonable alternative to simply not use xgrid and rely
>>>>>>> on ssh?
>>>>>>>
>>>>>>> Thanks, Jody
>>>>>>>
>>>>>>> --
>>>>>>> Jody Klymak
>>>>>>> http://web.uvic.ca/~jklymak
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> users mailing list
>>>>>>> users_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users