Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Xgrid an openmpi 1.2 and 1.5rc1
From: charlie strauss (cems_at_[hidden])
Date: 2010-06-21 17:01:33


To be more specific:

I have a working xgrid with the environment variables set. In
particular, I can run xgrid commands from the shell prompt like this:

xgrid -job submit /bin/hostname

and it runs because the environment variables are set.
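
As a quick sanity check, a sketch along the following lines could confirm that both variables named later in this thread (XGRID_CONTROLLER_HOSTNAME and XGRID_CONTROLLER_PASSWORD) are visible in the current shell. The function name check_xgrid_env is made up for illustration; it is not part of Open MPI or Xgrid.

```shell
#!/bin/sh
# Hypothetical helper (not part of Open MPI or Xgrid): report whether the
# two variables named in this thread are set and non-empty in this shell.
check_xgrid_env() {
    missing=0
    for var in XGRID_CONTROLLER_HOSTNAME XGRID_CONTROLLER_PASSWORD; do
        # Indirect lookup of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            missing=1
        fi
    done
    # Exit status 0 if both are set, 1 otherwise.
    return "$missing"
}
```

It could be used as `check_xgrid_env && mpirun -np 3 /bin/hostname`. Note that this only confirms the variables are set in the shell; they also have to be exported for a child process such as mpirun to inherit them.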

My understanding is that openMPI will look for those environment
variables and, if they are present, try to run on xgrid. My
understanding is also that no configuration files are needed for this;
it should work out of the box.

Thus I should be able to type at the same command line:
mpirun -np 3 /bin/hostname
  or
mpirun -np 3 examples/hello_c (the MPI example)

and have them run on xgrid (for example, see
http://www.macresearch.org/getting_started_with_openmpi_and_xgrid).

But that's not what happens; instead they always run on the localhost.

I know I'm not the only one who has this issue, since I can reproduce
it on 6 different computers around me and I see questions like mine
posted on the web.

Is there any other configuration needed to make the built-in openMPI
use an available xgrid?
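
One thing that may be worth checking first is whether the Open MPI installation being invoked was built with any XGrid components at all. The sketch below assumes that ompi_info lists such components on lines containing the string "xgrid"; the helper name list_xgrid_components is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a component listing on stdin and print only
# the lines that mention xgrid (case-insensitive). The exit status is
# non-zero when no such line exists.
list_xgrid_components() {
    grep -i 'xgrid'
}
```

For example, `ompi_info | list_xgrid_components || echo "no xgrid components found"`. Running this against the ompi_info that sits next to the mpirun actually being used matters when more than one Open MPI install is present on the machine.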

(Separate question: if so, does it always use the default logical
grid, or is there a way to configure which grid ID it uses? A given
controller host can partition the grid into logical subsets of nodes.
In xgrid-speak these are called logical grids, and one of them is
assigned to be the default grid, which is used if the grid ID is not
specified.)

On Jun 21, 2010, at 1:40 PM, Barrett, Brian W wrote:

> You have to set two environment variables (XGRID_CONTROLLER_HOSTNAME
> and XGRID_CONTROLLER_PASSWORD) with the correct information in order
> for the XGrid starter to work. Due to the way XGrid works, the
> nolocal option will not work properly when launching with XGrid.
>
> Brian
>
> On Jun 21, 2010, at 1:28 PM, charlie strauss wrote:
>
>> Perhaps I was mistaken about 1.5rc1. As for the openMPI installed
>> on Mac OS X: my 10.5 OSX has v1.2.3. When I try to run it, it works
>> fine locally, but it never finds the xgrid.
>>
>> Any MPI job I run will run on the localhost, not on the xgrid agents.
>> If I try to force the issue by specifying -nolocal, then it just
>> complains that there are no nodes.
>>
>> So how do I use openMPI so that it uses the nodes of an xgrid
>> cluster?
>>
>> mpirun -nolocal -n 32 /bin/hostname
>> --------------------------------------------------------------------------
>> There are no available nodes allocated to this job. This could be
>> because
>> no nodes were found or all the available nodes were already used.
>>
>> Note that since the -nolocal option was given no processes can be
>> launched on the local node.
>> --------------------------------------------------------------------------
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of
>> resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/
>> rmaps/base/rmaps_base_support_fns.c at line 168
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of
>> resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/
>> rmaps/round_robin/rmaps_rr.c at line 402
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of
>> resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/
>> rmaps/base/rmaps_base_map_job.c at line 210
>> [ocho.lanl.gov:35438] [0,0,0] ORTE_ERROR_LOG: Temporarily out of
>> resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/
>> rmgr/urm/rmgr_urm.c at line 372
>>
>> On Jun 16, 2010, at 1:36 PM, Ralph Castain wrote:
>>
>>> Where did you see that 1.5 works with xgrid? That support has been
>>> broken since the 1.2 series, unfortunately, so it would help to
>>> ensure we don't have stale docs out there to the contrary.
>>>
>>> As for the 1.2 results, you are aware (I imagine) that OSX ships
>>> with the last 1.2 release already installed? You don't have to do
>>> anything to use it but run.
>>>
>>> If you are getting peer timeouts, that is almost always a firewall
>>> issue. But I would try the factory-installed version first to be
>>> sure.
>>>
>>> On Jun 16, 2010, at 1:14 PM, Charlie E. Strauss wrote:
>>>
>>>> I'm new to openMPI. I'm trying to set it up for use with xgrid. I
>>>> have read that v1.3 and v1.4 are broken on OSX 10.5 and 10.6,
>>>> although I have seen some discussions in the archives of this
>>>> mailing list saying some people have v1.4 running on 10.6.
>>>>
>>>> I have now compiled both openMPI 1.2 and openMPI 1.5rc, and
>>>> neither of these is working for me with xgrid. Both of these say
>>>> they work with xgrid.
>>>>
>>>> The failure modes are different.
>>>>
>>>> Does anyone know how to get a working install? I am building this
>>>> on an OSX 10.5.8 machine. The xgrid controller is on an OSX 10.6
>>>> server machine. I have tried configuring with and without the
>>>> --with-xgrid option.
>>>>
>>>> Behaviour of openMPI 1.2:
>>>> $ /usr/local/openmpi/bin/mpirun -nolocal -n 2 /bin/hostname
>>>>
>>>> The job appears in the xgrid queue, and the logs show it is
>>>> running on a remote machine. However, nothing ever happens, and
>>>> peeking in the xgrid results I see:
>>>>
>>>> $ xgrid -job results -id 8703
>>>> [brio.llnl.gov:38789] [0,0,1]-[0,0,0]
>>>> mca_oob_tcp_peer_complete_connect:
>>>> connection failed: Operation timed out (60) - retrying
>>>> [brio.llnl.gov:38792] [0,0,2]-[0,0,0]
>>>> mca_oob_tcp_peer_complete_connect:
>>>> connection failed: Operation timed out (60) - retrying
>>>>
>>>> Perhaps a firewall issue?
>>>>
>>>> Of course I'm more interested in getting the new openMPI 1.5
>>>> working. When I run this, again I get an entry in the queue, and
>>>> the job runs on a remote machine, but I get a job failed message:
>>>>
>>>> $ /usr/local/openmpi5/bin/mpirun -n 2 /bin/hostname
>>>> $ xgrid -job results -id 8702
>>>> [brio.llnl.gov:38776] Error: unknown option "-mca"
>>>>
>>>> ----
>>>>
>>>> Note that I have NOT installed openMPI on any of the other
>>>> computers in the grid. So perhaps that is the problem? If I did
>>>> install it on the other computers, how would I tell mpirun where
>>>> to find the install path?
>>>>
>>>> ----
>>>>
>>>>
>>>> Finally, in both cases, I don't see any way to pass xgrid-specific
>>>> arguments on the MPI command line. An xgrid controller divides
>>>> the agents into sets of logical grids, and you need to specify
>>>> which logical grid to submit the job to. In xgrid CLI syntax one
>>>> writes "xgrid -gid 2" for grid 2. When I use openMPI, all the jobs
>>>> get sent to just the default grid, which is the grid that xgrid
>>>> uses if no gid is specified.
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>> Charlie Strauss
>> Bioscience Division
>> cems_at_[hidden]
>> 505 665 4838
>> Quidquid latine dictum sit, altum sonatur.

Charlie Strauss
Bioscience Division
cems_at_[hidden]
505 665 4838
Quidquid latine dictum sit, altum sonatur.