Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Xgrid an openmpi 1.2 and 1.5rc1
From: charlie strauss (cems_at_[hidden])
Date: 2010-06-21 15:28:29

Perhaps I was mistaken about 1.5rc1. As for the Open MPI that comes
installed on Mac OS X: my 10.5 machine has v1.2.3. When I run it, it
works fine locally, but it never finds the Xgrid.

Any MPI job I run runs on localhost, not on the Xgrid agents. If I
try to force the issue by specifying -nolocal, it just complains
that there are no nodes.

So how do I use Open MPI so that it runs on the nodes of an Xgrid cluster?

mpirun -nolocal -n 32 /bin/hostname
There are no available nodes allocated to this job. This could be
no nodes were found or all the available nodes were already used.

Note that since the -nolocal option was given no processes can be
launched on the local node.
[] [0,0,0] ORTE_ERROR_LOG: Temporarily out of
resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/
base/rmaps_base_support_fns.c at line 168
[] [0,0,0] ORTE_ERROR_LOG: Temporarily out of
resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/
round_robin/rmaps_rr.c at line 402
[] [0,0,0] ORTE_ERROR_LOG: Temporarily out of
resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmaps/
base/rmaps_base_map_job.c at line 210
[] [0,0,0] ORTE_ERROR_LOG: Temporarily out of
resource in file /SourceCache/openmpi/openmpi-5/openmpi/orte/mca/rmgr/
urm/rmgr_urm.c at line 372

On Jun 16, 2010, at 1:36 PM, Ralph Castain wrote:

> Where did you see that 1.5 works with xgrid? That support has been
> broken since the 1.2 series, unfortunately, so it would help to
> ensure we don't have stale docs out there to the contrary.
> As for the 1.2 results, you are aware (I imagine) that OSX ships
> with the last 1.2 release already installed? You don't have to do
> anything to use it but run.
> If you are getting peer timeouts, that is almost always a firewall
> issue. But I would try the factory-installed version first to be sure.
> On Jun 16, 2010, at 1:14 PM, Charlie E. Strauss wrote:
>> I'm new to Open MPI and am trying to set it up for use with Xgrid.
>> I have read that v1.3 and v1.4 are broken on OS X 10.5 and 10.6,
>> although I have seen some discussions in the archives of this
>> mailing list saying some people have v1.4 running on 10.6.
>> I have now compiled both Open MPI 1.2 and Open MPI 1.5rc, and
>> neither of these is working for me with Xgrid. Both of these say
>> they work with Xgrid.
>> The failure modes are different.
>> Does anyone know how to get a working install? I am building this
>> on an OS X 10.5.8 machine. The Xgrid controller is on an OS X 10.6
>> server machine. I have tried configuring with and without the
>> --with-xgrid option.
>> Behaviour of Open MPI 1.2:
>> $ /usr/local/openmpi/bin/mpirun -nolocal -n 2 /bin/hostname
>> The job appears in the Xgrid queue, and the logs show it is running
>> on a remote machine. However, nothing ever happens, and peeking at
>> the Xgrid results I see:
>> $ xgrid -job results -id 8703
>> [] [0,0,1]-[0,0,0]
>> mca_oob_tcp_peer_complete_connect:
>> connection failed: Operation timed out (60) - retrying
>> [] [0,0,2]-[0,0,0]
>> mca_oob_tcp_peer_complete_connect:
>> connection failed: Operation timed out (60) - retrying
>> Perhaps a firewall issue?
>> Of course, I'm more interested in getting the new Open MPI 1.5
>> working. When I run it, I again get an entry in the queue and the
>> job runs on a remote machine, but I get a job-failed message:
>> $ /usr/local/openmpi5/bin/mpirun -n 2 /bin/hostname
>> $ xgrid -job results -id 8702
>> [] Error: unknown option "-mca"
>> ----
>> Note that I have NOT installed Open MPI on any of the other
>> computers in the grid. So perhaps that is the problem? If I did
>> install it on the other computers, how would I tell mpirun where to
>> find the path to the install point?
>> ----
>> Finally, in both cases I don't see any way to pass Xgrid-specific
>> arguments on the MPI command line. An Xgrid controller divides the
>> agents into sets of logical grids, and you need to specify which
>> logical grid to submit the job to. In xgrid CLI syntax one writes
>> "xgrid -gid 2" for grid 2. When I use Open MPI, all the jobs get
>> sent to the default grid, which is the grid that Xgrid uses if no
>> gid is specified.
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
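
On the peer timeouts from the 1.2 build quoted above: one way to test
the firewall theory is to check, from an agent machine, whether a plain
TCP connection back to the machine running mpirun succeeds at all. What
follows is only a minimal Python 3 sketch, not anything shipped with
Open MPI or Xgrid. The function name can_connect and the host and port
in the usage comment are placeholders. As far as I understand, the port
mpirun listens on is picked at run time, so it would have to be looked
up on the submitting machine (for example with lsof or netstat) while
the job is waiting.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    A False result means the connection was refused, timed out, or the
    host could not be resolved; it does not say which of these happened.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return False


# Hypothetical usage on an agent machine (host and port are placeholders):
#   can_connect("submit-host.example.com", 12345)
```

If this returns False from the agents but True when run on the
submitting machine itself, a firewall between them is a plausible
suspect, although a wrong port number would look the same.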

Charlie Strauss
Bioscience Division
505 665 4838
Quidquid latine dictum sit, altum sonatur.