I did get MrBayes, compiled with Open MPI, to run under Xgrid. However, it was set up as more of a "traditional" cluster: the agents all share an NFS directory with the controller, so I'm basically only using Xgrid as a job scheduler. MrBayes doesn't seem to be a "grid" application so much as an application for a traditional cluster.

You will need to have the following enabled:

1) NFS shared directory across all the machines on the grid.

2) Open-MPI installed locally on all the machines or via NFS. (You'll need to compile Open MPI)

3) Here's the part that may make Xgrid undesirable for MPI applications:

a) Compile MrBayes with MPI support by setting the following in its Makefile:

MPI = yes
CC= $(MPIPATH)/bin/mpicc
CFLAGS = -fast

b) Make sure that Xgrid is set to properly use password-based authentication.

c) Set the environment variables that tell Open MPI to use Xgrid as the launcher/scheduler. Assuming bash:

$ export XGRID_CONTROLLER_HOSTNAME=mycomputer.apple.com
$ export XGRID_CONTROLLER_PASSWORD=passwd


You could also add the above to a .bashrc file and have your .bash_profile source it.

d) Run the MPI application:


$ mpirun -np X ./myapp
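Since step (d) fails in confusing ways when the variables from step (c) are missing, a small pre-flight check can help. This is only a sketch: the function name check_xgrid_env is made up for illustration and is not part of Open MPI or Xgrid; only the two variable names come from the steps above.

```shell
# check_xgrid_env: hypothetical helper, not part of Open MPI or Xgrid.
# Returns 0 if both variables from step (c) are set and non-empty,
# otherwise prints the first missing name to stderr and returns 1.
check_xgrid_env() {
    for var in XGRID_CONTROLLER_HOSTNAME XGRID_CONTROLLER_PASSWORD; do
        # Look up the variable whose name is stored in $var (POSIX sh).
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $var" >&2
            return 1
        fi
    done
    return 0
}
```

You would then launch with something like "check_xgrid_env && mpirun -np X ./myapp", so that mpirun is only started once both variables are in place.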

There are a couple of issues:

It turns out that the directory and files that MrBayes creates must be readable and writable by all the agents. MrBayes requires more than just reading standard input/output but also the creation and writing of other intermediate files. For an application like HP Linpack that just reads and writes one file, things work fine. However, the MrBayes application writes out and reads back two additional files for each MPI process that is spawned.

All the files that MrBayes tries to read or write must have permissions for user 'nobody'. This is a bit of a problem, since you probably (in general) don't want to allow user nobody to write all over your home directory. One solution (if possible) would be to have the application write into /tmp and then collect the files after the job completes. But I don't know whether MrBayes can be told to use a temporary directory. Perhaps your MrBayes customer can let us know how to specify a tmpdir.
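One way to keep user nobody out of your home directory is a dedicated job directory on the NFS share. The following is a minimal sketch, not something I have tested with MrBayes: the function name make_scratch_dir and any path you pass to it are assumptions, and mode 0777 lets every user on every machine write there, so only use it on a share you are prepared to open up.

```shell
# make_scratch_dir DIR: hypothetical helper for illustration.
# Creates DIR (and any missing parents) and makes it readable, writable
# and searchable by every user, including 'nobody'.
# The sticky bit is deliberately left off: removing a file then depends on
# write permission on the directory, not on who owns the file.
make_scratch_dir() {
    mkdir -p "$1" && chmod 0777 "$1"
}
```

For example, "make_scratch_dir /path/on/nfs/share/mb_job" (path hypothetical), then copy the *.nex file in and run the job from there. Because the directory has no sticky bit, the submitting user should be able to remove files that were created by nobody once the job completes, which /tmp does not allow.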

I have tested the basics of this by running an MPI command that copies the *.nex file to /tmp on all the agents. This seems to allow everything to work, but I can't easily clean the intermediate files off the agents afterwards, since the MrBayes application created them and the user doesn't own them.

I'm hoping the OMPI developers can come to the rescue on some of these issues, perhaps working in conjunction with some of the Apple Xgrid engineers.

Lastly, this is from one of the MrBayes folks:

"Getting help with Xgrid among the phylo community will probably be difficult.
Fredrik can't help and probably not anyone with CIPRES either.  Fredrik
recommends mpi since it is unix based and more people use it.

He also does not recommend setting up a cluster in your lab to run MrBayes.
This is because of a fault with MrBayes. The way it is currently set up is that
the runs are only as fast as the slowest machine, in that if someone sits down
to use a machine in the cluster, everything is processed at that speed.
Here we use mpi for in parallel and condor to distribute for non-parallel.

And frankly, MrBayes can be somewhat unstable with mpi and seems to get hung up
on occasion.

Unfortunately for you, I think running large jobs will be a lot easier in a
couple of years."

-Warner


Warner Yuen

Apple Computer

email: wyuen@apple.com

Tel: 408.718.2859

Fax: 408.715.0133



On Apr 14, 2006, at 8:52 AM, users-request@open-mpi.org wrote:

Message: 2
Date: Thu, 13 Apr 2006 14:33:29 -0400 (EDT)
From: liuliang@stat.ohio-state.edu
Subject: Re: [OMPI users] running a job problem
To: "Open MPI Users" <users@open-mpi.org>
Message-ID: <1122.164.107.248.223.1144953209.squirrel@www.stat.ohio-state.edu>
Content-Type: text/plain;charset=iso-8859-1


Brian,

It worked when I used the latest version of MrBayes. Thanks. By the way,
do you have any idea how to submit an OMPI job on Xgrid? Thanks again.

Liang


On Apr 12, 2006, at 9:09 AM, liuliang@stat.ohio-state.edu wrote:


We have a Mac network running Xgrid and we have successfully installed
MPI. We want to run a parallel version of MrBayes. We had no problem
compiling MrBayes using mpicc, but when we tried to run the compiled
MrBayes, we got lots of error messages:


mpiexec -np 4 ./mb -i  yeast_noclock_imp.txt

                              Parallel version of


                              Parallel version of


                              Parallel version of


                              Parallel version of


[ea285fltprinter.scc.ohio-state.edu:03327] *** An error occurred in MPI_comm_size
[ea285fltprinter.scc.ohio-state.edu:03327] *** on communicator MPI_COMM_WORLD
[ea285fltprinter.scc.ohio-state.edu:03327] *** MPI_ERR_COMM: invalid communicator
[ea285fltprinter.scc.ohio-state.edu:03327] *** MPI_ERRORS_ARE_FATAL (goodbye)


This indicates that the application is calling an MPI function with
an invalid communicator.  Unfortunately, this is a hard one to track
down without more information.  What version of mrbayes are you using
and can you share your input deck?


Thanks,


Brian



--

   Brian Barrett

   Open MPI developer

   http://www.open-mpi.org/


