Open MPI User's Mailing List Archives

From: Warner Yuen (wyuen_at_[hidden])
Date: 2006-04-14 12:51:37


I did get MrBayes, compiled with Open MPI, to run under Xgrid. However,
it was set up as more of a "traditional" cluster: the agents all have a
shared NFS directory to the controller, so I'm basically only using
Xgrid as a job scheduler. MrBayes doesn't seem to be a "grid"
application so much as an application for a traditional cluster.

You will need to have the following enabled:

1) NFS shared directory across all the machines on the grid.

2) Open MPI installed on all the machines, either locally or via NFS.
(You'll need to compile Open MPI.)

3) Here's the part that may make Xgrid not desirable to use for MPI
applications:

        a) Compile MrBayes with MPI support (Makefile settings):

MPI = yes
CC= $(MPIPATH)/bin/mpicc
CFLAGS = -fast

        b) Make sure that Xgrid is set to properly use password-based
authentication.

        c) Set the environment variables for Open MPI to use Xgrid as the
launcher/scheduler. Assuming bash:

        $ export XGRID_CONTROLLER_HOSTNAME=mycomputer.apple.com
        $ export XGRID_CONTROLLER_PASSWORD=passwd
        
You could also add the above to a .bashrc file and have
your .bash_profile source it.

        d) Run the MPI application:
        
        $ mpirun -np X ./myapp
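
Steps (c) and (d) could be wrapped in a small launcher script. This is
only a sketch: the names check_xgrid_env, launch and DRY_RUN are made up
for illustration and are not part of Open MPI or Xgrid. Only the two
XGRID_* variables and the mpirun command line come from the steps above.

```shell
# Sketch: verify the Xgrid settings before handing the job to mpirun.
# check_xgrid_env, launch and DRY_RUN are illustrative names only.

check_xgrid_env() {
    # Fail early if either variable from step (c) is unset or empty.
    if [ -z "${XGRID_CONTROLLER_HOSTNAME:-}" ]; then
        echo "error: XGRID_CONTROLLER_HOSTNAME is not set" >&2
        return 1
    fi
    if [ -z "${XGRID_CONTROLLER_PASSWORD:-}" ]; then
        echo "error: XGRID_CONTROLLER_PASSWORD is not set" >&2
        return 1
    fi
    return 0
}

launch() {
    # usage: launch <number of processes> <program> [program arguments...]
    np=$1
    shift
    check_xgrid_env || return 1
    if [ -n "${DRY_RUN:-}" ]; then
        # Print the command instead of running it.
        echo "mpirun -np $np $*"
    else
        mpirun -np "$np" "$@"
    fi
}
```

With both variables exported, "DRY_RUN=1 launch 4 ./mb" only prints the
mpirun command line, which is a cheap way to check the setup before
submitting a real job.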

There are a couple of issues:

It turns out that the directory and files that MrBayes creates must
be readable and writable by all the agents. MrBayes does more than
read standard input and write standard output; it also creates and
writes other intermediate files. For an application like HP
Linpack that just reads and writes one file, things work fine.
However, the MrBayes application writes out and reads back two
additional files for each MPI process that is spawned.

All the files that MrBayes is trying to read/write must have
permissions for user 'nobody'. This is a bit of a problem, since
you probably (in general) don't want to allow user nobody to write
all over your home directory. One solution (if possible) would be to
have the application write into /tmp and then collect the files after
the job completes. But I don't know if you can set MrBayes to use a
temporary directory. Perhaps your MrBayes customer can let us know
how to specify a tmpdir.
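
To see which files would block an agent running as 'nobody', one
option is to list everything in the job directory that lacks read and
write permission for "other". A minimal sketch; not_world_rw is an
illustrative name, and it assumes the agents access the files as a
user that is neither the owner nor in the owning group.

```shell
# Sketch: list entries under a directory that "other" users (such as
# 'nobody') cannot both read and write.
# not_world_rw is an illustrative name.
not_world_rw() {
    # -perm -006 matches entries with both o+r and o+w set;
    # the leading "!" inverts the match.
    find "$1" ! -perm -006 -print
}
```

Running "not_world_rw /path/to/jobdir" prints nothing when every entry
is open to other users; any path it does print is a candidate for the
permission problem described above.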

I have tested the basics of this by executing an MPI command to copy
the *.nex file to /tmp on all the agents. This seems to allow
everything to work, but I can't easily clean the intermediate files
off the agents after the run, since the MrBayes application created
them and the user doesn't own them.
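
One way around the cleanup problem might be to let the job itself do
the staging and the cleanup, since the process that created the files
is allowed to remove them. The following is a rough sketch and is
untested with MrBayes: run_in_scratch and the mbjob prefix are
illustrative names, and it assumes the program reads and writes in
its current working directory.

```shell
# Sketch: run a command in a private scratch directory under /tmp, copy
# the results back, and remove the scratch directory from within the job.
# run_in_scratch is an illustrative name, not a MrBayes or Open MPI feature.
run_in_scratch() {
    # usage: run_in_scratch <input file> <results dir> <command> [args...]
    input=$1
    results=$2
    shift 2
    scratch=$(mktemp -d /tmp/mbjob.XXXXXX) || return 1
    cp "$input" "$scratch/" || { rm -rf "$scratch"; return 1; }
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$scratch" && "$@" )
    status=$?
    # Copy everything in the scratch directory, including the input copy.
    mkdir -p "$results" && cp -R "$scratch"/. "$results"/
    # The process that created the files removes them, so nothing is
    # left behind in /tmp for another user to clean up.
    rm -rf "$scratch"
    return $status
}
```

Because the command runs inside the scratch directory, the program and
the results directory should be given as absolute paths. Whether
MrBayes works when each MPI process runs in its own directory is an
open question here.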

I'm hoping the OMPI developers can come to the rescue on some of
these issues, perhaps working in conjunction with some of the Apple
Xgrid engineers.

Lastly, this is from one of the MrBayes folks:

"Getting help with Xgrid among the phylo community will probably be
difficult. Fredrik can't help and probably not anyone with CIPRES
either. Fredrik recommends mpi since it is unix based and more people
use it.

He also does not recommend setting up a cluster in your lab to run
MrBayes. This is because of a fault with MrBayes. The way it is
currently set up is that the runs are only as fast as the slowest
machine, in that if someone sits down to use a machine in the cluster,
everything is processed at that speed. Here we use mpi for in parallel
and condor to distribute for non-parallel.

And frankly, MrBayes can be somewhat unstable with mpi and seems to
get hung up on occasion.

Unfortunately for you, I think running large jobs will be a lot
easier in a couple of years."

-Warner

Warner Yuen
Apple Computer
email: wyuen_at_[hidden]
Tel: 408.718.2859
Fax: 408.715.0133

On Apr 14, 2006, at 8:52 AM, users-request_at_[hidden] wrote:

> Message: 2
> Date: Thu, 13 Apr 2006 14:33:29 -0400 (EDT)
> From: liuliang_at_[hidden]
> Subject: Re: [OMPI users] running a job problem
> To: "Open MPI Users" <users_at_[hidden]>
> Message-ID:
> <1122.164.107.248.223.1144953209.squirrel_at_[hidden]>
> Content-Type: text/plain;charset=iso-8859-1
>
> Brian,
> It worked when I used the latest version of Mrbayes. Thanks. By the way,
> do you have any idea how to submit an ompi job on xgrid? Thanks again.
> Liang
>
>> On Apr 12, 2006, at 9:09 AM, liuliang_at_[hidden] wrote:
>>
>>> We have a Mac network running xgrid and we have successfully installed
>>> mpi. We want to run a parallel version of mrbayes. We did not have any
>>> problem when we compiled mrbayes using mpicc. But when we tried to run
>>> the compiled mrbayes, we got lots of error messages:
>>>
>>> mpiexec -np 4 ./mb -i yeast_noclock_imp.txt
>>> Parallel version of
>>>
>>> Parallel version of
>>>
>>> Parallel version of
>>>
>>> Parallel version of
>>>
>>> [ea285fltprinter.scc.ohio-state.edu:03327] *** An error occurred in MPI_comm_size
>>> [ea285fltprinter.scc.ohio-state.edu:03327] *** on communicator MPI_COMM_WORLD
>>> [ea285fltprinter.scc.ohio-state.edu:03327] *** MPI_ERR_COMM: invalid communicator
>>> [ea285fltprinter.scc.ohio-state.edu:03327] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> This indicates that the application is calling an MPI function with
>> an invalid communicator. Unfortunately, this is a hard one to track
>> down without more information. What version of mrbayes are you using
>> and can you share your input deck?
>>
>> Thanks,
>>
>> Brian
>>
>>
>> --
>> Brian Barrett
>> Open MPI developer
>> http://www.open-mpi.org/
>