Where did you set the environment variables for the MCF license file and the MCF shared libraries?
What is your default shell?

Did your tests indicate the following?
Suppose you have 4 nodes.
On node 1, "mpirun -np 4 --host node1,node2,node3,node4 hostname" works,
but "mpirun -np 4 --host node1,node2,node3,node4 foocbe" does not work, where foocbe is an executable generated with MCF.
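
One way to answer the environment question is to run a small diagnostic under mpirun in place of hostname. The following is only a minimal sketch: the variable name MCF_LICENSE_FILE is hypothetical (I do not know what MCF actually calls it), so substitute whatever your MCF installation uses, and it assumes Python 3 is available on the nodes.

```python
import os
import socket

# Hypothetical variable names: replace them with the ones your MCF
# installation actually reads for the license file and shared libraries.
VARS_TO_CHECK = ["MCF_LICENSE_FILE", "LD_LIBRARY_PATH"]


def check_env(names, environ=None):
    """Return one status line per variable.

    A variable is reported as NOT SET, or with its value; if the value
    is the path of an existing file, readability is reported as well.
    """
    environ = os.environ if environ is None else environ
    lines = []
    for name in names:
        value = environ.get(name)
        if value is None:
            lines.append(f"{name}: NOT SET")
        elif os.path.isfile(value):
            readable = os.access(value, os.R_OK)
            lines.append(f"{name}={value} (file, readable={readable})")
        else:
            lines.append(f"{name}={value}")
    return lines


if __name__ == "__main__":
    # Prefix each line with the host name so output from different
    # nodes can be told apart when launched through mpirun.
    host = socket.gethostname()
    for line in check_env(VARS_TO_CHECK):
        print(f"[{host}] {line}")
```

Saved under a name such as check_env.py (again hypothetical), it could be launched the same way as hostname, e.g. "mpirun -np 4 --host node1,node2,node3,node4 python check_env.py". If a variable shows up as NOT SET on some nodes, it is probably set in a shell startup file that is not read for non-interactive logins; mpirun's -x option can export a variable to the remote processes explicitly.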

Is it possible that the MCF license is limited to a few concurrent uses? For example, if the license is limited to 4 concurrent uses, would an MPI application fail on 8 nodes?

Regards,
Mi
          Hahn Kim <hgk@ll.mit.edu>
          Sent by: users-bounces@open-mpi.org

          10/31/2008 03:38 PM
          Please respond to
          Open MPI Users <users@open-mpi.org>


To: Open MPI Users <users@open-mpi.org>
cc:
Subject: [OMPI users] problem running Open MPI on Cells

Hello,

I'm having problems using Open MPI on a cluster of Mercury Computer's  
Cell Accelerator Boards (CABs).

We have an MPI application that is running on multiple CABs.  The  
application uses Mercury's MultiCore Framework (MCF) to use the Cell's  
SPEs.  Here's the basic problem.  I can log into each CAB and run the  
application in serial directly from the command line (i.e. without  
using mpirun) without a problem.  I can also launch a serial job onto  
each CAB from another machine using mpirun without a problem.

The problem occurs when I try to launch onto multiple CABs using  
mpirun.  MCF requires a license file.  After the application  
initializes MPI, it tries to initialize MCF on each node.  The  
initialization routine loads the MCF license file and checks for valid  
license keys.  If the keys are valid, then it continues to initialize  
MCF.  If not, it throws an error.

When I run on multiple CABs, most of the time several of the CABs  
throw an error saying MCF cannot find a valid license key.  The  
strange thing is that this behavior doesn't appear when I launch serial  
jobs using MCF, only multiple CABs.  Additionally, the errors are  
inconsistent.  Not all the CABs throw an error, sometimes a few of  
them error out, sometimes all of them, sometimes none.

I've talked with the Mercury folks and they're just as stumped as I  
am.  The only thing we can think of is that Open MPI is somehow  
modifying the environment and is interfering with MCF, but we can't  
think of any reason why.

Any ideas out there?  Thanks.

Hahn

--
Hahn Kim, hgk@ll.mit.edu
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255





