Open MPI User's Mailing List Archives

Subject: [OMPI users] problem running Open MPI on Cells
From: Hahn Kim (hgk_at_[hidden])
Date: 2008-10-31 15:38:26


Hello,

I'm having problems using Open MPI on a cluster of Mercury Computer's
Cell Accelerator Boards (CABs).

We have an MPI application that is running on multiple CABs. The
application uses Mercury's MultiCore Framework (MCF) to access the Cell's
SPEs. Here's the basic problem. I can log into each CAB and run the
application in serial directly from the command line (i.e. without
using mpirun) without a problem. I can also launch a serial job onto
each CAB from another machine using mpirun without a problem.

The problem occurs when I try to launch onto multiple CABs using
mpirun. MCF requires a license file. After the application
initializes MPI, it tries to initialize MCF on each node. The
initialization routine loads the MCF license file and checks for valid
license keys. If the keys are valid, it continues to initialize
MCF. If not, it throws an error.
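One way to tell a node-level license problem apart from an MPI one is to have every launched process report what it sees before MCF is involved at all. The following is a minimal sketch, not an MCF tool: it is plain Python rather than an MPI program (Open MPI's mpirun can launch non-MPI executables), it assumes Python 3 is available on the CABs, and it assumes the license is an ordinary file whose path you pass on the command line. The post does not say where or how MCF locates its license, so the script name `check_license.py` and any path you give it are hypothetical. Each process prints its host, its rank (from `OMPI_COMM_WORLD_RANK`, which newer Open MPI releases export; `?` otherwise), and whether the file exists, is readable, and has the same SHA-256 digest as on the other nodes.

```python
import hashlib
import os
import socket
import sys


def license_status(path):
    """Report whether the (hypothetical) license file at `path` is visible
    and readable from this process, with its size and SHA-256 digest."""
    exists = os.path.isfile(path)
    readable = exists and os.access(path, os.R_OK)
    size = os.path.getsize(path) if exists else None
    digest = None
    if readable:
        # Hash the contents so copies on different CABs can be compared.
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
    return {"exists": exists, "readable": readable, "size": size,
            "sha256": digest}


def report_line(path):
    # OMPI_COMM_WORLD_RANK is set by newer Open MPI launchers; older
    # releases may not set it, so fall back to "?".
    rank = os.environ.get("OMPI_COMM_WORLD_RANK", "?")
    s = license_status(path)
    return "host=%s rank=%s path=%s exists=%s readable=%s size=%s sha256=%s" % (
        socket.gethostname(), rank, path,
        s["exists"], s["readable"], s["size"], s["sha256"])


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: check_license.py LICENSE_PATH")
    print(report_line(sys.argv[1]), flush=True)
```

It could be launched with something like `mpirun -np 4 --host cab01,cab02,cab03,cab04 python3 check_license.py /path/to/license` (host names and path are placeholders). If every line shows the same readable file with the same digest, the failure is more likely in how MCF is initialized under mpirun than in the file itself.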

When I run on multiple CABs, most of the time several of the CABs
throw an error saying MCF cannot find a valid license key. The
strange thing is that this behavior doesn't appear when I launch
serial MCF jobs, only when I launch onto multiple CABs. Additionally,
the errors are inconsistent: not all the CABs throw an error.
Sometimes a few of them error out, sometimes all of them, sometimes
none.

I've talked with the Mercury folks and they're just as stumped as I
am. The only thing we can think of is that Open MPI is somehow
modifying the environment and interfering with MCF, but we can't
think of any reason why.
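The environment hypothesis can be checked directly by capturing the environment of a direct login on a failing CAB and of a process that mpirun starts on the same CAB, then diffing the two. The following is a minimal sketch under stated assumptions: the dumps are plain `env` output (one `KEY=VALUE` per line, so values containing newlines are not handled), and the script name `env_diff.py` and the file names are hypothetical. Which variables, if any, MCF actually reads is not known from the post.

```python
import sys


def parse_env_dump(text):
    """Parse `env`-style output (one KEY=VALUE per line) into a dict.
    Lines without '=' are skipped; multi-line values are not supported."""
    env = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def diff_env(baseline, launched):
    """Return sorted (added, removed, changed) key lists between a
    baseline environment and the environment seen under mpirun."""
    added = sorted(k for k in launched if k not in baseline)
    removed = sorted(k for k in baseline if k not in launched)
    changed = sorted(k for k in baseline
                     if k in launched and baseline[k] != launched[k])
    return added, removed, changed


def main(argv):
    if len(argv) != 3:
        sys.exit("usage: env_diff.py BASELINE_DUMP LAUNCHED_DUMP")
    with open(argv[1]) as f:
        baseline = parse_env_dump(f.read())
    with open(argv[2]) as f:
        launched = parse_env_dump(f.read())
    added, removed, changed = diff_env(baseline, launched)
    for label, keys in (("added", added), ("removed", removed),
                        ("changed", changed)):
        for key in keys:
            print(label, key)


if __name__ == "__main__":
    main(sys.argv)
```

For example, `env > /tmp/env.login.txt` in a login shell on a CAB and `mpirun -np 4 --host cab01,cab02,cab03,cab04 sh -c 'env > /tmp/env.$(hostname).txt'` from the launch machine produce one dump per host (host names are placeholders); running the script on a failing CAB's pair of files lists what mpirun added, removed, or changed. Expect many `OMPI_*` additions; differences in variables such as `HOME`, `PATH`, or `LD_LIBRARY_PATH` are the more interesting ones to follow up on.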

Any ideas out there? Thanks.

Hahn

--
Hahn Kim, hgk_at_[hidden]
MIT Lincoln Laboratory
244 Wood St., Lexington, MA 02420
Tel: 781-981-0940, Fax: 781-981-5255