Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] problem running Open MPI on Cells
From: Gilbert Grosdidier (grodid_at_[hidden])
Date: 2008-10-31 16:52:38


 To monitor the environment from inside the application, it could be useful to
issue a 'system("printenv")' call at the very beginning of the main program,
both before and after the MPI_Init call. Do this when running in serial job
mode on a single CAB, launched with mpirun.
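
 A minimal sketch of that idea follows. It is an illustration, not code taken
from the application in question. Instead of system("printenv") it walks the
POSIX 'environ' array directly, so that each line can be tagged with where it
was printed. The helper name dump_environment and the WITH_MPI build macro are
made up for this example.

```c
/* Sketch: print the environment before and after MPI_Init.
 * Build the full program with: mpicc -DWITH_MPI envdump.c -o envdump
 * (WITH_MPI and dump_environment are hypothetical names for this example.) */
#include <stdio.h>
#include <stdlib.h>
#ifdef WITH_MPI
#include <mpi.h>
#endif

/* POSIX: NULL-terminated array of "NAME=value" strings. */
extern char **environ;

/* Write every environment variable to `out`, one per line, prefixed with
 * `label`. Returns the number of variables written. */
int dump_environment(const char *label, FILE *out)
{
    int count = 0;
    for (char **entry = environ; entry != NULL && *entry != NULL; ++entry) {
        fprintf(out, "[%s] %s\n", label, *entry);
        ++count;
    }
    fflush(out);
    return count;
}

#ifdef WITH_MPI
int main(int argc, char **argv)
{
    /* Snapshot of the environment as the launcher handed it to the process. */
    dump_environment("before MPI_Init", stdout);

    MPI_Init(&argc, &argv);

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Second snapshot, tagged with the rank so that output from several
     * processes can be told apart. */
    char label[64];
    snprintf(label, sizeof label, "rank %d after MPI_Init", rank);
    dump_environment(label, stdout);

    /* The application's own initialization (e.g. MCF) would follow here. */

    MPI_Finalize();
    return 0;
}
#endif
```

 Running it once as 'mpirun -np 1 ./envdump' on a single CAB and once across
several CABs, then comparing the saved outputs with diff, should show whether
any variable differs between the working and the failing case.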

 HTH, Gilbert.

On Fri, 31 Oct 2008, Hahn Kim wrote:

> Hello,
> I'm having problems using Open MPI on a cluster of Mercury Computer's Cell
> Accelerator Boards (CABs).
> We have an MPI application that is running on multiple CABs. The application
> uses Mercury's MultiCore Framework (MCF) to use the Cell's SPEs. Here's the
> basic problem. I can log into each CAB and run the application in serial
> directly from the command line (i.e. without using mpirun) without a problem.
> I can also launch a serial job onto each CAB from another machine using mpirun
> without a problem.
> The problem occurs when I try to launch onto multiple CABs using mpirun. MCF
> requires a license file. After the application initializes MPI, it tries to
> initialize MCF on each node. The initialization routine loads the MCF
> license file and checks for valid license keys. If the keys are valid, then
> it continues to initialize MCF. If not, it throws an error.
> When I run on multiple CABs, most of the time several of the CABs throw an
> error saying MCF cannot find a valid license key. The strange thing is that
> this behavior doesn't appear when I launch serial jobs using MCF, only when I
> launch onto multiple CABs. Additionally, the errors are inconsistent: sometimes
> a few of the CABs error out, sometimes all of them, sometimes none.
> I've talked with the Mercury folks and they're just as stumped as I am. The
> only thing we can think of is that Open MPI is somehow modifying the
> environment and is interfering with MCF, but we can't think of any reason why.
> Any ideas out there? Thanks.
> Hahn
> --
> Hahn Kim, hgk_at_[hidden]
> MIT Lincoln Laboratory
> 244 Wood St., Lexington, MA 02420
> Tel: 781-981-0940, Fax: 781-981-5255

  Gilbert Grosdidier                 Gilbert.Grosdidier_at_[hidden]
  LAL / IN2P3 / CNRS                 Phone : +33 1 6446 8909
  Faculté des Sciences, Bat. 200     Fax   : +33 1 6446 8546
  B.P. 34, F-91898 Orsay Cedex (FRANCE)