Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Segfaults w/ both 1.4 and 1.5 on CentOS 6.2/SGE
From: Joshua Baker-LePain (jlb17_at_[hidden])
Date: 2012-03-15 00:28:07


On Thu, 15 Mar 2012 at 12:44am, Reuti wrote:

> Which version of SGE are you using? The traditional rsh startup was
> replaced by the builtin startup some time ago (although it should still
> work).

We're currently running the rather ancient 6.1u4 (due to the "If it ain't
broke..." philosophy). The hardware for our new queue master recently
arrived and I'll soon be upgrading to the most recent Open Grid Scheduler
release. Are you saying that upgrading and switching to the new builtin
startup method should avoid this problem?

> Maybe this shows already the problem: there are two `qrsh -inherit`, as
> Open MPI thinks these are different machines (I ran only with one slot
> on each host hence didn't get it first but can reproduce it now). But
> for SGE both may end up in the same queue overriding the openmpi-session
> in $TMPDIR.
>
> Although it's running: you get all output? If I request 4 slots and get
> one from each queue on both machines the mpihello outputs only 3 lines:
> the "Hello World from Node 3" is always missing.

I do seem to get all the output -- there are indeed 64 Hello World lines.
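
For anyone who wants to check this mechanically rather than by eye, here is
a minimal sketch. It assumes the job's stdout has been captured as text, that
each rank prints a line of the form "Hello World from Node N" (as in the
quoted example), and that ranks are numbered 0 to nslots-1. The function name
`missing_ranks` is made up for illustration and is not part of Open MPI or
SGE.

```python
import re
from typing import List


def missing_ranks(output: str, nslots: int) -> List[int]:
    """Return the ranks in 0..nslots-1 that never printed a hello line.

    Assumes each rank prints "Hello World from Node <rank>" on its own line.
    """
    # Collect every rank number that appears in the captured output.
    seen = {int(n) for n in re.findall(r"Hello World from Node (\d+)", output)}
    # Any rank in the expected range that was not seen is reported as missing.
    return [rank for rank in range(nslots) if rank not in seen]
```

With output like the 4-slot case described above, where the line for Node 3
is absent, this would report rank 3 as missing; for a complete 64-slot run it
would return an empty list.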

Thanks again for all the help on this. This is one of the most productive
exchanges I've had on a mailing list in far too long.

-- 
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF