
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Reuti (reuti_at_[hidden])
Date: 2012-02-09 16:19:00


On 08.02.2012 at 22:52, Tom Bryan wrote:

> <snip>
>>>> Yes, this should work across multiple machines. And it's using `qrsh
>>>> -inherit ...`, so it's failing somewhere in Open MPI. Is it working
>>>> with 1.4.4?
>>>
>>> I'm not sure. We no longer have our 1.4 test environment, so I'm in the
>>> process of building that now. I'll let you know once I have a chance to run
>>> that experiment.
>
> You said that both of these cases worked for you in 1.4. Were you running a
> modified version that did not use THREAD_MULTIPLE? I ask because I'm
> getting worse errors in 1.4. I'm using the same code that was working (in
> some cases) with 1.5.4.
>
> I built 1.4.4 with (among other options)
> --with-threads=posix --enable-mpi-threads

./configure --prefix=$HOME/local/openmpi-1.4.4-default-thread --with-sge --with-threads=posix --enable-mpi-threads

No problems even with THREAD_MULTIPLE.

Only, as stated before, in singleton mode I get one or more additional lines like the following (it looks like one per slave host, but not always; a race condition?):

[pc15370:31390] [[24201,0],1] routed:binomial: Connection to lifeline [[24201,0],0] lost

> <snip>
> ompi_mpi_init: orte_init failed
> --> Returned "Data unpack would read past end of buffer" (-26) instead of
> "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init_thread() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.

Interesting error message, as it isn't true: the MPI standard allows MPI_Init_thread() to be called instead of MPI_INIT, so this is not disallowed.

-- Reuti