Open MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-06-22 06:21:01


On Jun 22, 2007, at 3:52 AM, sadfub_at_[hidden] wrote:

>> 1. You might want to update your version of Open MPI if possible; the
>> v1.1.1 version is quite old. We have added many new bug fixes and
>> features since v1.1.1 (including tight SGE integration). There is
>> nothing special about the Open MPI that is included in the OFED
>> distribution; you can download a new version from the Open MPI web
>> site (the current stable version is v1.2.3), configure, compile, and
>> install it with your current OFED installation. You should be able
>> to configure Open MPI with:
>
> Hmm, I've heard about conflicts between OMPI 1.2.x and OFED 1.1 (sorry,
> no reference here),

I'm unaware of any problems with OMPI 1.2.x and OFED 1.1. I run OFED
1.1 on my cluster at Cisco and have many different versions of OMPI
installed (1.2, trunk, etc.).

> and I've had no luck producing a working OMPI
> installation ("mpirun --help" runs, and ./IMB-MPI compiles and runs
> too, but "mpirun -np 2 node03,node14 IMB-MPI1" doesn't (segmentation
> fault))...

Can you send more information on this? See
http://www.open-mpi.org/community/help/

> (Besides that, I know that OFED 1.1 is quite old too.) So I
> tested it with OMPI 1.1.5 => same error.

*IF* all goes well, OFED 1.2 should be released today (famous last
words).

>> 2. I know little/nothing about SGE, but I'm assuming that you need to
>> have SGE pass the proper memory lock limits to new processes. In an
>> interactive login, you showed that the max limit is "8162952" -- you
>> might just want to make it unlimited, unless you have a reason for
>> limiting it. See http://www.open-mpi.org/faq/?
>
> Yes, I already read the FAQ, and even setting them to unlimited has
> not worked. In SGE one can specify the limits for SGE jobs with,
> e.g., the qmon tool (configuring queues > select queue > modify >
> limits), but everything there is set to infinity. (Besides that,
> the job is running with a static machinefile (is this a
> "noninteractive" job?)) How can I test the ulimits of interactive and
> noninteractive jobs?

Launch an SGE job that calls the shell command "limit" (if you run
C-shell variants) or "ulimit -l" (if you run Bourne shell variants).
Ensure that the output is "unlimited".
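
If running a shell builtin from inside the job is awkward, the same check
can be sketched in Python. This is only an illustration, not something from
the Open MPI or SGE documentation: it assumes a Linux node with Python
available, and the function names are made up for this example. Note that
getrlimit reports bytes, while "ulimit -l" typically reports kilobytes.

```python
import resource


def describe_limit(value):
    """Render an rlimit value the way the shell does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def memlock_limits():
    """Return the (soft, hard) locked-memory limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    # Values are in bytes unless they are "unlimited".
    print("locked memory: soft=%s hard=%s" % (soft, hard))
```

Submitting this as the job script (or calling it from one) and comparing
the output with an interactive run should show whether the two
environments really get the same limits.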

What are the limits of the user that launches the SGE daemons? I.e.,
did the SGE daemons get started with proper "unlimited" limits? If
not, that could hamper SGE's ability to set the limits that you told
it to via qmon (remember my disclaimer: I know nothing about SGE, so
this is speculation).
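
One possible way to check what a running daemon actually got, assuming a
Linux kernel that exposes /proc/<pid>/limits (older kernels may not), is to
read the "Max locked memory" row for the daemon's PID. The sketch below is
an assumption-laden illustration: the helper name is hypothetical, and
finding the PID of the SGE execution daemon is left to you.

```python
import sys

LABEL = "Max locked memory"


def locked_memory_limits(limits_text):
    """Return (soft, hard) from the text of /proc/<pid>/limits, or None."""
    for line in limits_text.splitlines():
        if line.startswith(LABEL):
            # The remaining columns are: soft limit, hard limit, units.
            fields = line[len(LABEL):].split()
            return fields[0], fields[1]
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python daemon_limits.py <pid>
    with open("/proc/%s/limits" % sys.argv[1]) as f:
        print(locked_memory_limits(f.read()))
```

If the daemon's row does not say "unlimited", processes it spawns may
inherit that lower hard limit regardless of what is configured in qmon.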

-- 
Jeff Squyres
Cisco Systems