Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine
From: Serge (skhan_at_[hidden])
Date: 2010-04-07 10:22:43

>> However, there are cases when being able to specify the hostfile is
>> important (hybrid jobs, users with MPICH jobs, etc.).

>[I don't understand what MPICH has to do with it.]

This was just an example of how the different behavior of OMPI 1.4 may
cause problems; the MPICH library itself is not the subject of discussion.
MPICH requires a hostfile, which SGE generates, and having that hostfile
in the submission script for an Open MPI 1.2.x job has the expected
effect. Open MPI 1.4.x behaves differently: it appears not to interpret
the hostfile properly.

>> For example,
>> with Grid Engine I can request four 4-core nodes, that is total of 16
>> slots. But I also want to specify how to distribute processes on the
>> nodes, so I create the file 'hosts'
>> node01 slots=1
>> node02 slots=1
>> node03 slots=1
>> node04 slots=1
>> and modify the line in the submission script to:
>> mpirun -hostfile hosts ./program

> Regardless of any open-mpi bug, I'd have thought it was easier just to
> use -npernode in that case. What's the problem with that? It seems to
> me best generally to control the distribution of processes with mpirun
> on the SGE-allocated nodes than to concoct host files as we used to do
> here, e.g. to get -byslot v. -bynode behaviour (or vice versa).

This is exactly what I am doing: controlling the distribution of processes
with mpirun on the SGE-allocated nodes, by supplying the hostfile. Grid
Engine allocates the nodes and generates a hostfile, which I can then
modify however I want before running the mpirun command. Moreover, this
gives more control, because it allows creating specific SGE parallel
environments where the process distribution is predetermined: one less
worry for users playing with mpirun options.
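
As a rough sketch of the kind of modification I mean, the snippet below
rewrites the SGE-generated hostfile into the 'hosts' file shown earlier.
It assumes the usual $PE_HOSTFILE layout, with the hostname in the first
column of each line; the function name pe_to_ompi_hostfile is made up
for illustration and is not part of SGE or Open MPI.

```shell
#!/bin/sh
# Hypothetical helper: turn an SGE PE hostfile into an Open MPI hostfile
# with a fixed number of slots per node.
# Assumption: every non-empty line begins with "<hostname> <slots> ...".
pe_to_ompi_hostfile() {
    pe_file=$1          # path to the SGE-generated hostfile
    per_node=${2:-1}    # processes to place on each node (default: 1)
    # Keep only the hostname and emit it in "host slots=N" form,
    # skipping blank lines.
    awk -v n="$per_node" 'NF { print $1 " slots=" n }' "$pe_file"
}

# Inside a job script this might be used as follows (not run here):
#   pe_to_ompi_hostfile "$PE_HOSTFILE" 1 > hosts
#   mpirun -hostfile hosts ./program
```

With Open MPI 1.2.x a hostfile built this way has the expected effect;
whether 1.4.x honors it under Grid Engine is the question here.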

The example in my initial email was deliberately simplified to
demonstrate the problem.

= Serge