Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI 1.4.x ignores hostfile and launches all the processes on just one node in Grid Engine
From: Serge (skhan_at_[hidden])
Date: 2010-04-07 07:36:44


> If you run your cmd with the hostfile option and add
> --display-allocation, what does it say?

Thank you, Ralph.

This is the command I used inside my submission script:

   mpirun --display-allocation -np 4 -hostfile hosts ./program

And this is the output I got:

  Data for node: Name: node03 Num slots: 4 Max slots: 0
  Data for node: Name: node02 Num slots: 4 Max slots: 0
  Data for node: Name: node04 Num slots: 4 Max slots: 0
  Data for node: Name: node01 Num slots: 4 Max slots: 0

If I run the same mpirun command on the cluster head node "clhead" then
this is what I get:

  Data for node: Name: clhead Num slots: 0 Max slots: 0
  Data for node: Name: node01 Num slots: 1 Max slots: 0
  Data for node: Name: node02 Num slots: 1 Max slots: 0
  Data for node: Name: node03 Num slots: 1 Max slots: 0
  Data for node: Name: node04 Num slots: 1 Max slots: 0
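
The difference between the two dumps above is easier to see as per-node
slot totals. Here is a minimal Python sketch, assuming the
--display-allocation lines always have the "Name: ... Num slots: ...
Max slots: ..." shape shown above. The regular expression and the
parse_allocation helper are illustrative only and are not part of Open MPI.

```python
import re

# Matches lines such as:
#   Data for node: Name: node03 Num slots: 4 Max slots: 0
# (format assumed from the output pasted in this message)
NODE_LINE = re.compile(
    r"Name:\s*(\S+)\s+Num slots:\s*(\d+)\s+Max slots:\s*(\d+)"
)


def parse_allocation(text):
    """Return {node_name: num_slots} for every allocation line in text."""
    slots = {}
    for line in text.splitlines():
        match = NODE_LINE.search(line)
        if match:
            slots[match.group(1)] = int(match.group(2))
    return slots


# Output seen from inside the Grid Engine submission script.
inside_sge = """\
  Data for node: Name: node03 Num slots: 4 Max slots: 0
  Data for node: Name: node02 Num slots: 4 Max slots: 0
  Data for node: Name: node04 Num slots: 4 Max slots: 0
  Data for node: Name: node01 Num slots: 4 Max slots: 0
"""

# Output seen when running interactively on the head node.
on_head = """\
  Data for node: Name: clhead Num slots: 0 Max slots: 0
  Data for node: Name: node01 Num slots: 1 Max slots: 0
  Data for node: Name: node02 Num slots: 1 Max slots: 0
  Data for node: Name: node03 Num slots: 1 Max slots: 0
  Data for node: Name: node04 Num slots: 1 Max slots: 0
"""

sge_alloc = parse_allocation(inside_sge)
head_alloc = parse_allocation(on_head)

print("inside SGE:", sge_alloc, "total", sum(sge_alloc.values()))
print("head node: ", head_alloc, "total", sum(head_alloc.values()))
```

Inside the Grid Engine job the allocation adds up to 16 slots (4 per
node), while on the head node it adds up to 4 (1 per node). So the
slots=1 limits appear to be reflected only in the interactive run.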

The content of the 'hosts' file:

  node01 slots=1
  node02 slots=1
  node03 slots=1
  node04 slots=1
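
For reference, here is a rough Python sketch of the slot counting I would
expect from a hostfile. It covers the "slots=N" format above and the
one-hostname-per-line format mentioned in the quoted message below. It
shows the behavior I expect, not how Open MPI actually parses the file.
The count_hostfile_slots helper is hypothetical, and treating a bare
hostname as one slot and "#" as a comment marker are assumptions of the
sketch.

```python
def count_hostfile_slots(text):
    """Return {host: slots}, summing repeated hosts.

    Assumptions of this sketch: a line without "slots=N" counts as one
    slot, and anything after "#" is a comment.
    """
    slots = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        count = 1  # a bare hostname contributes one slot
        for field in fields[1:]:
            if field.startswith("slots="):
                count = int(field[len("slots="):])
        slots[host] = slots.get(host, 0) + count
    return slots


explicit = """\
node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1
"""

repeated = """\
node01
node01
node02
node02
"""

explicit_slots = count_hostfile_slots(explicit)
repeated_slots = count_hostfile_slots(repeated)

print(explicit_slots, "-> np =", sum(explicit_slots.values()))
print(repeated_slots, "-> np =", sum(repeated_slots.values()))
```

Both formats give a total of 4 processes: one per node in the first
case, two on each of two nodes in the second.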

= Serge

On Apr 6, 2010, at 12:18 PM, Serge wrote:

> Hi,
>
> Open MPI integrates with Sun Grid Engine really well, and one does not
> need to specify any parameters for the mpirun command to launch the
> processes on the compute nodes. Having "mpirun ./program" in the
> submission script is enough; there is no need for "-np XX" or
> "-hostfile file_name".
>
> However, there are cases when being able to specify the hostfile is
> important (hybrid jobs, users with MPICH jobs, etc.). For example, with
> Grid Engine I can request four 4-core nodes, that is, a total of 16
> slots. But I also want to specify how to distribute the processes
> across the nodes, so I create the file 'hosts':
>
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
>
> and modify the line in the submission script to:
> mpirun -hostfile hosts ./program
>
> With Open MPI 1.2.x everything worked properly: Open MPI counted the
> slots specified in the 'hosts' file (4, effectively supplying the
> mpirun command with the -np parameter) and distributed the processes
> properly across the compute nodes (one process per host).
>
> It's different with Open MPI 1.4.1. It cannot process the 'hosts' file
> properly at all. All the processes get launched on just one node -- the
> shepherd host.
>
> The format of the 'hosts' file does not matter. It can be, say:
>
> node01
> node01
> node02
> node02
>
> meaning 2 slots on each node. Open MPI 1.2.x would handle that with no
> problem; Open MPI 1.4.x does not.
>
> The problem appears with OMPI 1.4.1 and SGE 6.1u6. It was also tested
> with OMPI 1.3.4 and SGE 6.2u4.
>
> It's important to note that if the mpirun command is run
> interactively, not from inside the Grid Engine script, it interprets
> the content of the host file just fine.
>
> I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents
> the expected behavior, and whether it is possible to get it back in
> OMPI 1.4.x by, say, tuning some parameters.
>
> = Serge
>