> If you run your cmd with the hostfile option and add
> --display-allocation, what does it say?
Thank you, Ralph.
This is the command I used inside my submission script:
mpirun --display-allocation -np 4 -hostfile hosts ./program
And this is the output I got:
Data for node: Name: node03 Num slots: 4 Max slots: 0
Data for node: Name: node02 Num slots: 4 Max slots: 0
Data for node: Name: node04 Num slots: 4 Max slots: 0
Data for node: Name: node01 Num slots: 4 Max slots: 0
If I run the same mpirun command on the cluster head node "clhead" then
this is what I get:
Data for node: Name: clhead Num slots: 0 Max slots: 0
Data for node: Name: node01 Num slots: 1 Max slots: 0
Data for node: Name: node02 Num slots: 1 Max slots: 0
Data for node: Name: node03 Num slots: 1 Max slots: 0
Data for node: Name: node04 Num slots: 1 Max slots: 0
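To make the difference between the two runs easier to see, here is a minimal Python sketch that parses the "Data for node" lines pasted above and prints the slot counts side by side. The regular expression is written against the output exactly as it appears in this message; the function name parse_allocation and the variable names are my own, and the line layout may differ in other Open MPI versions.

```python
import re

# Matches lines as they appear in the output above, e.g.
# "Data for node: Name: node03 Num slots: 4 Max slots: 0"
NODE_LINE = re.compile(r"Name:\s*(\S+)\s+Num slots:\s*(\d+)\s+Max slots:\s*(\d+)")


def parse_allocation(text):
    """Return {hostname: num_slots} for every 'Data for node' line in text."""
    slots = {}
    for line in text.splitlines():
        match = NODE_LINE.search(line)
        if match:
            slots[match.group(1)] = int(match.group(2))
    return slots


# Output from inside the SGE submission script (copied from above).
sge_output = """\
Data for node: Name: node03 Num slots: 4 Max slots: 0
Data for node: Name: node02 Num slots: 4 Max slots: 0
Data for node: Name: node04 Num slots: 4 Max slots: 0
Data for node: Name: node01 Num slots: 4 Max slots: 0
"""

# Output from running the same command on the head node (copied from above).
head_output = """\
Data for node: Name: clhead Num slots: 0 Max slots: 0
Data for node: Name: node01 Num slots: 1 Max slots: 0
Data for node: Name: node02 Num slots: 1 Max slots: 0
Data for node: Name: node03 Num slots: 1 Max slots: 0
Data for node: Name: node04 Num slots: 1 Max slots: 0
"""

sge = parse_allocation(sge_output)
head = parse_allocation(head_output)

# Print one row per host: slots seen under SGE vs. slots seen interactively.
for host in sorted(set(sge) | set(head)):
    print(host, "SGE:", sge.get(host), "head node:", head.get(host))
```

Inside the SGE job every node is listed with 4 slots (16 in total, matching the Grid Engine request), while on the head node every compute node is listed with 1 slot, matching the 'hosts' file.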
The content of the 'hosts' file:
node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1
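For reference, this is the slot count I expect mpirun to derive from the hostfile. The Python sketch below is only an illustration of that expectation, not Open MPI's actual parser: it sums "slots=N" entries per host, counts a bare hostname as one slot, and adds up repeated hostnames. The function name per_host_slots is hypothetical, and treating "#" as a comment marker is an assumption on my part.

```python
def per_host_slots(hostfile_text):
    """Return {hostname: slots} for a hostfile given as a string.

    Assumptions (illustrative only, not Open MPI's real parsing rules):
    - a line without 'slots=N' contributes 1 slot,
    - a hostname listed on several lines accumulates its slots,
    - anything after '#' is ignored as a comment.
    """
    slots = {}
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        count = 1
        for field in fields[1:]:
            if field.startswith("slots="):
                count = int(field[len("slots="):])
        slots[host] = slots.get(host, 0) + count
    return slots


# The 'hosts' file used in this job.
slots_style = """\
node01 slots=1
node02 slots=1
node03 slots=1
node04 slots=1
"""

# The alternative format: each hostname repeated once per slot.
repeat_style = """\
node01
node01
node02
node02
"""

for name, text in (("slots= style", slots_style), ("repeated hosts", repeat_style)):
    hosts = per_host_slots(text)
    print(name, hosts, "total:", sum(hosts.values()))
```

With the first file this gives one slot on each of the four nodes, which is the process count and placement I was getting from Open MPI 1.2.x without passing -np.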
= Serge
On Apr 6, 2010, at 12:18 PM, Serge wrote:
> Hi,
>
> Open MPI integrates with Sun Grid Engine really well, and one does not
> need to specify any parameters for the mpirun command to launch the
> processes on the compute nodes; that is, having "mpirun ./program" in
> the submission script is enough, and there is no need for "-np XX" or
> "-hostfile file_name".
>
> However, there are cases when being able to specify the hostfile is
> important (hybrid jobs, users with MPICH jobs, etc.). For example, with
> Grid Engine I can request four 4-core nodes, that is, a total of 16 slots.
> But I also want to specify how to distribute processes on the nodes, so
> I create the file 'hosts':
>
> node01 slots=1
> node02 slots=1
> node03 slots=1
> node04 slots=1
>
> and modify the line in the submission script to:
> mpirun -hostfile hosts ./program
>
> With Open MPI 1.2.x everything worked properly, meaning that Open MPI
> could count the number of slots specified in the 'hosts' file - 4 (i.e.
> effectively supplying the mpirun command with the -np parameter), as
> well as properly distribute processes on the compute nodes (one process
> per host).
>
> It's different with Open MPI 1.4.1. It cannot process the 'hosts' file
> properly at all. All the processes get launched on just one node -- the
> shepherd host.
>
> The format of the 'hosts' file does not matter. It can be, say:
>
> node01
> node01
> node02
> node02
>
> meaning 2 slots on each node. Open MPI 1.2.x would handle that with no
> problem; Open MPI 1.4.x, however, does not.
>
> The problem appears with OMPI 1.4.1, SGE 6.1u6. It was also tested with
> OMPI 1.3.4 and SGE 6.2u4.
>
> It's important to note that if the mpirun command is run
> interactively, not from inside the Grid Engine script, then it
> interprets the content of the host file just fine.
>
> I am wondering what changed from OMPI 1.2.x to OMPI 1.4.x that prevents
> the expected behavior, and whether it is possible to get that behavior
> back in OMPI 1.4.x by, say, tuning some parameters?
>
> = Serge
>