
Open MPI User's Mailing List Archives


From: Kelley, Sean (Sean.Kelley_at_[hidden])
Date: 2007-07-23 09:04:13


Hi,
 
     We are experiencing a problem with process allocation on our Open MPI cluster. We are running Scyld 4.1 (BPROC), the OFED 1.2 Topspin InfiniBand drivers, and Open MPI 1.2.3 plus a patch (to allow processes to run on the head node). The hardware consists of a head node and N blades on private Ethernet and InfiniBand networks.
 
The command run for these tests is a simple MPI program (called 'hn') which prints out the rank and the hostname. The hostname for the head node is 'head' and the compute nodes are '.0' ... '.9'.
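For reference, 'hn' is essentially the following (a minimal sketch; the exact source isn't included here, but the program does nothing more than print its rank and hostname):

```c
/* hn.c - sketch of the test program: each rank prints
 * "<rank> <hostname>". Build with: mpicc -o hn hn.c */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(name, &len);
    printf("%d %s\n", rank, name);
    MPI_Finalize();
    return 0;
}
```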
 
We are using the following hostfiles for this example:
 
hostfile7
-1 max_slots=1
0 max_slots=3
1 max_slots=3
 
hostfile8
-1 max_slots=2
0 max_slots=3
1 max_slots=3
 
hostfile9
-1 max_slots=3
0 max_slots=3
1 max_slots=3
 
Running any of the following commands:
 
orterun --hostfile hostfile7 -np 7 ./hn
orterun --hostfile hostfile8 -np 8 ./hn
orterun --byslot --hostfile hostfile7 -np 7 ./hn
orterun --byslot --hostfile hostfile8 -np 8 ./hn
 
causes orterun to crash. However,
 
orterun --hostfile hostfile9 -np 9 ./hn
orterun --byslot --hostfile hostfile9 -np 9 ./hn
 
works, outputting the following:
 
0 head
1 head
2 head
3 .0
4 .0
5 .0
6 .0
7 .0
8 .0
 
However, running the following:
 
orterun --bynode --hostfile hostfile7 -np 7 ./hn
 
works, outputting the following:
 
0 head
1 .0
2 .1
3 .0
4 .1
5 .0
6 .1
 
Is the '--byslot' crash a known problem? Does it have something to do with BPROC? Thanks in advance for any assistance!
 
Sean