
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Using physical numbering in a rankfile
From: teng ma (tma_at_[hidden])
Date: 2012-02-02 12:17:37


Just remove the "p" prefix from the slot specifications in your rankfile, like this:

rank 0=host1 slot=0:0
rank 1=host1 slot=0:2
rank 2=host1 slot=0:4
rank 3=host1 slot=0:6
rank 4=host1 slot=1:1
rank 5=host1 slot=1:3
rank 6=host1 slot=1:5
rank 7=host1 slot=1:7
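If you try several layouts, generating the rankfile can be less error-prone than editing it by hand. Below is a minimal Python sketch. It assumes one process per core and only the single "socket:core" slot form used in this thread. make_rankfile and SLOTS are hypothetical names and are not part of Open MPI. With physical=True it reproduces the "p"-prefixed file from the question.

```python
# Hypothetical helper (not part of Open MPI): writes rankfile lines of the
# single "slot=[p]SOCKET:CORE" form used in this thread. Ranges and other
# rankfile forms are not handled.
def make_rankfile(host, slots, physical=False):
    """Return rankfile text with one 'rank N=HOST slot=[p]SOCKET:CORE' line per pair.

    physical=True adds the "p" prefix used in the question; the default
    (no prefix) matches the rankfile suggested above.
    """
    prefix = "p" if physical else ""
    lines = [f"rank {rank}={host} slot={prefix}{socket}:{core}"
             for rank, (socket, core) in enumerate(slots)]
    return "\n".join(lines) + "\n"


# Socket/core pairs copied from the rankfile above; "host1" is the thread's host name.
SLOTS = [(0, core) for core in (0, 2, 4, 6)] + [(1, core) for core in (1, 3, 5, 7)]

if __name__ == "__main__":
    print(make_rankfile("host1", SLOTS), end="")
```

Redirect the output to a file and pass that file to mpiexec with --rankfile, as in the original command.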

Teng

On 2012/2/2, François Tessier <francois.tessier_at_[hidden]> wrote:

> Hello,
>
> I need to use a rankfile with Open MPI 1.5.4 to run some tests on a basic
> architecture. I'm using a node for which lstopo returns the following:
>
> ----------------
> Machine (24GB)
> NUMANode L#0 (P#0 12GB)
> Socket L#0 + L3 L#0 (8192KB)
> L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
> L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
> L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
> L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
> HostBridge L#0
> PCIBridge
> PCI 8086:10c9
> Net L#0 "eth0"
> PCI 8086:10c9
> Net L#1 "eth1"
> PCIBridge
> PCI 15b3:673c
> Net L#2 "ib0"
> Net L#3 "ib1"
> OpenFabrics L#4 "mlx4_0"
> PCIBridge
> PCI 102b:0522
> PCI 8086:3a22
> Block L#5 "sda"
> Block L#6 "sdb"
> Block L#7 "sdc"
> Block L#8 "sdd"
> NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
> L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
> L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
> L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
> L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
> ----------------
>
> I would like to use physical numbering. To do that, I created a
> rankfile like this:
>
> rank 0=host1 slot=p0:0
> rank 1=host1 slot=p0:2
> rank 2=host1 slot=p0:4
> rank 3=host1 slot=p0:6
> rank 4=host1 slot=p1:1
> rank 5=host1 slot=p1:3
> rank 6=host1 slot=p1:5
> rank 7=host1 slot=p1:7
>
> But when I run my job with "mpiexec -np 8 --rankfile rankfile ./foo", I
> get this error:
>
> Specified slot list: p0:4
> Error: Not found
>
> This could mean that a non-existent processor was specified, or
> that the specification had improper syntax.
>
>
> Do you know what I did wrong?
>
> Best regards,
>
> François
>
> --
> ___________________
> François TESSIER
> PhD Student at University of Bordeaux
> Tel: 0033.5.24.57.41.52
> francois.tessier_at_[hidden]
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
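The lstopo listing in the question can also be read mechanically to show where each physical PU number sits. Below is a minimal Python sketch built around a hypothetical helper, pu_map. It copies the socket and PU lines from that listing, with the "> " quoting removed and the I/O devices omitted. It then maps each physical PU number (P#) to three values: its logical socket, its position among that socket's cores, and its logical core number. It only parses the text. It does not reproduce how Open MPI resolves slot specifications.

```python
import re

# Socket and PU lines copied from the lstopo listing in the question
# ("> " quoting removed; the I/O devices under HostBridge are omitted).
LSTOPO = """\
Socket L#0 + L3 L#0 (8192KB)
L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
"""

SOCKET_RE = re.compile(r"Socket L#(\d+)")
CORE_PU_RE = re.compile(r"Core L#(\d+) \+ PU L#\d+ \(P#(\d+)\)")


def pu_map(text):
    """Map physical PU number -> (logical socket, core index within socket, logical core)."""
    mapping = {}
    cores_seen = {}  # logical socket -> number of cores seen so far on it
    socket = None
    for line in text.splitlines():
        m = SOCKET_RE.search(line)
        if m:
            socket = int(m.group(1))
            cores_seen.setdefault(socket, 0)
        m = CORE_PU_RE.search(line)
        if m and socket is not None:
            core, pu = int(m.group(1)), int(m.group(2))
            mapping[pu] = (socket, cores_seen[socket], core)
            cores_seen[socket] += 1
    return mapping


if __name__ == "__main__":
    for pu, (socket, index, core) in sorted(pu_map(LSTOPO).items()):
        print(f"P#{pu}: socket L#{socket}, core {index} of that socket (Core L#{core})")
```

For this node it reports four cores per socket. Even physical PU numbers fall on socket L#0 and odd ones on socket L#1. For example, P#4 is the third core (index 2) of socket L#0.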

-- 
| Teng Ma          Univ. of Tennessee |
| tma_at_[hidden]        Knoxville, TN |
| http://web.eecs.utk.edu/~tma/       |