Open MPI Development Mailing List Archives

Subject: [OMPI devel] Segfault in odls_fork_local_procs() for some values of npersocket
From: nadia.derbey (Nadia.Derbey_at_[hidden])
Date: 2011-11-08 03:01:57


Hi,

In v1.5, when mpirun is called with both the "-bind-to-core" and
"-npersocket" options, and the npersocket value leads to fewer procs than
sockets allocated on one node, we get a segfault.

Testing environment:
openmpi v1.5
2 nodes with 4 8-core sockets each
mpirun -n 10 -bind-to-core -npersocket 2

I was expecting to get the following placement (the rule is sketched in code
right after the list):
   . ranks 0-1 : node 0 - socket 0
   . ranks 2-3 : node 0 - socket 1
   . ranks 4-5 : node 0 - socket 2
   . ranks 6-7 : node 0 - socket 3
   . ranks 8-9 : node 1 - socket 0
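
For reference, that expected placement is just "2 ranks per socket, sockets
filled in order, then nodes". A tiny standalone program showing the arithmetic
(this is not Open MPI code, only the rule applied to the test setup above):

    #include <stdio.h>

    int main(void)
    {
        /* Mirrors the test setup: 10 ranks, -npersocket 2, 4 sockets/node.
         * Not Open MPI code -- just the placement rule I expected. */
        const int nranks = 10, npersocket = 2, sockets_per_node = 4;

        for (int rank = 0; rank < nranks; rank++) {
            int socket_index = rank / npersocket;            /* 0..4 */
            int node   = socket_index / sockets_per_node;    /* 0 or 1 */
            int socket = socket_index % sockets_per_node;    /* 0..3 */
            printf("rank %d -> node %d, socket %d\n", rank, node, socket);
        }
        return 0;
    }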

Instead, everything worked fine on node 0, and I got a segfault on node 1,
with a stack that looks like this:

[derbeyn_at_berlin18 ~]$ mpirun --host berlin18,berlin26 -n 10 -bind-to-core -npersocket 2 sleep 900
[berlin26:21531] *** Process received signal ***
[berlin26:21531] Signal: Floating point exception (8)
[berlin26:21531] Signal code: Integer divide-by-zero (1)
[berlin26:21531] Failing at address: 0x7fed13731d63
[berlin26:21531] [ 0] /lib64/libpthread.so.0(+0xf490) [0x7fed15327490]
[berlin26:21531] [ 1] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x2d63) [0x7fed13731d63]
[berlin26:21531] [ 2] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_odls_base_default_launch_local+0xaf3) [0x7fed15e1fe73]
[berlin26:21531] [ 3] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/openmpi/mca_odls_default.so(+0x1d10) [0x7fed13730d10]
[berlin26:21531] [ 4] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x3804d) [0x7fed15e1004d]
[berlin26:21531] [ 5] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon_cmd_processor+0x4aa) [0x7fed15e1209a]
[berlin26:21531] [ 6] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(+0x74ee8) [0x7fed15e4cee8]
[berlin26:21531] [ 7] /home_nfs/derbeyn/DISTS/openmpi-v1.5/lib/libopen-rte.so.3(orte_daemon+0x8d8) [0x7fed15e0f268]
[berlin26:21531] [ 8] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted() [0x4008c6]
[berlin26:21531] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) [0x7fed14fa7c9d]
[berlin26:21531] [10] /home_nfs/derbeyn/DISTS/openmpi-v1.5/bin/orted() [0x400799]
[berlin26:21531] *** End of error message ***

The reason for this issue is that the npersocket value is taken into
account during the very first phase of mpirun (rmaps/load_balance), when
the slots are claimed on each node: npersocket() (in
rmaps/load_balance/rmaps_lb.c) claims (see the sketch below the list)
   . 8 slots on node 0 (4 sockets * 2 per socket)
   . 2 slots on node 1 (10 total ranks - 8 already claimed)
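
In isolation, that claim arithmetic looks roughly like this (an illustrative
sketch, not the actual rmaps_lb.c code; the variable names are mine):

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative sketch of the load_balance claim arithmetic above;
         * not the actual rmaps_lb.c code. */
        int remaining = 10;                  /* total ranks requested (-n 10) */
        const int npersocket = 2;
        const int sockets_per_node = 4;

        for (int node = 0; node < 2 && remaining > 0; node++) {
            int claim = sockets_per_node * npersocket;  /* 8 */
            if (claim > remaining)
                claim = remaining;                      /* node 1: only 2 left */
            remaining -= claim;
            printf("node %d: %d slots claimed\n", node, claim);
        }
        return 0;
    }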

But by the time we reach odls_default_fork_local_proc() (in
odls/default/odls_default_module.c), npersocket is recomputed locally.
Everything works fine on node 0, but on node 1 we have:
   . jobdat->policy has both ORTE_BIND_TO_CORE and ORTE_MAPPING_NPERXXX
   . npersocket is recomputed the following way:
     npersocket = jobdat->num_local_procs/orte_odls_globals.num_sockets
                = 2 / 4 = 0
   . later on, when the starting point is computed with
     logical_cpu = (lrank % npersocket) * jobdat->cpus_per_rank;
     we get the divide-by-zero exception (reproduced in the sketch below).
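
Here is that arithmetic in isolation (a stripped-down sketch based on my
reading of odls_default_module.c, not the exact code); when run with the
"node 1" values, it dies with the same SIGFPE as in the backtrace:

    #include <stdio.h>

    int main(void)
    {
        /* Stripped-down sketch of the recomputation on node 1;
         * simplified, not the exact odls_default_module.c code. */
        int num_local_procs = 2;   /* procs launched locally on node 1 */
        int num_sockets     = 4;   /* sockets detected on the node     */
        int cpus_per_rank   = 1;
        int lrank           = 0;   /* local rank being forked          */

        /* integer division: 2 / 4 == 0 */
        int npersocket = num_local_procs / num_sockets;

        /* npersocket == 0, so the modulo below traps with SIGFPE
         * (integer divide-by-zero), exactly as shown in the backtrace */
        int logical_cpu = (lrank % npersocket) * cpus_per_rank;

        printf("logical_cpu = %d\n", logical_cpu);
        return 0;
    }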

In my opinion, the problem comes from the fact that we recompute
npersocket on the local nodes instead of storing it in the jobdat
structure (as is already done for the policy, the cpus_per_rank, the
stride, ...).
Recomputing this value leads either to the segfault I got, or to wrong
mappings: if 4 slots had been claimed on node 1, the recomputed value
would have been 4/4 = 1, i.e. 1 rank per socket (since the nodes have 4
sockets), instead of the intended 2 ranks on the first 2 sockets.

The attached patch is a proposed fix implementing my suggestion of
storing npersocket in the jobdat.

The patch applies on top of v1.5. Waiting for your comments...

Regards,
Nadia

-- 
Nadia Derbey