Open MPI User's Mailing List Archives

Subject: [OMPI users] now 1.9 [was: I have still a problem with rankfiles in openmpi-1.6.4rc3]
From: Eugene Loh (eugene.loh_at_[hidden])
Date: 2013-02-06 17:59:56


On 02/06/13 04:29, Siegmar Gross wrote:
>
> thank you very much for your answer. I have compiled your program
> and get different behaviours for openmpi-1.6.4rc3 and openmpi-1.9.
>
> I get the following output for openmpi-1.9 (different outputs !!!).
>
> sunpc1 rankfiles 104 mpirun --report-bindings --rankfile myrankfile ./a.out
> [sunpc1:26554] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]]: [B/B][./.]
> unbound
>
> sunpc1 rankfiles 105 mpirun --report-bindings --rankfile myrankfile_0 ./a.out
> [sunpc1:26557] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.]
> bind to 0

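I think what's happening is that although you specified "0:0" or "0:1" (socket:core) in the rankfile, the string "0,0" or "0,1" is
getting passed in, at least in the runs I looked at. The colon became a comma. A comma-separated string is presumably read as a list
of cores, so "0,1" binds to cores 0 and 1, while "0,0" collapses to core 0 alone. So it's just by accident that myrankfile_0 is
working out all right.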
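
To make the difference concrete, here is a small Python sketch of the two readings of the slot string. It is not Open MPI's rankfile parser: the function names are made up for illustration, the 2-sockets-by-2-cores layout is assumed from the [x/x][x/x] maps in the --report-bindings output above, and treating the comma form as a list of machine-wide core numbers is my guess at what the binding code does with it.

```python
# Hypothetical illustration only; none of these names exist in Open MPI.
SOCKETS = 2            # assumed from the two [..] groups in the binding map
CORES_PER_SOCKET = 2   # assumed from the two entries inside each group


def bound_cores(slot):
    """Return the set of (socket, core) pairs a slot string selects."""
    if ":" in slot:
        # "S:C" form: exactly one core C on socket S.
        socket, core = (int(part) for part in slot.split(":"))
        return {(socket, core)}
    # "a,b" form: assumed to be a list of machine-wide core numbers.
    return {divmod(int(part), CORES_PER_SOCKET) for part in slot.split(",")}


def binding_map(cores):
    """Render a set of (socket, core) pairs like --report-bindings does."""
    groups = []
    for s in range(SOCKETS):
        marks = ["B" if (s, c) in cores else "." for c in range(CORES_PER_SOCKET)]
        groups.append("[" + "/".join(marks) + "]")
    return "".join(groups)


if __name__ == "__main__":
    for spec in ("0:1", "0,1", "0:0", "0,0"):
        print(spec, binding_map(bound_cores(spec)))
```

Under those assumptions "0,1" gives [B/B][./.] and "0,0" gives [B/.][./.], which matches the two maps reported above, whereas an intact "0:1" would give [./B][./.]. Only the "0:0" case comes out the same either way.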

Could someone who knows the code better than I do help me narrow this down? E.g., where is the rankfile parsed? For what it's
worth, by the time mpirun reaches orte_odls_base_default_get_add_procs_data(), orte_job_data already contains the corrupted
cpu_bitmap string.