Open MPI User's Mailing List Archives

Subject: [OMPI users] Error with multiple MPI runs inside one Slurm allocation (with QLogic PSM)
From: Rémi Palancher (remi_at_[hidden])
Date: 2012-04-02 08:53:02


 Hi there,

 I'm encountering a problem when trying to run multiple mpirun instances in
 parallel inside one Slurm allocation on multiple nodes, using a QLogic
 interconnect with PSM.

 I'm using Open MPI version 1.4.5 compiled with GCC 4.4.5 on Debian Lenny.

 My cluster is composed of 12-core nodes.

 Here is how I'm able to reproduce the problem:

 Allocate 20 CPUs on 2 nodes:

 frontend $ salloc -N 2 -n 20
 frontend $ srun hostname | sort | uniq -c
      12 cn1381
       8 cn1382

 My job allocates 12 CPUs on node cn1381 and 8 CPUs on cn1382.

 For each task, my test MPI program parses the value of Cpus_allowed_list in
 /proc/$PID/status and prints it.
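
 In essence it does something like this (a simplified sketch, not my exact
 source; the first argument is the launch number that appears in the output
 below):

 /* Sketch of get-allowed-cpu-ompi: each MPI task reads Cpus_allowed_list
  * from /proc/self/status and prints it with its rank and hostname. */
 #include <mpi.h>
 #include <stdio.h>

 int main(int argc, char **argv)
 {
     int rank, size, len;
     char host[MPI_MAX_PROCESSOR_NAME];
     char line[256], cpus[128] = "?";
     FILE *f;

     MPI_Init(&argc, &argv);
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &size);
     MPI_Get_processor_name(host, &len);

     /* /proc/self/status is the status file of the calling process */
     f = fopen("/proc/self/status", "r");
     if (f != NULL) {
         while (fgets(line, sizeof(line), f) != NULL) {
             if (sscanf(line, "Cpus_allowed_list: %127s", cpus) == 1)
                 break;
         }
         fclose(f);
     }

     printf("Launch %s Task %02d of %02d (%s): %s\n",
            argc > 1 ? argv[1] : "?", rank, size, host, cpus);

     MPI_Finalize();
     return 0;
 }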

 If I run it on all 20 allocated CPUs, it works well:

 frontend $ mpirun get-allowed-cpu-ompi 1
 Launch 1 Task 00 of 20 (cn1381): 0
 Launch 1 Task 01 of 20 (cn1381): 1
 Launch 1 Task 02 of 20 (cn1381): 2
 Launch 1 Task 03 of 20 (cn1381): 3
 Launch 1 Task 04 of 20 (cn1381): 4
 Launch 1 Task 05 of 20 (cn1381): 7
 Launch 1 Task 06 of 20 (cn1381): 5
 Launch 1 Task 07 of 20 (cn1381): 9
 Launch 1 Task 08 of 20 (cn1381): 8
 Launch 1 Task 09 of 20 (cn1381): 10
 Launch 1 Task 10 of 20 (cn1381): 6
 Launch 1 Task 11 of 20 (cn1381): 11
 Launch 1 Task 12 of 20 (cn1382): 4
 Launch 1 Task 13 of 20 (cn1382): 5
 Launch 1 Task 14 of 20 (cn1382): 6
 Launch 1 Task 15 of 20 (cn1382): 7
 Launch 1 Task 16 of 20 (cn1382): 8
 Launch 1 Task 17 of 20 (cn1382): 10
 Launch 1 Task 18 of 20 (cn1382): 9
 Launch 1 Task 19 of 20 (cn1382): 11

 Here you can see that Slurm gave me CPUs 0-11 on cn1381 and CPUs 4-11 on
 cn1382.

 Now I'd like to run multiple MPI runs in parallel, 4 tasks each, inside
 my job.

 frontend $ cat params.txt
 1
 2
 3
 4
 5

 It works well when I launch 3 runs in parallel, since they only use the
 12 CPUs of the first node (3 runs x 4 tasks = 12 CPUs):

 frontend $ xargs -P 3 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
 Launch 2 Task 00 of 04 (cn1381): 1
 Launch 2 Task 01 of 04 (cn1381): 2
 Launch 2 Task 02 of 04 (cn1381): 4
 Launch 2 Task 03 of 04 (cn1381): 7
 Launch 1 Task 00 of 04 (cn1381): 0
 Launch 1 Task 01 of 04 (cn1381): 3
 Launch 1 Task 02 of 04 (cn1381): 5
 Launch 1 Task 03 of 04 (cn1381): 6
 Launch 3 Task 00 of 04 (cn1381): 9
 Launch 3 Task 01 of 04 (cn1381): 8
 Launch 3 Task 02 of 04 (cn1381): 10
 Launch 3 Task 03 of 04 (cn1381): 11
 Launch 4 Task 00 of 04 (cn1381): 0
 Launch 4 Task 01 of 04 (cn1381): 3
 Launch 4 Task 02 of 04 (cn1381): 1
 Launch 4 Task 03 of 04 (cn1381): 5
 Launch 5 Task 00 of 04 (cn1381): 2
 Launch 5 Task 01 of 04 (cn1381): 4
 Launch 5 Task 02 of 04 (cn1381): 7
 Launch 5 Task 03 of 04 (cn1381): 6

 But when I try to launch 4 or more runs in parallel, which requires using
 the CPUs of the other node as well, it fails:

 frontend $ xargs -P 4 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
 cn1381.23245ipath_userinit: assign_context command failed: Network is down
 cn1381.23245can't open /dev/ipath, network down (err=26)
 --------------------------------------------------------------------------
 PSM was unable to open an endpoint. Please make sure that the network link is
 active on the node and the hardware is functioning.

   Error: Could not detect network connectivity
 --------------------------------------------------------------------------
 cn1381.23248ipath_userinit: assign_context command failed: Network is down
 cn1381.23248can't open /dev/ipath, network down (err=26)
 --------------------------------------------------------------------------
 PSM was unable to open an endpoint. Please make sure that the network link is
 active on the node and the hardware is functioning.

   Error: Could not detect network connectivity
 --------------------------------------------------------------------------
 cn1381.23247ipath_userinit: assign_context command failed: Network is down
 cn1381.23247can't open /dev/ipath, network down (err=26)
 cn1381.23249ipath_userinit: assign_context command failed: Network is down
 cn1381.23249can't open /dev/ipath, network down (err=26)
 --------------------------------------------------------------------------
 It looks like MPI_INIT failed for some reason; your parallel process is
 likely to abort. There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or environment
 problems. This failure appears to be an internal failure; here's some
 additional information (which may only be relevant to an Open MPI
 developer):

   PML add procs failed
   --> Returned "Error" (-1) instead of "Success" (0)
 --------------------------------------------------------------------------
 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 [cn1381:23245] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 [cn1381:23247] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 [cn1381:23242] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 [cn1381:23243] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 --------------------------------------------------------------------------
 mpirun has exited due to process rank 2 with PID 23245 on
 node cn1381 exiting without calling "finalize". This may
 have caused other processes in the application to be
 terminated by signals sent by mpirun (as reported here).
 --------------------------------------------------------------------------
 --------------------------------------------------------------------------
 It looks like MPI_INIT failed for some reason; your parallel process is
 likely to abort. There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or environment
 problems. This failure appears to be an internal failure; here's some
 additional information (which may only be relevant to an Open MPI
 developer):

   PML add procs failed
   --> Returned "Error" (-1) instead of "Success" (0)
 --------------------------------------------------------------------------
 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 [cn1381:23246] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 [cn1381:23248] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 *** The MPI_Init() function was called before MPI_INIT was invoked.
 *** This is disallowed by the MPI standard.
 *** Your MPI job will now abort.
 [cn1381:23249] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 [cn1381:23244] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
 --------------------------------------------------------------------------
 mpirun has exited due to process rank 2 with PID 23248 on
 node cn1381 exiting without calling "finalize". This may
 have caused other processes in the application to be
 terminated by signals sent by mpirun (as reported here).
 --------------------------------------------------------------------------
 [ivanoe1:24981] 3 more processes have sent help message help-mtl-psm.txt / unable to open endpoint
 [ivanoe1:24981] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
 [ivanoe1:24981] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
 [ivanoe1:24983] 3 more processes have sent help message help-mtl-psm.txt / unable to open endpoint
 [ivanoe1:24983] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
 [ivanoe1:24983] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
 Launch 3 Task 00 of 04 (cn1381): 0
 Launch 3 Task 01 of 04 (cn1381): 1
 Launch 3 Task 02 of 04 (cn1381): 2
 Launch 3 Task 03 of 04 (cn1381): 3
 Launch 1 Task 00 of 04 (cn1381): 4
 Launch 1 Task 01 of 04 (cn1381): 5
 Launch 1 Task 02 of 04 (cn1381): 6
 Launch 1 Task 03 of 04 (cn1381): 8
 Launch 5 Task 00 of 04 (cn1381): 7
 Launch 5 Task 01 of 04 (cn1381): 9
 Launch 5 Task 02 of 04 (cn1381): 10
 Launch 5 Task 03 of 04 (cn1381): 11

 As far as I can understand, Open MPI tries to launch all runs on the same
 node (cn1381 in my case) and forgets about the other node. Am I right?
 How can I avoid this behaviour?
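
 For what it's worth, I imagine I could work around it by handing each run an
 explicit host, e.g. something like this (untested):

 frontend $ mpirun -H cn1381 -n 4 get-allowed-cpu-ompi 4 &
 frontend $ mpirun -H cn1382 -n 4 get-allowed-cpu-ompi 5 &

 but I'd rather understand why Open MPI doesn't spread the runs over the whole
 allocation by itself.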

 Here are the Open MPI variables set in my environment:
 $ env | grep OMPI
 OMPI_MCA_mtl=psm
 OMPI_MCA_pml=cm

 You can find attached to this email the config.log and the output of the
 following commands:
 frontend $ ompi_info --all > ompi_info_all.txt
 frontend $ mpirun --bynode --npernode 1 --tag-output ompi_info -v ompi
 full \
            --parsable > ompi_nodes.txt

 Thanks in advance for any kind of help!

 Best regards,

-- 
 Rémi Palancher
 http://rezib.org