Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Error with multiple MPI runs inside one Slurm allocation (with QLogic PSM)
From: Ralph Castain (rhc.openmpi_at_[hidden])
Date: 2012-04-02 11:40:06


I'm not sure the 1.4 series can support that behavior. Each mpirun only knows about itself - it has no idea something else is going on.

If you attempted to bind, the procs with the same rank in each run would all bind to the same CPU.

All you can really do is use -host to tell the fourth run not to use the first node. Or use the devel trunk, which has more ability to separate runs.
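
A minimal sketch of the -host approach, assuming the allocation shown in the quoted message below (12 CPUs on cn1381, 8 on cn1382) and 4 tasks per run. The helpers host_for_run and launch_cmd are hypothetical names introduced only for illustration, and the fixed split of runs 1-3 on the first node and runs 4-5 on the second is an assumption that has to be adapted to the actual allocation. The script only prints the commands, so the mapping can be checked before anything is launched:

```shell
#!/bin/sh
# Sketch only: node names and per-node capacity are taken from the allocation
# reported in this thread (12 CPUs on cn1381, 8 CPUs on cn1382).

# host_for_run (hypothetical helper): map a 1-based run number to the node
# that run should be restricted to, at 4 tasks per run.
host_for_run() {
    if [ "$1" -le 3 ]; then
        echo cn1381    # runs 1-3 fill the 12 CPUs of the first node
    else
        echo cn1382    # runs 4-5 fill the 8 CPUs of the second node
    fi
}

# launch_cmd (hypothetical helper): print the mpirun command for one run
# instead of executing it.
launch_cmd() {
    echo "mpirun -host $(host_for_run "$1") -n 4 get-allowed-cpu-ompi $1"
}

for run in 1 2 3 4 5; do
    launch_cmd "$run"
done
```

To launch for real, one option is to run each printed command in the background and call wait after the loop. Whether -host combines cleanly with a Slurm allocation in the 1.4 series is worth verifying on a small case first.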

Sent from my iPad

On Apr 2, 2012, at 6:53 AM, Rémi Palancher <remi_at_[hidden]> wrote:

> Hi there,
>
> I'm encountering a problem when trying to run multiple mpirun instances in
> parallel inside one SLURM allocation on multiple nodes using a QLogic
> interconnect with PSM.
>
> I'm using Open MPI version 1.4.5 compiled with GCC 4.4.5 on Debian Lenny.
>
> My cluster is composed of 12-core nodes.
>
> Here is how I'm able to reproduce the problem:
>
> Allocate 20 CPUs on 2 nodes:
>
> frontend $ salloc -N 2 -n 20
> frontend $ srun hostname | sort | uniq -c
> 12 cn1381
> 8 cn1382
>
> My job allocates 12 CPUs on node cn1381 and 8 CPUs on cn1382.
>
> For each task, my test MPI program parses the value of Cpus_allowed_list in
> the file /proc/$PID/status and prints it.
>
> If I run it on all 20 allocated CPUs, it works well:
>
> frontend $ mpirun get-allowed-cpu-ompi 1
> Launch 1 Task 00 of 20 (cn1381): 0
> Launch 1 Task 01 of 20 (cn1381): 1
> Launch 1 Task 02 of 20 (cn1381): 2
> Launch 1 Task 03 of 20 (cn1381): 3
> Launch 1 Task 04 of 20 (cn1381): 4
> Launch 1 Task 05 of 20 (cn1381): 7
> Launch 1 Task 06 of 20 (cn1381): 5
> Launch 1 Task 07 of 20 (cn1381): 9
> Launch 1 Task 08 of 20 (cn1381): 8
> Launch 1 Task 09 of 20 (cn1381): 10
> Launch 1 Task 10 of 20 (cn1381): 6
> Launch 1 Task 11 of 20 (cn1381): 11
> Launch 1 Task 12 of 20 (cn1382): 4
> Launch 1 Task 13 of 20 (cn1382): 5
> Launch 1 Task 14 of 20 (cn1382): 6
> Launch 1 Task 15 of 20 (cn1382): 7
> Launch 1 Task 16 of 20 (cn1382): 8
> Launch 1 Task 17 of 20 (cn1382): 10
> Launch 1 Task 18 of 20 (cn1382): 9
> Launch 1 Task 19 of 20 (cn1382): 11
>
> Here you can see that Slurm gave me CPUs 0-11 on cn1381 and 4-11 on cn1382.
>
> Now I'd like to run multiple MPI runs in parallel, 4 tasks each, inside my job.
>
> frontend $ cat params.txt
> 1
> 2
> 3
> 4
> 5
>
> It works well when I launch 3 runs in parallel, where it only uses the 12 CPUs
> of the first node (3 runs x 4 tasks = 12 CPUs):
>
> frontend $ xargs -P 3 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
> Launch 2 Task 00 of 04 (cn1381): 1
> Launch 2 Task 01 of 04 (cn1381): 2
> Launch 2 Task 02 of 04 (cn1381): 4
> Launch 2 Task 03 of 04 (cn1381): 7
> Launch 1 Task 00 of 04 (cn1381): 0
> Launch 1 Task 01 of 04 (cn1381): 3
> Launch 1 Task 02 of 04 (cn1381): 5
> Launch 1 Task 03 of 04 (cn1381): 6
> Launch 3 Task 00 of 04 (cn1381): 9
> Launch 3 Task 01 of 04 (cn1381): 8
> Launch 3 Task 02 of 04 (cn1381): 10
> Launch 3 Task 03 of 04 (cn1381): 11
> Launch 4 Task 00 of 04 (cn1381): 0
> Launch 4 Task 01 of 04 (cn1381): 3
> Launch 4 Task 02 of 04 (cn1381): 1
> Launch 4 Task 03 of 04 (cn1381): 5
> Launch 5 Task 00 of 04 (cn1381): 2
> Launch 5 Task 01 of 04 (cn1381): 4
> Launch 5 Task 02 of 04 (cn1381): 7
> Launch 5 Task 03 of 04 (cn1381): 6
>
> But when I try to launch 4 or more runs in parallel, where it needs to use the
> CPUs of the other node as well, it fails:
>
> frontend $ xargs -P 4 -n 1 mpirun -n 4 get-allowed-cpu-ompi < params.txt
> cn1381.23245ipath_userinit: assign_context command failed: Network is down
> cn1381.23245can't open /dev/ipath, network down (err=26)
> --------------------------------------------------------------------------
> PSM was unable to open an endpoint. Please make sure that the network link is
> active on the node and the hardware is functioning.
>
> Error: Could not detect network connectivity
> --------------------------------------------------------------------------
> cn1381.23248ipath_userinit: assign_context command failed: Network is down
> cn1381.23248can't open /dev/ipath, network down (err=26)
> --------------------------------------------------------------------------
> PSM was unable to open an endpoint. Please make sure that the network link is
> active on the node and the hardware is functioning.
>
> Error: Could not detect network connectivity
> --------------------------------------------------------------------------
> cn1381.23247ipath_userinit: assign_context command failed: Network is down
> cn1381.23247can't open /dev/ipath, network down (err=26)
> cn1381.23249ipath_userinit: assign_context command failed: Network is down
> cn1381.23249can't open /dev/ipath, network down (err=26)
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [cn1381:23245] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [cn1381:23247] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> [cn1381:23242] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> [cn1381:23243] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 2 with PID 23245 on
> node cn1381 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> --------------------------------------------------------------------------
> It looks like MPI_INIT failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Error" (-1) instead of "Success" (0)
> --------------------------------------------------------------------------
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [cn1381:23246] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [cn1381:23248] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> *** The MPI_Init() function was called before MPI_INIT was invoked.
> *** This is disallowed by the MPI standard.
> *** Your MPI job will now abort.
> [cn1381:23249] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> [cn1381:23244] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 2 with PID 23248 on
> node cn1381 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
> --------------------------------------------------------------------------
> [ivanoe1:24981] 3 more processes have sent help message help-mtl-psm.txt / unable to open endpoint
> [ivanoe1:24981] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [ivanoe1:24981] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
> [ivanoe1:24983] 3 more processes have sent help message help-mtl-psm.txt / unable to open endpoint
> [ivanoe1:24983] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
> [ivanoe1:24983] 3 more processes have sent help message help-mpi-runtime / mpi_init:startup:internal-failure
> Launch 3 Task 00 of 04 (cn1381): 0
> Launch 3 Task 01 of 04 (cn1381): 1
> Launch 3 Task 02 of 04 (cn1381): 2
> Launch 3 Task 03 of 04 (cn1381): 3
> Launch 1 Task 00 of 04 (cn1381): 4
> Launch 1 Task 01 of 04 (cn1381): 5
> Launch 1 Task 02 of 04 (cn1381): 6
> Launch 1 Task 03 of 04 (cn1381): 8
> Launch 5 Task 00 of 04 (cn1381): 7
> Launch 5 Task 01 of 04 (cn1381): 9
> Launch 5 Task 02 of 04 (cn1381): 10
> Launch 5 Task 03 of 04 (cn1381): 11
>
> As far as I can tell, Open MPI tries to launch all runs on the same node
> (cn1381 in the output above) and ignores the other node. Am I right? How can I
> avoid this behaviour?
>
> Here are the Open MPI variables set in my environment:
> $ env | grep OMPI
> OMPI_MCA_mtl=psm
> OMPI_MCA_pml=cm
>
> You can find attached to this email the config.log and the output of the
> following commands:
> frontend $ ompi_info --all > ompi_info_all.txt
> frontend $ mpirun --bynode --npernode 1 --tag-output ompi_info -v ompi full \
> --parsable > ompi_nodes.txt
>
> Thanks in advance for any kind of help!
>
> Best regards,
> --
> Rémi Palancher
> http://rezib.org
> <config.log.gz>
> <ompi_info_all.txt.gz>
> <ompi_nodes.txt>