Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem mpi
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-06-26 07:13:47


Sounds like you have a problem with the physical layer of your InfiniBand. You should run layer 0 diagnostics and/or contact your IB vendor for assistance.
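
As a rough aid for deciding which hosts to test first, the btl_openib line in the quoted output names both endpoints of the failed transfer. The sketch below pulls those fields out of a log and counts how often each host appears. The regular expression is inferred only from the single line quoted in this thread, so it is an assumption and may need adjusting for other Open MPI versions; the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the single btl_openib line quoted in this thread;
# other Open MPI versions may format the message differently (assumption).
OPENIB_ERR = re.compile(
    r"from (?P<src>\S+) to: (?P<dst>\S+) error polling \S+ CQ "
    r"with status (?P<status>.+?) status number (?P<code>\d+)"
)


def parse_openib_errors(log_text):
    """Return one dict per matching btl_openib completion-queue error line."""
    return [m.groupdict() for m in OPENIB_ERR.finditer(log_text)]


def count_hosts(errors):
    """Count how often each host appears as either endpoint of a failed transfer."""
    hosts = Counter()
    for err in errors:
        hosts[err["src"]] += 1
        hosts[err["dst"]] += 1
    return hosts


# The error line from the quoted message, used here as sample input.
sample = (
    "[[64826,1],0][btl_openib_component.c:3497:handle_wc] from foner109 to: "
    "foner111 error polling LP CQ with status LOCAL QP OPERATION ERROR "
    "status number 2 for wr_id af58a8 opcode 128 vendor error 111 qp_idx 3"
)

errors = parse_openib_errors(sample)
print(errors)
print(count_hosts(errors))
```

If the same host keeps showing up across many failed runs, its HCA, cable, or switch port is a reasonable first candidate for the physical-layer checks. This only narrows down where to look; it does not diagnose the fault.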

On Jun 24, 2014, at 4:48 AM, Diego Saúl Carrió Carrió <diego.carrio_at_[hidden]> wrote:

> Dear all,
>
> I have had problems with mpirun for a long time. When I run mpirun (with my program), I get the following error after a while:
>
> .
> .
> .
> .
> .
>
> mlx4: local QP operation err (QPN c00054, WQE index a0000, vendor syndrome 6f, opcode = 5e)
> [[64826,1],0][btl_openib_component.c:3497:handle_wc] from foner109 to: foner111 error polling LP CQ with status LOCAL QP OPERATION ERROR status number 2 for wr_id af58a8 opcode 128 vendor error 111 qp_idx 3
>
> mpirun has exited due to process rank 0 with PID 51754 on
> node foner109 exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
>
>
> I am using a cluster of 42 nodes, each with 20 processors and 64 GB of RAM. I want to use, for example, only 20 nodes, so I run:
>
> salloc -N20 --tasks-per-node=1 --cpus-per-task=20 -p thin    (where "thin" is the name of the partition)
>
> mpirun -pernode [my_program]
>
>
> Could you help me to solve this problem?
>
> Best Regards,
> Diego
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24692.php

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/