Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI error
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-02-03 09:29:13


Have you tried this:

     http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion
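
A rough sketch of how that FAQ suggestion might be applied to the job script quoted below. It assumes the MCA parameter is named pml_ob1_use_early_completion (taken from the FAQ anchor) and that your 1.2.x build exposes it, which `ompi_info --param pml ob1` should confirm. The function name and the MPIRUN override are hypothetical helpers for illustration. The mpirun path and executable name are copied from the quoted script. Whether this setting is related to the LOCAL LENGTH ERROR is not established here.

```shell
#!/bin/bash
# Hypothetical helper: launch an executable with early completion disabled.
# Assumption: pml_ob1_use_early_completion exists in your Open MPI build.
run_without_early_completion() {
    local nodefile=$1
    shift
    local np
    # PBS writes one line per allocated slot, so the line count is -np.
    np=$(awk 'END { print NR }' "$nodefile")
    # MPIRUN may be overridden; the default is the path from the quoted script.
    ${MPIRUN:-/opt/openmpi-1.2.4/gnu/bin/mpirun} \
        --mca pml_ob1_use_early_completion 0 \
        -np "$np" "$@"
}

# Inside a PBS job, PBS_NODEFILE is set by the scheduler.
if [ -n "${PBS_NODEFILE:-}" ]; then
    run_without_early_completion "$PBS_NODEFILE" My_Executable
fi
```

If the parameter is not listed by ompi_info, the installed release may predate it, and this sketch does not apply as written.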

On Feb 2, 2009, at 2:52 PM, c.j.kao_at_[hidden] wrote:

>
> I am using Open MPI to run a job on 4 nodes, 2 processors per node.
> It seems that 5 of the 8 processes executed the app successfully and 3
> did not. Here is the error message I got. The last thing I do in the
> code is call MPI_Barrier, and it never returns (probably because 3 of
> the processes never executed properly?).
>
> [0,1,7][btl_openib_component.c:1332:btl_openib_component_progress] from hplcnla160 to: hplcnla162 error polling HP CQ with status LOCAL LENGTH ERROR status number 1 for wr_id 6158264 opcode 0
>
> and here is the script I used:
>
> #!/bin/bash -debug
> #PBS -N mytest
> #PBS -l nodes=4:ppn=2,walltime=00:05:00,tpn=2
> #PBS -j oe
>
> NP=$(wc -l $PBS_NODEFILE | awk '{print $1}')
> /opt/openmpi-1.2.4/gnu/bin/mpirun -np $NP My_Executable
>
> Has anybody seen this kind of error before? Thanks.
>
> CJ

-- 
Jeff Squyres
Cisco Systems