Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] BW benchmark hangs after r 18551
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-06-17 07:10:06


Lenny,

I assume you're running the latest version. If not, please update:
Galen and I fixed some bugs last week. If you are already on the
latest (and greatest), then ... well, I imagine there is at least one
bug left.

There is a quick test you can do. In btl_sm.c, in the module
structure at the beginning of the file, please replace the sendi
function with NULL. If this fixes the problem, then at least we know
that it's an sm send-immediate problem.
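
To illustrate why setting that pointer to NULL isolates the problem, here
is a minimal sketch. It is not the actual btl_sm.c code: the struct, the
function names (toy_send, toy_sendi, toy_dispatch) and the return codes
are all hypothetical stand-ins. It assumes the upper layer checks the
sendi pointer and falls back to the regular send path when it is NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a BTL module: a table of
 * function pointers. The real module structure has many more fields. */
typedef int (*send_fn_t)(const void *buf, size_t len);

struct toy_btl_module {
    send_fn_t send;   /* regular send path, always present */
    send_fn_t sendi;  /* optional "send immediate" fast path, may be NULL */
};

/* Return codes that only identify which path handled the message. */
enum { PATH_SEND = 1, PATH_SENDI = 2 };

static int toy_send(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return PATH_SEND;
}

static int toy_sendi(const void *buf, size_t len)
{
    (void)buf; (void)len;
    return PATH_SENDI;
}

/* Caller-side dispatch (assumed behaviour): use the fast path only
 * when the module provides one, otherwise use the regular send. */
static int toy_dispatch(const struct toy_btl_module *btl,
                        const void *buf, size_t len)
{
    assert(btl->send != NULL);
    if (btl->sendi != NULL) {
        return btl->sendi(buf, len);
    }
    return btl->send(buf, len);
}

/* Module as shipped, and the same module with sendi replaced by NULL. */
static const struct toy_btl_module with_sendi    = { toy_send, toy_sendi };
static const struct toy_btl_module without_sendi = { toy_send, NULL };
```

With the pointer set to NULL, every message goes through the regular
send path, so if the hang disappears, the send-immediate code is the
likely culprit.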

   Thanks,
     george.

On Jun 17, 2008, at 7:54 AM, Lenny Verkhovsky wrote:

> Hi, George,
>
> I have a problem running the BW benchmark on a 100-rank cluster after
> r18551.
> The BW benchmark is mpi_p, which runs mpi_bandwidth with a 100K
> message size between all pairs.
>
>
> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18549 -t bw -s 100000
> BW (100) (size min max avg) 100000 576.734030 2001.882416 1062.698408
> #mpirun -np 100 -hostfile hostfile_w ./mpi_p_18551 -t bw -s 100000
> mpirun: killing job...
> (It was still hung after 10 hours.)
>
>
> It doesn't happen if I run with --bynode or with only the openib,self
> BTLs.
>
>
> Lenny.


