Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] request help debugging openib btl problem
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-02-09 08:03:30


Two suggestions:

1. Have you tried the OMPI development trunk? (Or do you specifically
need to get the 1.2 series working?) The use of OF verbs in the OMPI
development trunk has changed quite a bit since the 1.2 series; we're
well on our way towards a v1.3 release, and the trunk is where all
future work is occurring.

2. You might want to focus on a single type of messaging at a time;
the openib BTL has three:
    - eager RDMA
    - send/receive
    - long RDMA
You might want to disable eager RDMA and long RDMA first, for example,
and just debug send/receive. Use the MCA params
btl_openib_use_eager_rdma=0 and btl_openib_flags=1 (that's the trunk
flag value; I don't remember offhand whether it's the same on the v1.2
branch; use "ompi_info --param btl openib --parsable | grep flags:" to
see what the flag bit values are in that series).
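
As a rough sketch, the two params above can also be set through the
environment before launching the job. This assumes the OMPI_MCA_<param>
environment variable convention applies to your build, and it uses the
trunk flag value of 1, which may not be right for v1.2 (check with the
ompi_info command above first):

```shell
# Disable eager RDMA so small messages go through plain send/receive.
export OMPI_MCA_btl_openib_use_eager_rdma=0

# Restrict the openib BTL to send/receive only.
# ASSUMPTION: 1 is the trunk flag value; confirm the bit values for v1.2.
export OMPI_MCA_btl_openib_flags=1
```

With those exported, rerun the failing case in the same shell, for
example "mpirun -np 2 ./osu_bibw" (the process count and path are
placeholders for however you launch the benchmark today).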

Good luck.

On Feb 8, 2008, at 5:52 PM, Ralph Campbell wrote:

> I'm using openmpi 1.2.5 with a QLogic HCA and using the
> openib btl (not PSM). osu_latency and osu_bw work OK but
> when I run osu_bibw with a message size of 2MB (1<<21),
> it hangs in btl_openib_component_progress() waiting for something.
>
> I tried adding printfs at each point where ibv_post_send(),
> ibv_post_recv(), and ibv_poll_cq() are called and then ran
> a Python script which verified that all sends and recvs got a
> good completion notice in the posted order
> (mca_btl_openib_component.use_srq is zero for this test).
> Note that only RC SEND (12252 byte) messages are being sent
> at this point.
>
> I can send the trace of ibv_* calls if it will help.
>
> Any suggestions on what to look for are welcome.
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

-- 
Jeff Squyres
Cisco Systems