
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Infinite Loop: ompi_free_list_wait
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-03-23 12:53:51


Did you try it with Open MPI version 1.3.1?

There have been a few changes and bug fixes since then (for example r20591,
a fix in the ob1 PML).

Lenny.

2009/3/23 Timothy Hayes <hayesti_at_[hidden]>

> Hello,
>
> I'm working on an Open MPI BTL component and am having a recurring problem;
> I was wondering if anyone could shed some light on it. I have a component
> that's quite straightforward: it uses a pair of lightweight sockets to take
> advantage of being in a virtualised environment (specifically Xen). My code
> is a bit messy and has lots of inefficiencies, but the logic seems sound
> enough. I've been able to execute a few simple programs successfully using
> the component, and they work most of the time.
>
> The problem actually happens in the higher layers, reached from my
> asynchronous receive handler when I call the callback function (cbfunc)
> that the PML set during BTL initialisation. It seems to get stuck in an
> infinite loop in __ompi_free_list_wait(). That function waits on a
> condition variable that should eventually be signalled but never is. I've
> stepped through it with GDB and get a backtrace something like this:
>
> mca_btl_xen_endpoint_recv_handler -> mca_btl_xen_endpoint_start_recv ->
> mca_pml_ob1_recv_frag_callback -> mca_pml_ob1_recv_frag_match ->
> __ompi_free_list_wait -> opal_condition_wait
>
> and from there it just loops. Although this happens in the higher layers,
> I haven't seen anything like it with any of the other BTL components, so
> chances are something in my code is causing it. I very much doubt that it's
> really waiting for a list item to be returned, since the infinite loop can
> occur nondeterministically and sometimes even on the first receive
> callback.
>
> I'm not sure what else to include with this e-mail. I could send my source
> code (a bit nasty right now) if it would help, but I'm hoping that someone
> has run into this problem, or something similar, before. Maybe I'm making
> a common mistake. Any advice would be really appreciated!
>
> I'm using Open MPI 1.2.9 from the SVN tag repository.
>
> Kind regards
> Tim Hayes
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
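The hang described in the quoted message follows a common "pop or wait" pattern: a consumer takes an item from a free list and, if the list is empty, waits on a condition variable that is only signalled when some other code path returns an item. Below is a simplified, hypothetical model of that pattern using POSIX threads. It is not Open MPI's ompi_free_list_t or opal_condition_wait() code, and the names free_list_t, free_list_wait(), free_list_timed_wait(), free_list_return() and FL_CAPACITY are made up for illustration. It shows why the blocking wait can never exit if nothing calls the return path (in a BTL, one assumed cause would be fragments or descriptors that are never handed back), and how a timed variant can turn a silent hang into a detectable failure while debugging.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical fixed-capacity free list: a simplified model of the
 * "pop or wait" pattern, NOT Open MPI's ompi_free_list_t. */
#define FL_CAPACITY 4

typedef struct {
    void *items[FL_CAPACITY];   /* stack of available items */
    size_t count;               /* number of items currently available */
    pthread_mutex_t lock;
    pthread_cond_t cond;        /* signalled whenever an item is returned */
} free_list_t;

void free_list_init(free_list_t *fl, void **initial, size_t n) {
    assert(n <= FL_CAPACITY);
    fl->count = 0;
    for (size_t i = 0; i < n; i++)
        fl->items[fl->count++] = initial[i];
    pthread_mutex_init(&fl->lock, NULL);
    pthread_cond_init(&fl->cond, NULL);
}

/* Blocking get: while the list is empty, wait for free_list_return().
 * If no item is ever returned, this loop never exits. */
void *free_list_wait(free_list_t *fl) {
    pthread_mutex_lock(&fl->lock);
    while (fl->count == 0)
        pthread_cond_wait(&fl->cond, &fl->lock);
    void *item = fl->items[--fl->count];
    pthread_mutex_unlock(&fl->lock);
    return item;
}

/* Diagnostic variant: gives up after timeout_ms and returns NULL, so a
 * "nobody ever returns an item" situation becomes visible. */
void *free_list_timed_wait(free_list_t *fl, long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default clock of cond vars */
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {     /* normalise nanoseconds */
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    void *item = NULL;
    pthread_mutex_lock(&fl->lock);
    while (fl->count == 0) {
        if (pthread_cond_timedwait(&fl->cond, &fl->lock, &deadline) == ETIMEDOUT)
            break;
    }
    if (fl->count > 0)
        item = fl->items[--fl->count];
    pthread_mutex_unlock(&fl->lock);
    return item;
}

/* Return an item to the list and wake one waiter. */
void free_list_return(free_list_t *fl, void *item) {
    pthread_mutex_lock(&fl->lock);
    assert(fl->count < FL_CAPACITY);
    fl->items[fl->count++] = item;
    pthread_cond_signal(&fl->cond);
    pthread_mutex_unlock(&fl->lock);
}
```

For example, after initialising the list with one item and taking it, free_list_timed_wait(&fl, 50) returns NULL because nothing has been returned, whereas free_list_wait(&fl) would block forever at the same point. Open MPI's real opal_condition_wait() may behave differently depending on how the library was built (for example, it may poll the progress engine instead of blocking on a pthread condition variable), so this model only illustrates the general wait-until-returned logic.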