
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] problem with progress thread and orte
From: Sangamesh B (forum.san_at_[hidden])
Date: 2010-01-12 00:31:26


Hi,

    What are the advantages of the progress-threads feature?

Thanks,
Sangamesh

On Fri, Jan 8, 2010 at 10:13 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Yeah, the system doesn't currently support enable-progress-threads. It is a
> two-fold problem: ORTE won't work that way, and some parts of the MPI layer
> won't either.
>
> I am currently working on fixing ORTE so it will work with progress threads
> enabled. I believe (but can't confirm) that the TCP BTL will also work with
> that feature, but I have heard that the other BTL's won't (again, can't
> confirm).
>
> I'll send out a note when ORTE is okay, but that won't be included in a
> release for a while.
>
> On Jan 8, 2010, at 9:38 AM, Dong Li wrote:
>
> > Hi, guys.
> > My application gets stuck when I run it with Open MPI 1.4 with
> > progress threads enabled.
> >
> > The OpenMPI is configured and compiled with the following options.
> > ./configure --with-openib=/usr --enable-trace --enable-debug
> > --enable-peruse --enable-progress-threads
> >
> > Then I started the application with two MPI processes, but there
> > seems to be some problem with orte: mpiexec just hangs and never
> > runs the application.
> > I used gdb to attach to mpiexec to find out where the program got
> > stuck. The backtraces for the two MPI processes (i.e. rank 0 and
> > rank 1) are shown below. It looks to me like the problem happens in
> > rank 0 when it tries to do an atomic add operation. Note that my
> > processor is an Intel Xeon CPU E5462, but Open MPI tried to use
> > AMD64 instructions for the atomic add. Is this a bug or something?
> >
> > Any comment? Thank you.
> >
> > -Dong
> >
> >
> >
> ***********************************************************************************************************************************************
> > The following is for the rank 0.
> > (gdb) bt
> > #0 0x00007fbdd1c93264 in opal_atomic_cmpset_32 (addr=0x7fbdd1eede24,
> > oldval=1, newval=0) at ../opal/include/opal/sys/amd64/atomic.h:94
> > #1 0x00007fbdd1c93348 in opal_atomic_add_xx (addr=0x7fbdd1eede24,
> > value=1, length=4) at ../opal/include/opal/sys/atomic_impl.h:243
> > #2 0x00007fbdd1c932ad in opal_progress () at runtime/opal_progress.c:171
> > #3 0x00007fbdd1f5c9ad in orte_plm_base_daemon_callback
> > (num_daemons=1) at base/plm_base_launch_support.c:459
> > #4 0x00007fbdd0a5579d in orte_plm_rsh_launch (jdata=0x60f070) at
> > plm_rsh_module.c:1221
> > #5 0x0000000000403821 in orterun (argc=15, argv=0x7fffda18a498) at
> > orterun.c:748
> > #6 0x0000000000402dc7 in main (argc=15, argv=0x7fffda18a498) at
> main.c:13
> >
> ************************************************************************************************************************************************
> > The following is for the rank 1.
> > #0 0x0000003c4c20b309 in pthread_cond_wait@@GLIBC_2.3.2 () from
> > /lib64/libpthread.so.0
> > #1 0x00007f6f8b04ba56 in opal_condition_wait (c=0x656ce0, m=0x656c88)
> > at ../../../../opal/threads/condition.h:78
> > #2 0x00007f6f8b04b8b7 in orte_rml_oob_send (peer=0x7f6f8c578978,
> > iov=0x7fff945798d0, count=1, tag=10, flags=16) at rml_oob_send.c:153
> > #3 0x00007f6f8b04c197 in orte_rml_oob_send_buffer
> > (peer=0x7f6f8c578978, buffer=0x6563b0, tag=10, flags=0) at
> > rml_oob_send.c:269
> > #4 0x00007f6f8c32fe24 in orte_daemon (argc=28, argv=0x7fff9457abd8)
> > at orted/orted_main.c:610
> > #5 0x0000000000400917 in main (argc=28, argv=0x7fff9457abd8) at
> orted.c:62
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>