Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] RFC: libevent update
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-03-18 22:11:24


Commit 17872 is the one you're looking for.

https://svn.open-mpi.org/trac/ompi/changeset/17872

george.

On Mar 18, 2008, at 9:12 PM, Jeff Squyres wrote:

> When did you fix it? I merged the trunk down to the libevent-merge
> branch late this afternoon (r17869).
>
>
> On Mar 18, 2008, at 7:29 PM, George Bosilca wrote:
>
>> This has been fixed in the trunk, but not yet merged into the branch.
>>
>> george.
>>
>> On Mar 18, 2008, at 7:17 PM, Josh Hursey wrote:
>>
>>> I found another problem with the libevent branch.
>>>
>>> If I set "-mca btl tcp,self" on the command line, then I get a
>>> segfault when sending messages > 16 KB. I can try to make a smaller
>>> reproducer, but in the meantime you can use the "progress" or
>>> "simple" tests in ompi-tests below:
>>> https://svn.open-mpi.org/svn/ompi-tests/trunk/iu/ft/correctness
>>>
>>> To build:
>>> shell$ make
>>> To run with failure:
>>> shell$ mpirun -np 2 -mca btl tcp,self progress -s 16 -v 1
>>> To run without failure:
>>> shell$ mpirun -np 2 -mca btl tcp,self progress -s 15 -v 1
>>>
>>> This program will display the message "Checkpoint at any time...".
>>> If you send mpirun SIGUSR2, it will progress to the next stage of
>>> the test. The failure occurs on the first message, though, before
>>> any of this becomes an issue.
>>>
>>> I was using Odin, and if I do not specify the btls then the test
>>> passes as normal.
>>>
>>> The backtrace is below:
>>> ------------------------------------------
>>> ...
>>> Core was generated by `progress -s 16 -v 1'.
>>> Program terminated with signal 11, Segmentation fault.
>>> #0 0x0000002a9793318b in mca_bml_base_free (bml_btl=0x736275705f61636d, des=0x559700) at ../../../../ompi/mca/bml/bml.h:267
>>> 267 bml_btl->btl_free( bml_btl->btl, des );
>>> (gdb) bt
>>> #0 0x0000002a9793318b in mca_bml_base_free (bml_btl=0x736275705f61636d, des=0x559700) at ../../../../ompi/mca/bml/bml.h:267
>>> #1 0x0000002a9793304d in mca_pml_ob1_put_completion (btl=0x5598c0, ep=0x0, des=0x559700, status=0) at pml_ob1_recvreq.c:190
>>> #2 0x0000002a97930069 in mca_pml_ob1_recv_frag_callback (btl=0x5598c0, tag=64 '@', des=0x2a989d2b00, cbdata=0x0) at pml_ob1_recvfrag.c:149
>>> #3 0x0000002a97d5f3e0 in mca_btl_tcp_endpoint_recv_handler (sd=10, flags=2, user=0x5a5df0) at btl_tcp_endpoint.c:696
>>> #4 0x0000002a95a0ab93 in event_process_active (base=0x508c80) at event.c:591
>>> #5 0x0000002a95a0af59 in opal_event_base_loop (base=0x508c80, flags=2) at event.c:763
>>> #6 0x0000002a95a0ad2b in opal_event_loop (flags=2) at event.c:670
>>> #7 0x0000002a959fadf8 in opal_progress () at runtime/opal_progress.c:169
>>> #8 0x0000002a9792caae in opal_condition_wait (c=0x2a9587d940, m=0x2a9587d9c0) at ../../../../opal/threads/condition.h:93
>>> #9 0x0000002a9792c9dd in ompi_request_wait_completion (req=0x5a5380) at ../../../../ompi/request/request.h:381
>>> #10 0x0000002a9792c920 in mca_pml_ob1_recv (addr=0x5baf70, count=16384, datatype=0x503770, src=1, tag=1001, comm=0x5039a0, status=0x0) at pml_ob1_irecv.c:104
>>> #11 0x0000002a956f1f00 in PMPI_Recv (buf=0x5baf70, count=16384, type=0x503770, source=1, tag=1001, comm=0x5039a0, status=0x0) at precv.c:75
>>> #12 0x000000000040211f in exchange_stage1 (ckpt_num=1) at progress.c:414
>>> #13 0x0000000000401295 in main (argc=5, argv=0x7fbfffe668) at progress.c:131
>>> (gdb) p bml_btl
>>> $1 = (mca_bml_base_btl_t *) 0x736275705f61636d
>>> (gdb) p *bml_btl
>>> Cannot access memory at address 0x736275705f61636d
>>> ------------------------------------------
>>>
>>> -- Josh
>>>
>>> On Mar 17, 2008, at 2:50 PM, Jeff Squyres wrote:
>>>
>>>> WHAT: Bring new version of libevent to the trunk.
>>>>
>>>> WHY: Newer version, slightly better performance (lower overheads /
>>>> lighter weight), properly integrate the use of epoll and other
>>>> scalable fd monitoring mechanisms.
>>>>
>>>> WHERE: 98% of the changes are in opal/event; there are a few changes
>>>> to configury and one change to the orted.
>>>>
>>>> TIMEOUT: COB, Friday, 21 March 2008
>>>>
>>>> DESCRIPTION:
>>>>
>>>> George/UTK has done the bulk of the work to integrate a new version
>>>> of libevent on the following tmp branch:
>>>>
>>>> https://svn.open-mpi.org/svn/ompi/tmp-public/libevent-merge
>>>>
>>>> ** WE WOULD VERY MUCH APPRECIATE IF PEOPLE COULD MTT TEST THIS BRANCH! **
>>>>
>>>> Cisco ran MTT on this branch on Friday and everything checked out
>>>> (i.e., no more failures than on the trunk). We just made a few more
>>>> minor changes today and I'm running MTT again now, but I'm not
>>>> expecting any new failures (MTT will take several hours). We would
>>>> like to bring the new libevent in over this upcoming weekend, but
>>>> would very much appreciate it if others could test on their
>>>> platforms (Cisco tests mainly 64-bit RHEL4U4). This new libevent
>>>> *should* be a fairly side-effect-free change, but it is possible
>>>> that since we're now using epoll and other scalable fd monitoring
>>>> tools, we'll run into some unanticipated issues on some platforms.
>>>>
>>>> Here's a consolidated diff if you want to see the changes:
>>>>
>>>> https://svn.open-mpi.org/trac/ompi/changeset?old_path=tmp-public%2Flibevent-merge&old=17846&new_path=trunk&new=17842
>>>>
>>>> Thanks.
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
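
A side note on the quoted backtrace: the bml_btl value that gdb could not dereference (0x736275705f61636d) consists entirely of printable ASCII byte values. The sketch below is only an illustration in Python, not part of the original report. It assumes a little-endian 64-bit host, which the thread does not state explicitly, and the helper name pointer_as_ascii is made up for this example. It prints the bytes of that value in memory order.

```python
def pointer_as_ascii(ptr: int, width: int = 8) -> str:
    """Show a pointer value as the bytes it would occupy in memory.

    Assumes little-endian byte order (an assumption, not stated in
    the report). Non-printable bytes are rendered as '.'.
    """
    raw = ptr.to_bytes(width, byteorder="little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Value of bml_btl taken from frame #0 of the quoted backtrace.
suspect = 0x736275705f61636d
print(pointer_as_ascii(suspect))
```

A pointer that decodes to readable text usually means the pointer field holds string data rather than an address, for example because the memory was overwritten or the pointer was read from the wrong location. This only narrows down where to look; it does not by itself identify the bug that the changeset referenced at the top of this message fixed.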
