
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] SM initialization race condition
From: George Bosilca (bosilca_at_[hidden])
Date: 2008-08-21 09:38:11


bzero() function conforms to IEEE Std 1003.1-2001 (``POSIX.1'')

memset() function conforms to ISO/IEC 9899:1990 (``ISO C90'')

Both functions are in libc, so it's definitely difficult to say
which one is better.
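
If portability is the concern, the swap is mechanical. A minimal sketch, assuming a hypothetical helper named zero_region() (not an existing Open MPI function) that stands in for the bzero() call using only ISO C:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Open MPI: zero `size` bytes starting
 * at `addr`. memset(addr, 0, size) has the same effect as
 * bzero(addr, size) but is ISO C rather than POSIX-only. */
static void zero_region(void *addr, size_t size)
{
    memset(addr, 0, size);
}
```

In the patch quoted below, that would amount to calling memset( map->data_addr, 0, size ) in place of bzero( map->data_addr, size ).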

   george.

On Aug 21, 2008, at 3:32 PM, Jeff Squyres wrote:

> IIRC, bzero is a gnu-ism. We should probably use memset instead.
>
>
> On Aug 21, 2008, at 5:40 AM, George Bosilca wrote:
>
>> Terry,
>>
>> We use the feature defined by POSIX mmap where the area should be
>> zero-filled when the file length is extended. What OS are you using
>> when you see such problems?
>>
>> Just in case, here is a patch that sets the beginning of the mmaped
>> region to zero, in case this is not done automatically. As in most
>> cases this is an unnecessary overhead, we should find the cases
>> where we really need this, and possibly conditionally compile it.
>>
>> Index: ompi/mca/common/sm/common_sm_mmap.c
>> ===================================================================
>> --- ompi/mca/common/sm/common_sm_mmap.c (revision 19377)
>> +++ ompi/mca/common/sm/common_sm_mmap.c (working copy)
>> @@ -163,6 +163,7 @@
>>
>> /* initialize the segment - only the first process
>> to open the file */
>> + bzero( map->data_addr, size );
>> mem_offset = map->data_addr - (unsigned char *)map->map_seg;
>> map->map_seg->seg_offset = mem_offset;
>> map->map_seg->seg_size = size - mem_offset;
>>
>> george.
>>
>> On Aug 21, 2008, at 1:22 PM, Terry Dontje wrote:
>>
>>> I've been seeing an intermittent segv (about once every 4 hours when
>>> looping on a quick initialization program) with the following stack trace.
>>>
>>> =>[1] mca_btl_sm_add_procs(btl = 0xfffffd7ffdb67ef0, nprocs = 2U,
>>> procs = 0x591560, peers = 0x591580, reachability =
>>> 0xfffffd7fffdff000), line 519 in "btl_sm.c"
>>> [2] mca_bml_r2_add_procs(nprocs = 2U, procs = 0x591560,
>>> bml_endpoints = 0x591500, reachable = 0xfffffd7fffdff000), line
>>> 222 in "bml_r2.c"
>>> [3] mca_pml_ob1_add_procs(procs = 0x5914c0, nprocs = 2U), line 248
>>> in "pml_ob1.c"
>>> [4] ompi_mpi_init(argc = 1, argv = 0xfffffd7fffdff318, requested =
>>> 0, provided = 0xfffffd7fffdff234), line 651 in "ompi_mpi_init.c"
>>> [5] PMPI_Init(argc = 0xfffffd7fffdff2ec, argv =
>>> 0xfffffd7fffdff2e0), line 90 in "pinit.c"
>>> [6] main(argc = 1, argv = 0xfffffd7fffdff318), line 82 in "buffer.c"
>>>
>>> I believe the problem is that mca_btl_sm_component.shm_fifo[j]
>>> contains uninitialized data, which causes the loop on line 504 in
>>> btl_sm.c to think that a remote rank has set its fifo address.
>>>
>>> Has anyone else seen the above happening?
>>>
>>> --td
>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
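
For anyone who wants to check the zero-fill behaviour discussed in this thread on their own OS, here is a minimal standalone sketch. It is not the Open MPI code. The names map_zeroed_segment(), region_is_zero() and the force_zero flag are assumptions made up for illustration, with force_zero standing in for the conditional zeroing suggested in the patch discussion. It extends a file with ftruncate(), maps it shared, and optionally clears it explicitly with memset().

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or open) the file at `path`, set its
 * length to `size` bytes, and map it shared and read/write.
 * If `force_zero` is non-zero the mapping is cleared explicitly instead
 * of relying on the OS to zero-fill the extended area.
 * Returns the mapping, or NULL on any failure (including size == 0,
 * which mmap rejects). */
static void *map_zeroed_segment(const char *path, size_t size, int force_zero)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        return NULL;
    }
    if (force_zero) {
        memset(addr, 0, size);
    }
    return addr;
}

/* Hypothetical helper: return 1 if every byte in the region is zero,
 * 0 otherwise. */
static int region_is_zero(const void *addr, size_t size)
{
    const unsigned char *p = (const unsigned char *)addr;
    for (size_t i = 0; i < size; i++) {
        if (p[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling map_zeroed_segment() with force_zero set to 0 on a freshly created file and then checking region_is_zero() shows whether the platform zero-fills the extended area by itself. Passing 1 corresponds to the defensive approach in the patch, at the cost of touching every page up front.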


