Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] divide-by-zero in mca_btl_openib_add_procs
From: Alain Miniussi (alain.miniussi_at_[hidden])
Date: 2014-05-27 07:49:11


So it works with a gcc-compiled Open MPI:

[alainm_at_gurney mpi]$ /softs/openmpi-1.8.1-gnu447/bin/mpicc --showme
gcc -I/softs/openmpi-1.8.1-gnu447/include -pthread -Wl,-rpath -Wl,/softs/openmpi-1.8.1-gnu447/lib -Wl,--enable-new-dtags -L/softs/openmpi-1.8.1-gnu447/lib -lmpi
[alainm_at_gurney mpi]$ /softs/openmpi-1.8.1-gnu447/bin/mpicc ./test.c
[alainm_at_gurney mpi]$ /softs/openmpi-1.8.1-gnu447/bin/mpiexec -n 2 ./a.out
[alainm_at_gurney mpi]$ ldd ./a.out
     linux-vdso.so.1 => (0x00007fffb47ff000)
     libmpi.so.1 => /softs/openmpi-1.8.1-gnu447/lib/libmpi.so.1 (0x00002aaee80c1000)
     libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003bd9e00000)
     libc.so.6 => /lib64/libc.so.6 (0x0000003bd9200000)
     libopen-rte.so.7 => /softs/openmpi-1.8.1-gnu447/lib/libopen-rte.so.7 (0x00002aaee83b8000)
     libopen-pal.so.6 => /softs/openmpi-1.8.1-gnu447/lib/libopen-pal.so.6 (0x00002aaee8630000)
     libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x0000003bd9600000)
     libdl.so.2 => /lib64/libdl.so.2 (0x00002aaee8904000)
     librt.so.1 => /lib64/librt.so.1 (0x0000003bda600000)
     libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003beb000000)
     libutil.so.1 => /lib64/libutil.so.1 (0x0000003bea000000)
     libm.so.6 => /lib64/libm.so.6 (0x0000003bd9a00000)
     /lib64/ld-linux-x86-64.so.2 (0x0000003bd8e00000)
[alainm_at_gurney mpi]$ ./a.out
[alainm_at_gurney mpi]$

So the crash seems to be specific to the Intel-compiled build.
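
For anyone decoding the traces quoted below: `Signal: Floating point exception (8)` together with `Signal code: Integer divide-by-zero (1)` means the process received SIGFPE with `si_code` equal to `FPE_INTDIV`. Despite the signal's name, the faulting instruction (inside `mca_btl_openib_add_procs`) was an integer division by zero. The C sketch below is not Open MPI code, and the function name `sigfpe_code_for_int_div_by_zero` is hypothetical. It assumes Linux on x86 or x86-64, where integer division by zero traps; on some other architectures (AArch64, for example) it silently yields 0 and no signal is raised. It installs an `SA_SIGINFO` handler, divides through `volatile` operands so the compiler cannot fold the division away, and recovers with `sigsetjmp`/`siglongjmp`, because simply returning from the handler would re-execute the faulting instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf fpe_env;
static volatile sig_atomic_t fpe_code = 0;

/* SIGFPE handler: record si_code, then jump back past the faulting division. */
static void on_sigfpe(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    fpe_code = info->si_code;
    siglongjmp(fpe_env, 1);
}

/* Hypothetical helper: returns the si_code of the SIGFPE raised by an
   integer division by zero, 0 if no signal arrived, or -1 if the
   handler could not be installed. */
int sigfpe_code_for_int_div_by_zero(void)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigfpe;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGFPE, &sa, &old) != 0)
        return -1;

    fpe_code = 0;
    /* savemask = 1 so siglongjmp unblocks SIGFPE again on the way out. */
    if (sigsetjmp(fpe_env, 1) == 0) {
        volatile int numerator = 42;
        volatile int divisor = 0;
        volatile int result = numerator / divisor; /* traps on x86 */
        (void)result;
    }

    sigaction(SIGFPE, &old, NULL); /* restore the previous handler */
    return (int)fpe_code;
}
```

Called from `main()` on x86-64 Linux, it should return `FPE_INTDIV` (1), which matches the `Signal code` that Open MPI's handler printed. A return value of 0 means no signal was delivered.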

On 26/05/2014 17:35, Ralph Castain wrote:
> If you wouldn't mind, yes: let's see whether it is a problem with icc. We know some versions have bugs, though that may not be the issue here.
>
> On May 26, 2014, at 7:39 AM, Alain Miniussi <alain.miniussi_at_oca.eu> wrote:
>
>> Hi,
>>
>> Did that too, with the same result:
>>
>> [alainm_at_tagir mpi]$ mpirun -n 1 ./a.out
>> [tagir:05123] *** Process received signal ***
>> [tagir:05123] Signal: Floating point exception (8)
>> [tagir:05123] Signal code: Integer divide-by-zero (1)
>> [tagir:05123] Failing at address: 0x2adb507b3d9f
>> [tagir:05123] [ 0] /lib64/libpthread.so.0[0x30f920f710]
>> [tagir:05123] [ 1] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0xe9f)[0x2adb507b3d9f]
>> [tagir:05123] [ 2] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_bml_r2.so(+0x1481)[0x2adb505a7481]
>> [tagir:05123] [ 3] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xa8)[0x2adb51af02f8]
>> [tagir:05123] [ 4] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(ompi_mpi_init+0x9f6)[0x2adb4b78b236]
>> [tagir:05123] [ 5] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(MPI_Init+0xef)[0x2adb4b7ad74f]
>> [tagir:05123] [ 6] ./a.out[0x400dd1]
>> [tagir:05123] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd)[0x30f8a1ed1d]
>> [tagir:05123] [ 8] ./a.out[0x400cc9]
>> [tagir:05123] *** End of error message ***
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 0 with PID 5123 on node tagir exited on signal 13 (Broken pipe).
>> --------------------------------------------------------------------------
>> [alainm_at_tagir mpi]$
>>
>>
>> Do you want me to try a gcc build?
>>
>> Alain
>>
>> On 26/05/2014 16:09, Ralph Castain wrote:
>>> Strange: I note that you are running these as singletons (started directly, without mpirun). Can you try running it under mpirun?
>>>
>>> mpirun -n 1 ./a.out
>>>
>>> just to see whether it is the singleton launch that is causing the problem, or something in the openib BTL itself.
>>>
>>>
>>> On May 26, 2014, at 6:59 AM, Alain Miniussi <alain.miniussi_at_oca.eu> wrote:
>>>
>>>> Hi,
>>>>
>>>> I get a failure with the following minimal test case:
>>>> $: more ./test.c
>>>> #include "mpi.h"
>>>>
>>>> int main(int argc, char* argv[]) {
>>>>     MPI_Init(&argc, &argv);
>>>>     MPI_Finalize();
>>>>     return 0;
>>>> }
>>>> $: mpicc -v
>>>> icc version 13.1.1 (gcc version 4.4.7 compatibility)
>>>> $: mpicc ./test.c
>>>> $: ./a.out
>>>> [tagir:02855] *** Process received signal ***
>>>> [tagir:02855] Signal: Floating point exception (8)
>>>> [tagir:02855] Signal code: Integer divide-by-zero (1)
>>>> [tagir:02855] Failing at address: 0x2aef6e5b2d9f
>>>> [tagir:02855] [ 0] /lib64/libpthread.so.0[0x30f920f710]
>>>> [tagir:02855] [ 1] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_btl_openib.so(mca_btl_openib_add_procs+0xe9f)[0x2aef6e5b2d9f]
>>>> [tagir:02855] [ 2] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_bml_r2.so(+0x1481)[0x2aef6e3a6481]
>>>> [tagir:02855] [ 3] /softs/openmpi-1.8.1-intel13/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xa8)[0x2aef6f8ef2f8]
>>>> [tagir:02855] [ 4] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(ompi_mpi_init+0x9f6)[0x2aef69572236]
>>>> [tagir:02855] [ 5] /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1(MPI_Init+0xef)[0x2aef6959474f]
>>>> [tagir:02855] [ 6] ./a.out[0x400dd1]
>>>> [tagir:02855] [ 7] /lib64/libc.so.6(__libc_start_main+0xfd)[0x30f8a1ed1d]
>>>> [tagir:02855] [ 8] ./a.out[0x400cc9]
>>>> [tagir:02855] *** End of error message ***
>>>> $:
>>>>
>>>> Version info:
>>>> $: mpicc -v
>>>> icc version 13.1.1 (gcc version 4.4.7 compatibility)
>>>> $: ldd ./a.out
>>>> linux-vdso.so.1 => (0x00007fffbb197000)
>>>> libmpi.so.1 => /softs/openmpi-1.8.1-intel13/lib/libmpi.so.1 (0x00002b20262ee000)
>>>> libm.so.6 => /lib64/libm.so.6 (0x00000030f8e00000)
>>>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000030ff200000)
>>>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00000030f9200000)
>>>> libc.so.6 => /lib64/libc.so.6 (0x00000030f8a00000)
>>>> libdl.so.2 => /lib64/libdl.so.2 (0x00000030f9600000)
>>>> libopen-rte.so.7 => /softs/openmpi-1.8.1-intel13/lib/libopen-rte.so.7 (0x00002b202660d000)
>>>> libopen-pal.so.6 => /softs/openmpi-1.8.1-intel13/lib/libopen-pal.so.6 (0x00002b20268a1000)
>>>> libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x00002b2026ba6000)
>>>> librt.so.1 => /lib64/librt.so.1 (0x00000030f9e00000)
>>>> libnsl.so.1 => /lib64/libnsl.so.1 (0x0000003109800000)
>>>> libutil.so.1 => /lib64/libutil.so.1 (0x000000310aa00000)
>>>> libimf.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libimf.so (0x00002b2026db0000)
>>>> libsvml.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libsvml.so (0x00002b202726d000)
>>>> libirng.so => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libirng.so (0x00002b2027c37000)
>>>> libintlc.so.5 => /softs/intel/composer_xe_2013.3.163/compiler/lib/intel64/libintlc.so.5 (0x00002b2027e3e000)
>>>> /lib64/ld-linux-x86-64.so.2 (0x00000030f8600000)
>>>> $:
>>>>
>>>> I tried to google the issue and found something about an old vectorization bug with the Intel compiler, but that was a long time ago and seemed to be fixed in 1.6.x.
>>>> Also, "make check" passed, which puzzles me.
>>>>
>>>> Any ideas?
>>>>
>>>> Cheers
>>>>
>>>> --
>>>> ---
>>>> Alain
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> --
>> ---
>> Alain
>>

-- 
---
Alain