Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] ticket #1469
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2008-09-11 11:40:52


This now seems to be fixed with r19538.

On 9/10/08, Ralph Castain <rhc_at_[hidden]> wrote:
>
> I'm sorry - I can't even make sense of this. If you think you can reproduce
> it, then you are welcome to fix it. I cannot reproduce it, and hence can do
> nothing further about it.
>
> Ralph
>
>
> On Sep 10, 2008, at 2:01 AM, Lenny Verkhovsky wrote:
>
>> Hi Ralph,
>>
>> I can recreate this failure. I think it is caused by the fact that we do not
>> open orted on the last node (although I didn't check it), since np < number of
>> hosts.
>>
>> I used the following configure line:
>> ../configure --prefix=/home/USERS/lenny/OMPI_ORTE_TRUNK
>>
>> on OMPI 1.4a1r19522.
>> Hope this helps.
>>
>> #mpirun -np 3 -H witch2 ./spawn_multiple
>> Parent: 1 of 3, witch2 (1 in init)
>> Parent: 0 of 3, witch2 (1 in init)
>> Parent: 2 of 3, witch2 (1 in init)
>> #mpirun -np 3 -H witch2,witch3 ./spawn_multiple
>> Parent: 0 of 3, witch2 (0 in init)
>> Parent: 2 of 3, witch2 (0 in init)
>> Parent: 1 of 3, witch3 (0 in init)
>> #mpirun -np 3 -H witch2,witch3,witch4 ./spawn_multiple
>> Parent: 0 of 3, witch2 (0 in init)
>> Parent: 1 of 3, witch3 (0 in init)
>> Parent: 2 of 3, witch4 (0 in init)
>> #mpirun -np 3 -H witch2,witch3,witch4,witch5 ./spawn_multiple
>> Parent: 0 of 3, witch2 (0 in init)
>> Parent: 1 of 3, witch3 (0 in init)
>> Parent: 2 of 3, witch4 (0 in init)
>> [witch1:04806] *** Process received signal ***
>> [witch1:04806] Signal: Segmentation fault (11)
>> [witch1:04806] Signal code: Address not mapped (1)
>> [witch1:04806] Failing at address: 0x38
>> [witch1:04806] [ 0] /lib64/libpthread.so.0 [0x2af5324e9c10]
>> [witch1:04806] [ 1]
>> /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_plm_base_app_report_launch+0x27a)
>> [0x2af531de3dca]
>> [witch1:04806] [ 2] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0
>> [0x2af531f161bb]
>> [witch1:04806] [ 3] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun
>> [0x40378f]
>> [witch1:04806] [ 4] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0
>> [0x2af531f161bb]
>> [witch1:04806] [ 5]
>> /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0(opal_progress+0x9e)
>> [0x2af531f0bf5e]
>> [witch1:04806] [ 6]
>> /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_trigger_event+0x44)
>> [0x2af531dc6c84]
>> [witch1:04806] [ 7]
>> /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_plm_base_app_report_launch+0x20b)
>> [0x2af531de3d5b]
>> [witch1:04806] [ 8] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0
>> [0x2af531f161bb]
>> [witch1:04806] [ 9]
>> /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0(opal_progress+0x9e)
>> [0x2af531f0bf5e]
>> [witch1:04806] [10]
>> /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x227)
>> [0x2af531de47e7]
>> [witch1:04806] [11]
>> /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/openmpi/mca_plm_rsh.so
>> [0x2af532c38d3d]
>> [witch1:04806] [12]
>> /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-rte.so.0(orte_plm_base_receive_process_msg+0x456)
>> [0x2af531de3086]
>> [witch1:04806] [13] /home/USERS/lenny/OMPI_ORTE_TRUNK/lib/libopen-pal.so.0
>> [0x2af531f161bb]
>> [witch1:04806] [14] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun
>> [0x4033bc]
>> [witch1:04806] [15] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun
>> [0x402c23]
>> [witch1:04806] [16] /lib64/libc.so.6(__libc_start_main+0xf4)
>> [0x2af532610154]
>> [witch1:04806] [17] /home/USERS/lenny/OMPI_ORTE_TRUNK/bin/mpirun
>> [0x402b79]
>> [witch1:04806] *** End of error message ***
>> Segmentation fault
>>
>> Lenny.
>>
>>
>
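
To make the "np < number of hosts" case in the quoted report concrete, here is a minimal sketch. It assumes a plain round-robin, one-rank-per-host-per-pass placement, which is only an assumption chosen because it reproduces the rank-to-host lines in the output above; it is not taken from the ORTE mapper code. The function names map_by_node and idle_hosts are hypothetical.

```python
from collections import defaultdict


def map_by_node(num_procs, hosts):
    """Place ranks 0..num_procs-1 on hosts round-robin (assumed policy)."""
    placement = defaultdict(list)
    for rank in range(num_procs):
        placement[hosts[rank % len(hosts)]].append(rank)
    return placement


def idle_hosts(num_procs, hosts):
    """Return the hosts that receive no rank under the assumed placement."""
    placement = map_by_node(num_procs, hosts)
    return [host for host in hosts if host not in placement]


if __name__ == "__main__":
    hosts = ["witch2", "witch3", "witch4", "witch5"]
    # Mirror the four mpirun -np 3 invocations from the report.
    for count in range(1, len(hosts) + 1):
        subset = hosts[:count]
        print(subset, dict(map_by_node(3, subset)), "idle:", idle_hosts(3, subset))
```

Under this assumption, only the four-host run leaves a host (witch5) without any application process, which is the one run in the report that ends in the segmentation fault.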