Hi,
When I run a parallel program, I get the following error:

------------------------------------------------------------------
[n333:129522] *** Process received signal ***
[n333:129522] Signal: Segmentation fault (11)
[n333:129522] Signal code: Address not mapped (1)
[n333:129522] Failing at address: 0x40
[n333:129522] [ 0] /lib64/libpthread.so.0 [0x3c50e0e4c0]
[n333:129522] [ 1] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0 [0x4cd19b1]
[n333:129522] [ 2] /opt/openmpi-1.3.4-gnu/lib/libopen-pal.so.0(opal_progress+0x75) [0x52e5165]
[n333:129522] [ 3] /opt/openmpi-1.3.4-gnu/lib/libopen-rte.so.0 [0x508565c]
[n333:129522] [ 4] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0 [0x4c653eb]
[n333:129522] [ 5] /opt/openmpi-1.3.4-gnu/lib/libmpi.so.0(MPI_Init+0x120) [0x4c84b90]
[n333:129522] [ 6] /lustre/jxding/netplan49/nsga2b [0x4497f6]
[n333:129522] [ 7] /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c5021d974]
[n333:129522] [ 8] /lustre/jxding/netplan49/nsga2b(__gxx_personality_v0+0x499) [0x4436e9]
[n333:129522] *** End of error message ***
--------------------------------------------------------------------------
mpirun has exited due to process rank 24 with PID 129522 on
node n333 exiting without calling "finalize". This may
have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).
--------------------------------------------------------------------------

However, the program ran for no more than a few minutes, whereas it should take hours to finish.
How can it reach "finalize" so quickly?
Any help is appreciated.
Jack