Dear Sir/Madam,
I'm running Open MPI version 1.4.2. The operating system is Ubuntu 9.10 with kernel version 2.6.31-14.
$ mpirun -np 1 -cpus-per-proc 1 -bind-to-core a.out
This works fine on a single-core P4 machine.
$ mpirun -np 1 -bind-to-core a.out
This also works fine.
$ mpirun -np 1 -cpus-per-proc 1 -bind-to-core a.out
This works fine as well.
But then I specified a rankfile as,
rank 0=127.0.0.1 slot=0
and ran the app as,
$ mpirun -np 1 -rf rankfile a.out
It gives,
[ucsc-laptop:03027] *** Process received signal ***
[ucsc-laptop:03027] Signal: Segmentation fault (11)
[ucsc-laptop:03027] Signal code: Address not mapped (1)
[ucsc-laptop:03027] Failing at address: 0x8
[ucsc-laptop:03027] [ 0] [0x867410]
[ucsc-laptop:03027] [ 1] a.out(main+0x5f) [0x8048843]
[ucsc-laptop:03027] [ 2] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x44cb56]
[ucsc-laptop:03027] [ 3] a.out [0x8048751]
[ucsc-laptop:03027] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3027 on node ucsc-laptop exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
The same happens for the following execution:
$ mpirun -np 1 -rf rankfile --bind-to-core a.out
[ucsc-laptop:03053] *** Process received signal ***
[ucsc-laptop:03053] Signal: Segmentation fault (11)
[ucsc-laptop:03053] Signal code: Address not mapped (1)
[ucsc-laptop:03053] Failing at address: 0x8
[ucsc-laptop:03053] [ 0] [0xab0410]
[ucsc-laptop:03053] [ 1] a.out(main+0x5f) [0x8048843]
[ucsc-laptop:03053] [ 2] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x234b56]
[ucsc-laptop:03053] [ 3] a.out [0x8048751]
[ucsc-laptop:03053] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3053 on node ucsc-laptop exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I need to run my program like this:
$ mpirun -np 5 -rf rankfile a.out
where the rankfile is:
rank 0=10.16.71.14 slot=0 # 10.16.71.14 is a dual-core machine
rank 1=10.16.71.14 slot=1
rank 2=10.16.71.15 slot=0 # 10.16.71.15 is a dual-core machine
rank 3=10.16.71.15 slot=1
rank 4=10.16.71.16 slot=0 # 10.16.71.16 is a P4 machine with a single core
This gives the same segmentation fault as $ mpirun -np 1 -rf rankfile a.out
But if I comment out the line rank 4=10.16.71.16 slot=0 and run the program as $ mpirun -np 4 -rf rankfile a.out, it executes fine.
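As a sanity check on the rankfile itself, the slot numbers can be compared against the core count of each host. The following is only a minimal sketch in Python, not part of Open MPI: the helper names parse_rankfile and check_slots are hypothetical, the core counts are taken from the comments in the rankfile above, and it assumes that slot numbers are zero-based logical CPU indices. It only understands the plain "rank N=host slot=M" form, not socket:core forms such as slot=p0:0.

```python
import re

# Matches only the simple "rank N=host slot=M" form used in this thread.
# Socket:core forms such as "slot=p0:0" and slot ranges are not handled.
RANK_LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\d+)$")

# Rankfile text copied from the message above.
RANKFILE = """\
rank 0=10.16.71.14 slot=0 # dual-core machine
rank 1=10.16.71.14 slot=1
rank 2=10.16.71.15 slot=0 # dual-core machine
rank 3=10.16.71.15 slot=1
rank 4=10.16.71.16 slot=0 # single-core P4
"""

# Core counts as described in the message (assumed, not probed from the hosts).
CORES = {"10.16.71.14": 2, "10.16.71.15": 2, "10.16.71.16": 1}


def parse_rankfile(text):
    """Return a list of (rank, host, slot) tuples from rankfile text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        match = RANK_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized rankfile line: {raw!r}")
        entries.append((int(match.group(1)), match.group(2), int(match.group(3))))
    return entries


def check_slots(entries, cores_per_host):
    """Return a list of problems: unknown hosts, out-of-range or reused slots."""
    problems = []
    seen = set()
    for rank, host, slot in entries:
        cores = cores_per_host.get(host)
        if cores is None:
            problems.append(f"rank {rank}: no core count known for {host}")
        elif slot >= cores:
            problems.append(f"rank {rank}: slot {slot} but {host} has {cores} core(s)")
        if (host, slot) in seen:
            problems.append(f"rank {rank}: slot {slot} on {host} is used twice")
        seen.add((host, slot))
    return problems


for problem in check_slots(parse_rankfile(RANKFILE), CORES):
    print(problem)
```

Under these assumptions the check reports nothing for the five-rank rankfile, since slot=0 is within range on the single-core host. That would suggest the rankfile is consistent with the hardware described and that the segmentation fault has another cause.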
Please help me. How can I overcome this?
Yours faithfully,
Chamila Janath.
Which version of OMPI are you running, and on which OS version?
Can you try replacing the rankfile specification with --bind-to-core and tell me whether that works any better?
--td
Chamila Janath wrote:
rankfile
rank 0=10.16.71.1 slot=0
I launched my MPI app using,
$ mpirun -np 1 -rf rankfile appname
I can run the application nicely on an Intel dual-core machine with a Linux-based OS, but I can't run it on a single-core machine (P4).
The execution terminates, reporting a problem with the slot number. What is the reason for this? Is it a bug, or a problem with the slot number I specified? (I also tried rank 0=10.16.71.1 slot=p0:0, but it failed too.)
Please help me.
Thanks a lot....
_______________________________________________ users mailing list users@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users
--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com