Dear Sir/Madam,

     I am running Open MPI version 1.4.2. The operating system is Ubuntu 9.10 with kernel version 2.6.31-14.

$ mpirun -np 1 -cpus-per-proc 1 -bind-to-core a.out

              This works fine on a single-core P4 machine.



$ mpirun -np 1 -bind-to-core a.out
              
             This also works fine.

$ mpirun -np 1 -cpus-per-proc 1 -bind-to-core a.out

            This works fine too.

But when I specify a rankfile as

          rank 0=127.0.0.1 slot=0


and run the application as

$ mpirun -np 1 -rf rankfile a.out

it gives:

[ucsc-laptop:03027] *** Process received signal ***
[ucsc-laptop:03027] Signal: Segmentation fault (11)
[ucsc-laptop:03027] Signal code: Address not mapped (1)
[ucsc-laptop:03027] Failing at address: 0x8
[ucsc-laptop:03027] [ 0] [0x867410]
[ucsc-laptop:03027] [ 1] a.out(main+0x5f) [0x8048843]
[ucsc-laptop:03027] [ 2] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x44cb56]
[ucsc-laptop:03027] [ 3] a.out [0x8048751]
[ucsc-laptop:03027] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3027 on node ucsc-laptop exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

The following command fails in the same way:

$ mpirun -np 1 -rf rankfile --bind-to-core a.out

[ucsc-laptop:03053] *** Process received signal ***
[ucsc-laptop:03053] Signal: Segmentation fault (11)
[ucsc-laptop:03053] Signal code: Address not mapped (1)
[ucsc-laptop:03053] Failing at address: 0x8
[ucsc-laptop:03053] [ 0] [0xab0410]
[ucsc-laptop:03053] [ 1] a.out(main+0x5f) [0x8048843]
[ucsc-laptop:03053] [ 2] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x234b56]
[ucsc-laptop:03053] [ 3] a.out [0x8048751]
[ucsc-laptop:03053] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3053 on node ucsc-laptop exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------


I need to run my program like this:

$ mpirun -np 5 -rf rankfile a.out

where the rankfile is:

rank 0=10.16.71.14 slot=0            # 10.16.71.14 is a dual-core machine
rank 1=10.16.71.14 slot=1
rank 2=10.16.71.15 slot=0            # 10.16.71.15 is a dual-core machine
rank 3=10.16.71.15 slot=1
rank 4=10.16.71.16 slot=0            # 10.16.71.16 is a single-core P4 machine


This gives the same segmentation fault as $ mpirun -np 1 -rf rankfile a.out on the P4 machine.

But if I comment out the line rank 4=10.16.71.16 slot=0 and run the program as $ mpirun -np 4 -rf rankfile a.out, it runs fine.
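
To make it easy to switch between the five-rank and four-rank layouts, the rankfile can be generated instead of edited by hand. The following is only a minimal sketch: make_rankfile is a hypothetical helper, not part of Open MPI, and its host:cores argument format is an assumption made for illustration. It emits one "rank N=host slot=S" line per core, in the same form as the rankfile above.

```shell
#!/bin/sh
# Hypothetical helper (not an Open MPI tool): print rankfile lines for a list
# of "host:cores" arguments, numbering ranks consecutively across hosts.
make_rankfile() {
  rank=0
  for entry in "$@"; do
    host=${entry%%:*}     # text before the colon
    cores=${entry##*:}    # text after the colon
    slot=0
    while [ "$slot" -lt "$cores" ]; do
      echo "rank $rank=$host slot=$slot"
      rank=$((rank + 1))
      slot=$((slot + 1))
    done
  done
}

# Five ranks: two dual-core hosts plus the single-core P4.
make_rankfile 10.16.71.14:2 10.16.71.15:2 10.16.71.16:1
```

Redirecting the output to a file named rankfile gives the five-rank layout. Dropping the last argument (10.16.71.16:1) gives the four-rank variant that runs without the fault.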


Please help me. How can I overcome this?

Yours faithfully,
Chamila Janath.


On Tue, Jun 8, 2010 at 10:11 PM, Terry Dontje <terry.dontje@oracle.com> wrote:
Which version of OMPI are you running, and on which OS version?
Can you try replacing the rankfile specification with --bind-to-core and tell me whether that works any better?

--td

Chamila Janath wrote:

rankfile
rank 0=10.16.71.1 slot=0

I launched my mpi app using,

$ mpirun -np 1 -rf rankfile appname

I can run the application without problems on an Intel dual-core machine with a Linux-based OS, but I can't run it on a single-core machine (P4).
Execution terminates with an error about the slot number. What is the reason for this? Is it a bug, or a problem with the slot number I specified? (I also tried rank 0=10.16.71.1 slot=p0:0, but that failed too.)
Please help me.

Thanks a lot....


_______________________________________________ users mailing list users@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje@oracle.com

