Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI-Ranking problem
From: Chamila Janath (chamila.janath_at_[hidden])
Date: 2010-06-09 07:55:44


Dear Sir/Madam,

     I'm running Open MPI version 1.4.2. The operating system is Ubuntu 9.10
with kernel version 2.6.31-14.

$ mpirun -np 1 -cpus-per-proc 1 -bind-to-core a.out

             *This works fine on a single-core P4 machine.*

$ mpirun -np 1 -bind-to-core a.out

             *This also works fine.*

$ mpirun -np 1 -cpus-per-proc 1 -bind-to-core a.out

         *This too works fine.*

But when I specify the rankfile as,

          rank 0=127.0.0.1 slot=0

Run the app as,

$ mpirun -np 1 -rf rankfile a.out

It gives,

[ucsc-laptop:03027] *** Process received signal ***
[ucsc-laptop:03027] Signal: Segmentation fault (11)
[ucsc-laptop:03027] Signal code: Address not mapped (1)
[ucsc-laptop:03027] Failing at address: 0x8
[ucsc-laptop:03027] [ 0] [0x867410]
[ucsc-laptop:03027] [ 1] a.out(main+0x5f) [0x8048843]
[ucsc-laptop:03027] [ 2] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x44cb56]
[ucsc-laptop:03027] [ 3] a.out [0x8048751]
[ucsc-laptop:03027] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3027 on node ucsc-laptop exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

and for the following execution,

$ mpirun -np 1 -rf rankfile --bind-to-core a.out

[ucsc-laptop:03053] *** Process received signal ***
[ucsc-laptop:03053] Signal: Segmentation fault (11)
[ucsc-laptop:03053] Signal code: Address not mapped (1)
[ucsc-laptop:03053] Failing at address: 0x8
[ucsc-laptop:03053] [ 0] [0xab0410]
[ucsc-laptop:03053] [ 1] a.out(main+0x5f) [0x8048843]
[ucsc-laptop:03053] [ 2] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x234b56]
[ucsc-laptop:03053] [ 3] a.out [0x8048751]
[ucsc-laptop:03053] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 3053 on node ucsc-laptop exited
on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
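
Both reports above list the same frame, a.out(main+0x5f), directly below the topmost frame. The Python sketch below pulls the frames out of such a report so that the first frame naming a module can be read off. The regular expression is an assumption based only on the lines shown in this message, and parse_frames is a hypothetical helper, not an Open MPI tool.

```python
import re

# Assumed frame shape, taken from the report in this message:
#   [host:pid] [ N] module(symbol+0xOFF) [0xADDR]
# The module and the "(symbol+0xOFF)" part may each be absent.
FRAME = re.compile(
    r"^\[[^\]]+\]\s+\[\s*(\d+)\]\s*"      # "[host:pid] [ N]"
    r"(?:(\S+?)(?:\(([^)]*)\))?\s+)?"      # optional "module(symbol+off)"
    r"\[(0x[0-9a-fA-F]+)\]$"               # "[0xADDR]"
)

def parse_frames(report):
    """Return (index, module, symbol, address) for every frame line found."""
    frames = []
    for line in report.splitlines():
        m = FRAME.match(line.strip())
        if m:
            frames.append((int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return frames

# First report from this message; frame 2 is written on a single line here.
REPORT = """\
[ucsc-laptop:03027] *** Process received signal ***
[ucsc-laptop:03027] Signal: Segmentation fault (11)
[ucsc-laptop:03027] Failing at address: 0x8
[ucsc-laptop:03027] [ 0] [0x867410]
[ucsc-laptop:03027] [ 1] a.out(main+0x5f) [0x8048843]
[ucsc-laptop:03027] [ 2] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x44cb56]
[ucsc-laptop:03027] [ 3] a.out [0x8048751]
[ucsc-laptop:03027] *** End of error message ***
"""

frames = parse_frames(REPORT)
# First frame that names a module (frame 0 carries only an address).
first_named = next(f for f in frames if f[1] is not None)
print(first_named)
```

For this report the first named frame is in a.out's own main rather than in an Open MPI library. That is only an observation from the log, not a diagnosis, but it may be worth running a.out under a debugger to see what main is doing at that offset.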

I need to execute my program as follows:

$ mpirun -np 5 -rf rankfile a.out

Where rank file:

rank 0=10.16.71.14 slot=0 # 10.16.71.14 is a dual-core machine
rank 1=10.16.71.14 slot=1
rank 2=10.16.71.15 slot=0 # 10.16.71.15 is a dual-core machine
rank 3=10.16.71.15 slot=1
rank 4=10.16.71.16 slot=0 # 10.16.71.16 is a single-core P4 machine

This gives a segmentation fault, just as $ mpirun -np 1 -rf rankfile a.out does.

But if I comment out the line rank 4=10.16.71.16 slot=0 and execute the
program as $ mpirun -np 4 -rf rankfile a.out, then it executes fine.
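
As a sanity check on the rankfile itself, the Python sketch below parses entries of the form "rank N=host slot=S" and flags any slot index that is not below an assumed core count for its host. The core counts are assumptions taken from the comments in the rankfile above (two cores each on 10.16.71.14 and 10.16.71.15, one on 10.16.71.16). The names parse_rankfile and check_slots are hypothetical helpers, not part of Open MPI, and only plain integer slots are handled, not the p0:0 form or slot ranges.

```python
import re

# Matches lines such as "rank 4=10.16.71.16 slot=0".
# Trailing "# ..." comments are stripped before matching.
ENTRY = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\d+)$")

def parse_rankfile(text):
    """Return a list of (rank, host, slot) tuples from rankfile text."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"unrecognised rankfile line: {raw!r}")
        entries.append((int(m.group(1)), m.group(2), int(m.group(3))))
    return entries

def check_slots(entries, cores_per_host):
    """Return the entries whose slot index is >= the host's assumed core count."""
    return [(r, h, s) for r, h, s in entries if s >= cores_per_host.get(h, 0)]

RANKFILE = """\
rank 0=10.16.71.14 slot=0
rank 1=10.16.71.14 slot=1
rank 2=10.16.71.15 slot=0
rank 3=10.16.71.15 slot=1
rank 4=10.16.71.16 slot=0
"""
# Assumed core counts, read off the comments in the rankfile.
CORES = {"10.16.71.14": 2, "10.16.71.15": 2, "10.16.71.16": 1}

entries = parse_rankfile(RANKFILE)
problems = check_slots(entries, CORES)
print(len(entries), "entries; out-of-range slots:", problems)
```

Under these assumptions no slot is out of range, including slot=0 on the single-core machine, which suggests the slot numbers themselves are plausible. It does not rule out a problem elsewhere.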

Please help me. How can I overcome this?

Yours faithfully,
Chamila Janath.

On Tue, Jun 8, 2010 at 10:11 PM, Terry Dontje <terry.dontje_at_[hidden]> wrote:

> Which version of OMPI are you running, and on which OS version?
> Can you try and replace the rankfile specification with --bind-to-core and
> tell me if that works any better?
>
> --td
>
> Chamila Janath wrote:
>
>
> *rankfile*
> rank 0=10.16.71.1 slot=0
>
> I launched my mpi app using,
>
> $ mpirun -np 1 -rf rankfile appname
>
> I can run the application on an Intel dual-core machine with a Linux-based OS
> without trouble, but I can't run it on a single-core machine (P4).
> The execution terminates, reporting a problem with the slot number. What is the
> reason for this? Is it a bug, or a problem with the slot number I specified? (I
> also tried rank 0=10.16.71.1 slot=p0:0, but that failed too.)
> Please help me.
>
> Thanks a lot....
>
> ------------------------------
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.650.633.7054
> Oracle * - Performance Technologies*
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje_at_[hidden]
