Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Specifying slots in rankfile
From: Grzegorz Maj (maju3_at_[hidden])
Date: 2010-06-09 12:16:36


Thanks a lot, it works fine for me.
But going back to my problem: is this a bug in Open MPI, or should I be
using the "slot=*" option in some other way?

2010/6/9 Ralph Castain <rhc_at_[hidden]>:
> I would recommend using the sequential mapper instead:
>
> mpirun -mca rmaps seq
>
> You can then just list your hosts in your hostfile, and we will put the ranks sequentially on those hosts. So you get something like this:
>
> host01  <= rank0
> host01  <= rank1
> host02  <= rank2
> host03  <= rank3
> host01  <= rank4
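>
> With the sequential mapper the hostfile itself just lists the hosts in that order, one line per rank. A minimal sketch of such a hostfile and invocation (the file name and program name are placeholders, not from this thread):
>
>    $ cat myhosts
>    host01
>    host01
>    host02
>    host03
>    host01
>    $ mpirun -mca rmaps seq -hostfile myhosts -np 5 ./my_program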
>
> Ralph
>
> On Jun 9, 2010, at 4:39 AM, Grzegorz Maj wrote:
>
>> In my previous mail I said that slot=0-3 would be a solution.
>> Unfortunately, it gives me exactly the same segfault as in the *:*
>> case.
>>
>> 2010/6/9 Grzegorz Maj <maju3_at_[hidden]>:
>>> Hi,
>>> I'd like mpirun to run tasks with specific ranks on specific hosts,
>>> but I don't want to provide any particular sockets/slots/cores.
>>> The following example uses just one host, but generally I'll use more.
>>> In my hostfile I just have:
>>>
>>> root_at_host01 slots=4
>>>
>>> I was playing with my rankfile to achieve this, but I keep running
>>> into problems.
>>>
>>> 1) With rankfile like:
>>> rank 0=host01 slot=*
>>> rank 1=host01 slot=*
>>> rank 2=host01 slot=*
>>> rank 3=host01 slot=*
>>>
>>> I get:
>>>
>>> --------------------------------------------------------------------------
>>> We were unable to successfully process/set the requested processor
>>> affinity settings:
>>>
>>> Specified slot list: *
>>> Error: Error
>>>
>>> This could mean that a non-existent processor was specified, or
>>> that the specification had improper syntax.
>>> --------------------------------------------------------------------------
>>> --------------------------------------------------------------------------
>>> mpirun was unable to start the specified application as it encountered an error:
>>>
>>> Error name: Error
>>> Node: host01
>>>
>>> when attempting to start process rank 0.
>>> --------------------------------------------------------------------------
>>> [host01:13715] Rank 0: PAFFINITY cannot get physical processor id for
>>> logical processor 4
>>>
>>>
>>> I think it tries to find processor #4, but there are only 0-3.
>>>
>>> 2) With rankfile like:
>>> rank 0=host01 slot=*:*
>>> rank 1=host01 slot=*:*
>>> rank 2=host01 slot=*:*
>>> rank 3=host01 slot=*:*
>>>
>>> Everything looks fine, i.e. my processes are spread across the 4 processors.
>>> But when I run an MPI program as follows:
>>>
>>> MPI::Init(argc, argv);
>>> fprintf(stderr, "after init %d\n", MPI::Is_initialized());
>>> nprocs_mpi = MPI::COMM_WORLD.Get_size();
>>> fprintf(stderr, "won't get here\n");
>>>
>>> I get:
>>>
>>> after init 1
>>> [host01:14348] *** Process received signal ***
>>> [host01:14348] Signal: Segmentation fault (11)
>>> [host01:14348] Signal code: Address not mapped (1)
>>> [host01:14348] Failing at address: 0x8
>>> [host01:14348] [ 0] [0xffffe410]
>>> [host01:14348] [ 1] p(_ZNK3MPI4Comm8Get_sizeEv+0x19) [0x8051299]
>>> [host01:14348] [ 2] p(main+0x86) [0x804ee4e]
>>> [host01:14348] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0x4180b5c5]
>>> [host01:14348] [ 4] p(__gxx_personality_v0+0x125) [0x804ecc1]
>>> [host01:14348] *** End of error message ***
>>>
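>>> For completeness, here is the fragment above as a self-contained program
>>> (a rough sketch; the includes, main() wrapper and Finalize call are added
>>> only so it compiles on its own):
>>>
>>> #include <cstdio>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char **argv) {
>>>     MPI::Init(argc, argv);
>>>     // prints "after init 1", so MPI reports it is initialized
>>>     fprintf(stderr, "after init %d\n", (int) MPI::Is_initialized());
>>>     // the backtrace above shows the crash happens inside this call
>>>     int nprocs_mpi = MPI::COMM_WORLD.Get_size();
>>>     fprintf(stderr, "won't get here (size=%d)\n", nprocs_mpi);
>>>     MPI::Finalize();
>>>     return 0;
>>> }
>>>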
>>> I'm using Open MPI v1.4.2 (downloaded yesterday).
>>> In my rankfile I really want to write something like slot=*. I know
>>> slot=0-3 would be a solution, but when generating the rankfile I may
>>> not know how many processors are available on a particular host.
>>>
>>> Any help would be appreciated.
>>>
>>> Regards,
>>> Grzegorz Maj
>>>