Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Specifying slots in rankfile
From: Terry Dontje (terry.dontje_at_[hidden])
Date: 2010-06-10 16:10:17


Sorry, there was a miscommunication between Ethan and me. The "*"
nomenclature never worked in OMPI; it is the "n:*" specification that
worked, and we believe it still works.
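
For example, a rankfile using that form might look like the following (a sketch, not from the original thread: the host name is a placeholder, and "n:*" is read as "any core on socket n"):

rank 0=host01 slot=0:*
rank 1=host01 slot=0:*
rank 2=host01 slot=1:*
rank 3=host01 slot=1:*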

--td
Terry Dontje wrote:
> It looks like the rankfile "*" syntax was broken between r22761
> and r23214, so this looks like a regression to me. Ethan is looking
> into narrowing it down further.
>
> --td
>
> Ralph Castain wrote:
>> I would have to look at the code, but I suspect it doesn't handle "*". It could be upgraded to do so, but that would depend on the relevant developer doing it :-)
>>
>>
>> On Jun 9, 2010, at 10:16 AM, Grzegorz Maj wrote:
>>
>>
>>> Thanks a lot, it works fine for me.
>>> But going back to my problem: is this a bug in Open MPI, or should I
>>> be using the "slot=*" option in some other way?
>>>
>>> 2010/6/9 Ralph Castain <rhc_at_[hidden]>:
>>>
>>>> I would recommend using the sequential mapper instead:
>>>>
>>>> mpirun -mca rmaps seq
>>>>
>>>> You can then just list your hosts in your hostfile, and we will place the ranks sequentially on those hosts. So you get something like this:
>>>>
>>>> host01 <= rank0
>>>> host01 <= rank1
>>>> host02 <= rank2
>>>> host03 <= rank3
>>>> host01 <= rank4
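>>>>
>>>> Concretely, the hostfile producing that layout would just list one host per rank, in rank order. A minimal sketch (the hostfile name "myhosts" and the application "./my_app" are placeholders):
>>>>
>>>> host01
>>>> host01
>>>> host02
>>>> host03
>>>> host01
>>>>
>>>> mpirun -mca rmaps seq -hostfile myhosts ./my_app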
>>>>
>>>> Ralph
>>>>
>>>> On Jun 9, 2010, at 4:39 AM, Grzegorz Maj wrote:
>>>>
>>>>
>>>>> In my previous mail I said that slot=0-3 would be a solution.
>>>>> Unfortunately, it gives me exactly the same segfault as in the case
>>>>> with *:*
>>>>>
>>>>> 2010/6/9 Grzegorz Maj <maju3_at_[hidden]>:
>>>>>
>>>>>> Hi,
>>>>>> I'd like mpirun to run tasks with specific ranks on specific hosts,
>>>>>> but I don't want to provide any particular sockets/slots/cores.
>>>>>> The following example uses just one host, but generally I'll use more.
>>>>>> In my hostfile I just have:
>>>>>>
>>>>>> root@host01 slots=4
>>>>>>
>>>>>> I was playing with my rankfile to achieve this, but I keep
>>>>>> running into problems.
>>>>>>
>>>>>> 1) With rankfile like:
>>>>>> rank 0=host01 slot=*
>>>>>> rank 1=host01 slot=*
>>>>>> rank 2=host01 slot=*
>>>>>> rank 3=host01 slot=*
>>>>>>
>>>>>> I get:
>>>>>>
>>>>>> --------------------------------------------------------------------------
>>>>>> We were unable to successfully process/set the requested processor
>>>>>> affinity settings:
>>>>>>
>>>>>> Specified slot list: *
>>>>>> Error: Error
>>>>>>
>>>>>> This could mean that a non-existent processor was specified, or
>>>>>> that the specification had improper syntax.
>>>>>> --------------------------------------------------------------------------
>>>>>> --------------------------------------------------------------------------
>>>>>> mpirun was unable to start the specified application as it encountered an error:
>>>>>>
>>>>>> Error name: Error
>>>>>> Node: host01
>>>>>>
>>>>>> when attempting to start process rank 0.
>>>>>> --------------------------------------------------------------------------
>>>>>> [host01:13715] Rank 0: PAFFINITY cannot get physical processor id for
>>>>>> logical processor 4
>>>>>>
>>>>>>
>>>>>> I think it tries to find processor #4, but there are only 0-3.
>>>>>>
>>>>>> 2) With rankfile like:
>>>>>> rank 0=host01 slot=*:*
>>>>>> rank 1=host01 slot=*:*
>>>>>> rank 2=host01 slot=*:*
>>>>>> rank 3=host01 slot=*:*
>>>>>>
>>>>>> Everything looks fine, i.e. my processes are spread across the 4 processors.
>>>>>> But when running an MPI program like the following:
>>>>>>
>>>>>> MPI::Init(argc, argv);
>>>>>> fprintf(stderr, "after init %d\n", MPI::Is_initialized());
>>>>>> nprocs_mpi = MPI::COMM_WORLD.Get_size();  // segfaults here, per the trace below
>>>>>> fprintf(stderr, "won't get here\n");
>>>>>>
>>>>>> I get:
>>>>>>
>>>>>> after init 1
>>>>>> [host01:14348] *** Process received signal ***
>>>>>> [host01:14348] Signal: Segmentation fault (11)
>>>>>> [host01:14348] Signal code: Address not mapped (1)
>>>>>> [host01:14348] Failing at address: 0x8
>>>>>> [host01:14348] [ 0] [0xffffe410]
>>>>>> [host01:14348] [ 1] p(_ZNK3MPI4Comm8Get_sizeEv+0x19) [0x8051299]
>>>>>> [host01:14348] [ 2] p(main+0x86) [0x804ee4e]
>>>>>> [host01:14348] [ 3] /lib/libc.so.6(__libc_start_main+0xe5) [0x4180b5c5]
>>>>>> [host01:14348] [ 4] p(__gxx_personality_v0+0x125) [0x804ecc1]
>>>>>> [host01:14348] *** End of error message ***
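>>>>>>
>>>>>> For reference, a self-contained version of the fragment above (a sketch: only those four lines were posted, so the includes, the main() wrapper, and the nprocs_mpi declaration are assumptions):
>>>>>>
>>>>>> #include <mpi.h>
>>>>>> #include <cstdio>
>>>>>>
>>>>>> int main(int argc, char **argv)
>>>>>> {
>>>>>>     MPI::Init(argc, argv);
>>>>>>     fprintf(stderr, "after init %d\n", (int) MPI::Is_initialized());
>>>>>>     // Crashes here when launched with the slot=*:* rankfile
>>>>>>     int nprocs_mpi = MPI::COMM_WORLD.Get_size();
>>>>>>     fprintf(stderr, "won't get here (size=%d)\n", nprocs_mpi);
>>>>>>     MPI::Finalize();
>>>>>>     return 0;
>>>>>> }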
>>>>>>
>>>>>> I'm using Open MPI v1.4.2 (downloaded yesterday).
>>>>>> In my rankfile I really want to write something like slot=*. I know
>>>>>> slot=0-3 would be a solution, but when generating the rankfile I may
>>>>>> not be sure how many processors are available on a particular host.
>>>>>>
>>>>>> Any help would be appreciated.
>>>>>>
>>>>>> Regards,
>>>>>> Grzegorz Maj

-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.650.633.7054
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.dontje_at_[hidden]


