
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] problem with rankfile in openmpi-1.6.4rc2
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-01-25 00:26:38


Found it! A trivial error (a missing break in a switch statement) that only affects things when multiple sockets are specified in the slot_list. CMR filed to include the fix in 1.6.4.
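
For readers unfamiliar with this class of bug, here is a minimal sketch in C of how a missing break can stay hidden until a slot list names more than one socket. It is not the Open MPI source: the names binding_t, commit_range, parse_slot_list, and the omit_break flag are all hypothetical, and the grammar (socket:first-last entries separated by commas) is a simplified assumption based on the slot list shown in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical limits and result type, for illustration only. */
enum { MAX_SOCKETS = 4, MAX_CORES = 16 };

typedef struct {
    bool bound[MAX_SOCKETS][MAX_CORES];
} binding_t;

/* Mark cores first..last on one socket; -1 if anything is out of range. */
static int commit_range(binding_t *b, int socket, int first, int last)
{
    if (socket < 0 || socket >= MAX_SOCKETS) return -1;
    if (first < 0 || last < first || last >= MAX_CORES) return -1;
    for (int c = first; c <= last; c++) b->bound[socket][c] = true;
    return 0;
}

/*
 * Parse "socket:first-last[,socket:first-last...]" into *out.
 * Returns 0 on success, -1 on error.
 * omit_break=true reproduces the fall-through after the ',' case.
 */
int parse_slot_list(const char *list, bool omit_break, binding_t *out)
{
    assert(list != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    int socket = -1, first = -1;
    const char *p = list;

    for (;;) {
        /* Every token starts with a small non-negative integer. */
        if (!isdigit((unsigned char)*p)) return -1;
        int n = 0;
        while (isdigit((unsigned char)*p)) {
            n = n * 10 + (*p - '0');
            if (n >= 1000) return -1;   /* far beyond any valid id */
            p++;
        }

        /* The delimiter after the number decides what the number meant. */
        switch (*p) {
        case ':':                       /* n was a socket id */
            if (socket >= 0) return -1;
            socket = n;
            p++;
            break;
        case '-':                       /* n starts a core range */
            if (socket < 0 || first >= 0) return -1;
            first = n;
            p++;
            break;
        case '\0':                      /* n ends the last entry */
            return commit_range(out, socket, first < 0 ? n : first, n);
        case ',':                       /* n ends an entry; another socket follows */
            if (commit_range(out, socket, first < 0 ? n : first, n) != 0)
                return -1;
            socket = first = -1;
            p++;
            if (!omit_break) break;
            /* fall through: this is what the missing break does */
        default:
            return -1;
        }
    }
}
```

In this sketch, parse_slot_list("0:0-1", ...) succeeds whether or not the break is present, because a single-socket list never reaches the ',' case. Only a list such as "0:0-1,1:0-1" hits the fall-through and is rejected, which is why a test on a single-socket machine would not expose the problem.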

Thanks for your patience
Ralph

On Jan 24, 2013, at 7:50 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> I built the current 1.6 branch (which hasn't seen any changes that would impact this function) and was able to execute it just fine on a single-socket machine. I then gave it your slot-list, which of course failed because I don't have two active sockets (one is empty), but it appeared to parse the list just fine.
>
> From what I can tell, it looks like your linpc1 is unable to detect a second socket for some reason when given the slot_list argument. I'll have to try again tomorrow when I have access to a dual-socket machine.
>
> On Jan 19, 2013, at 1:45 AM, Siegmar Gross <Siegmar.Gross_at_[hidden]> wrote:
>
>> Hi
>>
>> I have installed openmpi-1.6.4rc2 and still have a problem with my
>> rankfile.
>>
>> linpc1 rankfiles 113 ompi_info | grep "Open MPI:"
>> Open MPI: 1.6.4rc2r27861
>>
>> linpc1 rankfiles 114 cat rf_linpc1
>> rank 0=linpc1 slot=0:0-1,1:0-1
>>
>> linpc1 rankfiles 115 mpiexec -report-bindings -np 1 \
>> -rf rf_linpc1 hostname
>> --------------------------------------------------------------------
>> We were unable to successfully process/set the requested processor
>> affinity settings:
>>
>> Specified slot list: 0:0-1,1:0-1
>> Error: Error
>>
>> This could mean that a non-existent processor was specified, or
>> that the specification had improper syntax.
>> --------------------------------------------------------------------
>> --------------------------------------------------------------------
>> mpiexec was unable to start the specified application as it
>> encountered an error:
>>
>> Error name: Error
>> Node: linpc1
>>
>> when attempting to start process rank 0.
>> --------------------------------------------------------------------
>>
>>
>> Everything works fine with the following command.
>>
>> linpc1 rankfiles 116 mpiexec -report-bindings -np 1 -cpus-per-proc 4 \
>> -bycore -bind-to-core hostname
>> [linpc1:20140] MCW rank 0 bound to socket 0[core 0-1]
>> socket 1[core 0-1]: [B B][B B]
>> linpc1
>>
>>
>> I would be grateful if somebody could fix the problem. Thank you very
>> much in advance for any help.
>>
>>
>> Kind regards
>>
>> Siegmar
>>