
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Determining what parameters a scheduler passes to OpenMPI
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-06-06 15:20:14


You might want to update to 1.6.5, if you can - I'll see what I can find

On Jun 6, 2014, at 12:07 PM, Sasso, John (GE Power & Water, Non-GE) <John1.Sasso_at_[hidden]> wrote:

> Version 1.6 (i.e. prior to 1.6.1)
>
> -----Original Message-----
> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph Castain
> Sent: Friday, June 06, 2014 3:03 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] Determining what parameters a scheduler passes to OpenMPI
>
> It's possible that you are hitting a bug - not sure how much the cpus-per-proc option has been exercised in 1.6. Is this 1.6.5, or some other member of that series?
>
> I don't have a Torque machine handy any more, but should be able to test this scenario on my boxes
>
>
> On Jun 6, 2014, at 10:51 AM, Sasso, John (GE Power & Water, Non-GE) <John1.Sasso_at_[hidden]> wrote:
>
>> Re: $PBS_NODEFILE, we use that to create the hostfile that is passed via --hostfile (i.e. the two are the same).
>>
>> To further debug this, I passed "--display-allocation --display-map" to orterun, which resulted in:
>>
>> ====================== ALLOCATED NODES ======================
>>
>> Data for node: node0001 Num slots: 16 Max slots: 0
>> Data for node: node0002 Num slots: 8 Max slots: 0
>>
>> =================================================================
>>
>> ======================== JOB MAP ========================
>>
>> Data for node: node0001 Num procs: 24
>> Process OMPI jobid: [24552,1] Process rank: 0
>> Process OMPI jobid: [24552,1] Process rank: 1
>> Process OMPI jobid: [24552,1] Process rank: 2
>> Process OMPI jobid: [24552,1] Process rank: 3
>> Process OMPI jobid: [24552,1] Process rank: 4
>> Process OMPI jobid: [24552,1] Process rank: 5
>> Process OMPI jobid: [24552,1] Process rank: 6
>> Process OMPI jobid: [24552,1] Process rank: 7
>> Process OMPI jobid: [24552,1] Process rank: 8
>> Process OMPI jobid: [24552,1] Process rank: 9
>> Process OMPI jobid: [24552,1] Process rank: 10
>> Process OMPI jobid: [24552,1] Process rank: 11
>> Process OMPI jobid: [24552,1] Process rank: 12
>> Process OMPI jobid: [24552,1] Process rank: 13
>> Process OMPI jobid: [24552,1] Process rank: 14
>> Process OMPI jobid: [24552,1] Process rank: 15
>> Process OMPI jobid: [24552,1] Process rank: 16
>> Process OMPI jobid: [24552,1] Process rank: 17
>> Process OMPI jobid: [24552,1] Process rank: 18
>> Process OMPI jobid: [24552,1] Process rank: 19
>> Process OMPI jobid: [24552,1] Process rank: 20
>> Process OMPI jobid: [24552,1] Process rank: 21
>> Process OMPI jobid: [24552,1] Process rank: 22
>> Process OMPI jobid: [24552,1] Process rank: 23
>>
>> I have been going through the man page of mpirun as well as the Open MPI mailing list and website, and thus far I have been unable to determine why the head node (node0001) is oversubscribed when the PBS scheduler is passing along the correct slot counts (16 and 8, respectively).
>>
>> Am I running into a bug w/ OpenMPI 1.6?
>>
>> --john
>>
>>
>>
>> -----Original Message-----
>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph
>> Castain
>> Sent: Friday, June 06, 2014 1:30 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] Determining what parameters a scheduler
>> passes to OpenMPI
>>
>>
>> On Jun 6, 2014, at 10:24 AM, Gus Correa <gus_at_[hidden]> wrote:
>>
>>> On 06/06/2014 01:05 PM, Ralph Castain wrote:
>>>> You can always add --display-allocation to the cmd line to see what
>>>> we thought we received.
>>>>
>>>> If you configure OMPI with --enable-debug, you can set --mca
>>>> ras_base_verbose 10 to see the details
>>>>
>>>>
>>>
>>> Hi John
>>>
>>> On the Torque side, you can put a line "cat $PBS_NODEFILE" in the job script. This will list the allocated nodes, each repeated according to the number of cores requested on it.
>>> I find this useful documentation,
>>> along with the job number, work directory, etc.
>>> "man qsub" will show you all the PBS_* environment variables
>>> available to the job.
>>> For instance, you can echo them from a Torque 'prolog' script if
>>> the user didn't do it. That output will appear in the Torque STDOUT file.
>>>
>>> From outside the job script, "qstat -n" (and variants, say, with -u
>>> username) will list the nodes allocated to each job, again multiple
>>> times as per the requested cores.
>>>
>>> "tracejob job_number" will show similar information.
>>>
>>>
>>> If you configured Torque --with-cpuset, there is more information
>>> about the cpuset allocated to the job in /dev/cpuset/torque/jobnumber
>>> (on the first node listed above, called "mother superior" in Torque parlance).
>>> This mostly matters if there is more than one job running on a node.
>>> However, Torque doesn't bind processes/MPI_ranks to cores or sockets or whatever. As Ralph said, Open MPI does that.
>>> I believe Open MPI doesn't use the cpuset info from Torque.
>>> (Ralph, please correct me if I am wrong.)
>>
>> You are correct in that we don't use any per-process designations. We do, however, work inside any overall envelope that Torque may impose on us - e.g., if you tell Torque to limit the job to cores 0-4, we will honor that directive and keep all processes within that envelope.
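The envelope Ralph describes can be inspected directly on the mother-superior node. A hedged sketch, assuming the conventional Torque cpuset mount point (it may differ per site); it is safe to run outside a job, where it simply reports that no cpuset is present:

```shell
# Sketch: peek at the cpuset envelope Torque created for a job.
# Assumes Torque was built --with-cpuset and the conventional mount point.
CS="/dev/cpuset/torque/${PBS_JOBID:-none}"
if [ -d "$CS" ]; then
    echo "cpus: $(cat "$CS/cpus")"   # cores Open MPI must stay within
    echo "mems: $(cat "$CS/mems")"   # NUMA memory nodes in the envelope
else
    echo "no cpuset at $CS (not under Torque, or cpusets disabled)"
fi
```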
>>
>>
>>>
>>> My two cents,
>>> Gus Correa
>>>
>>>
>>>> On Jun 6, 2014, at 10:01 AM, Reuti <reuti_at_[hidden]
>>>> <mailto:reuti_at_[hidden]>> wrote:
>>>>
>>>>> Am 06.06.2014 um 18:58 schrieb Sasso, John (GE Power & Water, Non-GE):
>>>>>
>>>>>> OK, so at the least, how can I get the node and slots/node info
>>>>>> that is passed from PBS?
>>>>>>
>>>>>> I ask because I'm trying to troubleshoot a problem w/ PBS and the
>>>>>> build of OpenMPI 1.6 I noted. If I submit a 24-process simple job
>>>>>> through PBS using a script which has:
>>>>>>
>>>>>> /usr/local/openmpi/bin/orterun -n 24 --hostfile
>>>>>> /home/sasso/TEST/hosts.file --mca orte_rsh_agent rsh --mca btl
>>>>>> openib,tcp,self --mca orte_base_help_aggregate 0 -x PATH -x
>>>>>> LD_LIBRARY_PATH /home/sasso/TEST/simplempihello.exe
>>>>>
>>>>> Supplying your own --hostfile would violate the slot
>>>>> allocation granted by PBS. Just leave this option out. How do
>>>>> you submit your job?
>>>>>
>>>>> -- Reuti
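Following Reuti's suggestion, the submission could be sketched like this (hypothetical fragment; assumes Open MPI 1.6 was built with Torque tm support, so orterun takes both the node list and the slot counts from the allocation and launches one process per slot when -n is omitted):

```shell
# In the PBS job script -- no -n and no --hostfile: with tm support,
# orterun reads nodes and slots-per-node directly from Torque.
/usr/local/openmpi/bin/orterun \
    --mca btl openib,tcp,self \
    --mca orte_base_help_aggregate 0 \
    -x PATH -x LD_LIBRARY_PATH \
    /home/sasso/TEST/simplempihello.exe
```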
>>>>>
>>>>>
>>>>>> And the hostfile /home/sasso/TEST/hosts.file contains 24 entries
>>>>>> (the first 16 being host node0001 and the last 8 being node0002),
>>>>>> it appears that 24 MPI tasks try to start on node0001 instead of getting
>>>>>> distributed as 16 on node0001 and 8 on node0002. Hence, I am
>>>>>> curious what is being passed by PBS.
>>>>>>
>>>>>> --john
>>>>>>
>>>>>>
>>>>>> From: users [mailto:users-bounces_at_[hidden]] On Behalf Of Ralph
>>>>>> Castain
>>>>>> Sent: Friday, June 06, 2014 12:31 PM
>>>>>> To: Open MPI Users
>>>>>> Subject: Re: [OMPI users] Determining what parameters a scheduler
>>>>>> passes to OpenMPI
>>>>>>
>>>>>> We currently only get the node and slots/node info from PBS - we
>>>>>> don't get any task placement info at all. We then use the mpirun
>>>>>> cmd options and built-in mappers to map the tasks to the nodes.
>>>>>>
>>>>>> I suppose we could do more integration in that regard, but haven't
>>>>>> really seen a reason to do so - the OMPI mappers are generally
>>>>>> more flexible than anything in the schedulers.
>>>>>>
>>>>>>
>>>>>> On Jun 6, 2014, at 9:08 AM, Sasso, John (GE Power & Water, Non-GE)
>>>>>> <John1.Sasso_at_[hidden] <mailto:John1.Sasso_at_[hidden]>> wrote:
>>>>>>
>>>>>>
>>>>>> For the PBS scheduler and using a build of OpenMPI 1.6 built
>>>>>> against PBS include files + libs, is there a way to determine
>>>>>> (perhaps via some debugging flags passed to mpirun) what job
>>>>>> placement parameters are passed from the PBS scheduler to OpenMPI?
>>>>>> In particular, I am talking about task placement info such as nodes to place on, etc.
>>>>>> Thanks!
>>>>>>
>>>>>> --john
>>>>>> _______________________________________________
>>>>>> users mailing list
>>>>>> users_at_[hidden] <mailto:users_at_[hidden]>
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>