Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Hybrid OpenMPI / OpenMP run pins OpenMP threads to a single core
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-07-29 10:46:23


If you check, I expect you will find that your threads and processes are not bound to a single core, but are now constrained to stay within a socket.

This means that if you run more threads than there are cores in a socket, some threads will sit idle due to contention.
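
One quick way to check where each thread actually lands is to have every OpenMP thread report the CPU it is running on. Below is a minimal sketch (not from the original thread), assuming a Linux node where glibc provides sched_getcpu(); it could be built with something like "mpicc -fopenmp check_affinity.c -o check_affinity" (file name assumed).

#define _GNU_SOURCE
#include <sched.h>   /* sched_getcpu() (glibc, Linux) */
#include <stdio.h>
#include <mpi.h>
#include <omp.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

#pragma omp parallel
    {
        /* Each thread prints the CPU it is currently executing on.
           If all threads of a rank report CPUs from only one socket,
           the rank is socket-bound; if they all report the same CPU,
           it is effectively core-bound. */
        printf("rank %d thread %d on cpu %d\n",
               rank, omp_get_thread_num(), sched_getcpu());
    }

    MPI_Finalize();
    return 0;
}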

On Jul 29, 2010, at 8:29 AM, David Akin wrote:

> Adding -bysocket -bind-to-socket worked! Now I need to figure out why
> that is, since I had assumed the problem was in my code. You can try my
> simple example code below.
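
Putting those flags together with the invocation shown further down in this thread, the working command line would presumably look something like this (hosts, process count, and thread count reused from the original example):

/usr/mpi/gcc/openmpi-1.4-qlc/bin/mpirun -host c005,c006 -np 2 -bysocket -bind-to-socket -x OMP_NUM_THREADS=4 hybrid4.gcc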
>
> On Thu, Jul 29, 2010 at 8:49 AM, Ralph Castain <rhc_at_[hidden]> wrote:
>>
>> On Jul 29, 2010, at 5:09 AM, Terry Dontje wrote:
>>
>> Ralph Castain wrote:
>>
>> How are you running it when the threads are all on one core?
>>
>> If you are specifying --bind-to-core, then of course all the threads will be
>> on one core since we bind the process (not the thread). If you are
>> specifying -mca mpi_paffinity_alone 1, then the same behavior results.
>>
>> Generally, if you want to bind threads, the only way to do it is with a rank
>> file. We -might- figure out a way to provide an interface for thread-level
>> binding, but I'm not sure about that right now. As things stand, OMPI has no
>> visibility into the fact that your app spawned threads.
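
As a point of reference, the rank file mentioned here is a plain-text mapping of each MPI rank to explicit slots on a host. A minimal sketch for the two nodes used later in this thread might look like the lines below; the exact slot syntax is an assumption, so check the rankfile section of the mpirun manpage for the installed OMPI version:

rank 0=c005 slot=0,1,2,3
rank 1=c006 slot=0,1,2,3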
>>
>>
>>
>>
>> Huh??? That's not completely correct. If you have a multiple-socket
>> machine you could do -bind-to-socket -bysocket and spread the processes that
>> way. Also, couldn't you use -cpus-per-proc with -bind-to-core to get a
>> process to bind to a number of cpus other than a full socket?
>>
>> Yes, you could do bind-to-socket, though that still constrains the threads
>> to only that one socket. What was asked about here was the ability to
>> bind-to-core at the thread level, and that is something OMPI doesn't
>> support.
>>
>>
>> This is all documented in the mpirun manpage.
>>
>> That being said, I, like Ralph, am also confused as to why specifying no
>> binding options is causing your code to bind. Maybe add --report-bindings to
>> your mpirun line to see what OMPI thinks it is doing in this regard?
>>
>> This is a good suggestion - I'm beginning to believe that the binding is
>> happening in the user's app and not OMPI.
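
Reusing the original poster's invocation, such a diagnostic run would look roughly like the following; --report-bindings makes each daemon report the binding it applied to its local ranks before the application starts:

/usr/mpi/gcc/openmpi-1.4-qlc/bin/mpirun --report-bindings -host c005,c006 -np 2 -x OMP_NUM_THREADS=4 hybrid4.gcc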
>>
>>
>> --td
>>
>> On Jul 28, 2010, at 5:47 PM, David Akin wrote:
>>
>>
>>
>> All,
>> I'm trying to get the OpenMP portion of the code below to run
>> across multiple cores on a couple of 8-core nodes.
>>
>> Good news: multiple threads are being spawned on each node in the run.
>> Bad news: all of the threads end up pinned to a single core, leaving the
>> other 7 cores basically idle.
>> Sorta good news: if I provide a rank file, I get the threads running on
>> different cores within each node (a PITA, though).
>>
>> Here are the first lines of output.
>>
>> /usr/mpi/gcc/openmpi-1.4-qlc/bin/mpirun -host c005,c006 -np 2 -rf
>> rank.file -x OMP_NUM_THREADS=4 hybrid4.gcc
>>
>> Hello from thread 2 out of 4 from process 1 out of 2 on c006.local
>> another parallel region: name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=2
>> Hello from thread 3 out of 4 from process 1 out of 2 on c006.local
>> another parallel region: name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=3
>> Hello from thread 1 out of 4 from process 1 out of 2 on c006.local
>> another parallel region: name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=1
>> Hello from thread 1 out of 4 from process 0 out of 2 on c005.local
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=1
>> Hello from thread 3 out of 4 from process 0 out of 2 on c005.local
>> Hello from thread 2 out of 4 from process 0 out of 2 on c005.local
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
>> Hello from thread 0 out of 4 from process 0 out of 2 on c005.local
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=0
>> Hello from thread 0 out of 4 from process 1 out of 2 on c006.local
>> another parallel region: name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=0
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=0
>> another parallel region: name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=3
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=3
>> another parallel region: name:c005.local MPI_RANK_ID=0 OMP_THREAD_ID=2
>> another parallel region: name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=0
>> another parallel region: name:c006.local MPI_RANK_ID=1 OMP_THREAD_ID=1
>> .
>> .
>> .
>>
>> Here's the simple code:
>> #include <stdio.h>
>> #include "mpi.h"
>> #include <omp.h>
>>
>> int main(int argc, char *argv[]) {
>>   int numprocs, rank, namelen;
>>   char processor_name[MPI_MAX_PROCESSOR_NAME];
>>   int iam = 0, np = 1;
>>   char name[MPI_MAX_PROCESSOR_NAME];  /* MPI_MAX_PROCESSOR_NAME == 128 */
>>   int O_ID;                           /* OpenMP thread ID */
>>   int M_ID;                           /* MPI rank ID */
>>   int rtn_val;
>>
>>   MPI_Init(&argc, &argv);
>>   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>   MPI_Get_processor_name(processor_name, &namelen);
>>
>>   #pragma omp parallel default(shared) private(iam, np, O_ID)
>>   {
>>     np = omp_get_num_threads();
>>     iam = omp_get_thread_num();
>>     printf("Hello from thread %d out of %d from process %d out of %d on %s\n",
>>            iam, np, rank, numprocs, processor_name);
>>     int i = 0;
>>     int j = 0;
>>     double counter = 0;
>>     for (i = 0; i < 99999999; i++)
>>     {
>>       O_ID = omp_get_thread_num();    /* get OpenMP thread ID */
>>       MPI_Get_processor_name(name, &namelen);
>>       rtn_val = MPI_Comm_rank(MPI_COMM_WORLD, &M_ID);
>>       printf("another parallel region: name:%s MPI_RANK_ID=%d OMP_THREAD_ID=%d\n",
>>              name, M_ID, O_ID);
>>       for (j = 0; j < 999999999; j++)
>>       {
>>         counter = counter + i;
>>       }
>>     }
>>   }
>>
>>   MPI_Finalize();
>>   return 0;
>> }
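
For completeness: a hybrid MPI/OpenMP program like this is normally built with the MPI compiler wrapper plus the compiler's OpenMP flag. With the GCC-based Open MPI 1.4 install shown in the path above, the build would presumably be something like (source file name assumed):

mpicc -fopenmp -o hybrid4.gcc hybrid4.c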
>>
>>
>> --
>> Terry D. Dontje | Principal Software Engineer
>> Developer Tools Engineering | +1.650.633.7054
>> Oracle - Performance Technologies
>> 95 Network Drive, Burlington, MA 01803
>> Email terry.dontje_at_[hidden]
>>