Running with shared memory enabled gave me the following error.

mpprun INFO: Starting openmpi run on 2 nodes (16 ranks)...
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened.  This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded).  Note that
Open MPI stopped checking at the first component that it did not find.

Host:      n568
Framework: btl
Component: tcp
--------------------------------------------------------------------------

Maybe it is not installed at our supercomputing center. What do you suggest?
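
One thing I can try here, assuming the ompi_info tool from this Open MPI installation is on my PATH on the login node, is to list which BTL components are actually installed:

ompi_info | grep btl

If sm and tcp do not show up in that list, those components really are missing from the build on the cluster.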

best regards,

----- Forwarded Message -----
From: Mudassar Majeed <mudassarm30@yahoo.com>
To: Jeff Squyres <jsquyres@cisco.com>
Sent: Friday, June 1, 2012 5:03 PM
Subject: Re: [OMPI users] Intra-node communication

Here is the code. I am taking care of the first message: I start measuring the round-trip time only from the second message onwards. As you can see in the code, I do 100 handshakes and measure the overall time for them. I have two nodes, each with 8 cores. First I exchange messages between process 1 and process 2, which are on the same node, and measure the time. Then I exchange messages between process 1 and process 12, which are on different nodes. But the output I got is as follows:

---------------------------------------------------------------------------------
mpprun INFO: Starting openmpi run on 2 nodes (16 ranks)...

with-in node: time = 150.663382 secs
across nodes: time = 134.627887 secs
---------------------------------------------------------------------------------

The code is as follows:

    /* This is the fragment I am timing.  The includes, main(), MPI setup and
       the local declarations (my_rank, status, stime, etime) were implicit in
       my program; I have added them here so it compiles on its own. */

    #include <mpi.h>
    #include <cstdio>
    #include <ctime>

    int main(int argc, char **argv)
    {
        int my_rank;
        MPI_Status status;
        struct timespec stime, etime;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

        double *buff = new double[1000000];
        double ex_time = 0.0;

        for (int i = 0; i < 1000000; i++)
            buff[i] = 100.5352;

        MPI_Barrier(MPI_COMM_WORLD);

        int comm_amount = 100;   /* originally: *(comm + my_rank * N + i) */

        if (comm_amount > 0)
        {
            /* Phase 1: ping-pong between ranks 1 and 2 (same node). */
            if (my_rank == 1)
            {
                for (int j = 0; j < comm_amount; j++)
                {
                    if (j > 0)   /* skip the first (connection set-up) round trip */
                        clock_gettime(CLOCK_REALTIME, &stime);

                    MPI_Ssend(buff, 1000000, MPI_DOUBLE, 2, 4600, MPI_COMM_WORLD);
                    MPI_Recv(buff, 1000000, MPI_DOUBLE, 2, 4600, MPI_COMM_WORLD, &status);

                    if (j > 0)
                    {
                        clock_gettime(CLOCK_REALTIME, &etime);
                        ex_time += (etime.tv_sec - stime.tv_sec)
                                 + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
                    }
                }
            }
            else if (my_rank == 2)
            {
                for (int j = 0; j < comm_amount; j++)
                {
                    if (j > 0)
                        clock_gettime(CLOCK_REALTIME, &stime);

                    MPI_Recv(buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD, &status);
                    MPI_Ssend(buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD);

                    if (j > 0)
                    {
                        clock_gettime(CLOCK_REALTIME, &etime);
                        ex_time += (etime.tv_sec - stime.tv_sec)
                                 + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
                    }
                }
            }

            if (my_rank == 1)
                printf("\nwith-in node: time = %f\n", ex_time);

            ex_time = 0.0;

            /* Phase 2: ping-pong between ranks 1 and 12 (different nodes). */
            if (my_rank == 1)
            {
                for (int j = 0; j < comm_amount; j++)
                {
                    if (j > 0)
                        clock_gettime(CLOCK_REALTIME, &stime);

                    MPI_Ssend(buff, 1000000, MPI_DOUBLE, 12, 4600, MPI_COMM_WORLD);
                    MPI_Recv(buff, 1000000, MPI_DOUBLE, 12, 4600, MPI_COMM_WORLD, &status);

                    if (j > 0)
                    {
                        clock_gettime(CLOCK_REALTIME, &etime);
                        ex_time += (etime.tv_sec - stime.tv_sec)
                                 + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
                    }
                }
            }
            else if (my_rank == 12)
            {
                for (int j = 0; j < comm_amount; j++)
                {
                    if (j > 0)
                        clock_gettime(CLOCK_REALTIME, &stime);

                    MPI_Recv(buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD, &status);
                    MPI_Ssend(buff, 1000000, MPI_DOUBLE, 1, 4600, MPI_COMM_WORLD);

                    if (j > 0)
                    {
                        clock_gettime(CLOCK_REALTIME, &etime);
                        ex_time += (etime.tv_sec - stime.tv_sec)
                                 + 1e-9 * (etime.tv_nsec - stime.tv_nsec);
                    }
                }
            }

            if (my_rank == 1)
                printf("\nacross nodes: time = %f\n", ex_time);
        }

        delete[] buff;
        MPI_Finalize();
        return 0;
    }



This time I have added

-mca btl self,sm,tcp

Maybe it will enable shared-memory support. But I had to do it with mpprun (not mpirun), since I have to submit it as a job and can't run mpirun directly on the supercomputer.
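
If mpprun does not pass the -mca option through to mpirun, my understanding is that Open MPI also reads MCA parameters from environment variables, so (just a sketch of the job-script side) I could instead put

export OMPI_MCA_btl=self,sm,tcp

before the mpprun line in my submit script, and it should have the same effect.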
thanks for your help,

best




From: Jeff Squyres <jsquyres@cisco.com>
To: Open MPI Users <users@open-mpi.org>
Cc: Mudassar Majeed <mudassarm30@yahoo.com>
Sent: Friday, June 1, 2012 4:52 PM
Subject: Re: [OMPI users] Intra-node communication

...and exactly how you measured.  You might want to run a well-known benchmark, like NetPIPE or the OSU pt2pt benchmarks.

Note that the *first* send between any given peer pair is likely to be slow because OMPI does a lazy connection scheme (i.e., the connection is made behind the scenes).  Subsequent sends are likely faster.  Well-known benchmarks do a bunch of warmup sends and then start timing after those are all done.
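
For reference, the timing loop in such a benchmark looks roughly like the sketch below (message size, iteration counts, and ranks are arbitrary): a handful of untimed warm-up round trips, then MPI_Wtime around the timed ones.  I used plain MPI_Send here; substitute MPI_Ssend if you specifically want to force the synchronous handshake.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch of a warmed-up ping-pong between ranks 0 and 1. */
    int main(int argc, char **argv)
    {
        const int count = 1000000, warmup = 10, iters = 100;
        int rank, j;
        double t0 = 0.0;
        double *buf = (double *) calloc(count, sizeof(double));
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (j = 0; j < warmup + iters; j++) {
            if (j == warmup && rank == 0)
                t0 = MPI_Wtime();   /* start timing only after the warm-up rounds */

            if (rank == 0) {
                MPI_Send(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0)
            printf("average round trip: %f sec\n", (MPI_Wtime() - t0) / iters);

        free(buf);
        MPI_Finalize();
        return 0;
    }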

Also ensure that you have shared memory support enabled.  It is likely to be enabled by default, but if you're seeing different performance than you expect, that's something to check.
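
One quick sanity check, if you can control the mpirun command line, is to restrict a two-process, single-node run to the self and sm BTLs, e.g.

mpirun -np 2 -mca btl self,sm ./your_benchmark

(./your_benchmark is just a placeholder).  If the sm component is missing or unusable, that run should fail with an error from Open MPI rather than quietly falling back to TCP.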


On Jun 1, 2012, at 10:44 AM, Jingcha Joba wrote:

> This should not happen. Typically, intra-node communication latency is much lower than inter-node latency.
> Can you please tell us how you ran your application?
> Thanks
>
> --
> Sent from my iPhone
>
> On Jun 1, 2012, at 7:34 AM, Mudassar Majeed <mudassarm30@yahoo.com> wrote:
>
>> Dear MPI people,
>> Can someone tell me why MPI_Ssend takes more time when the two MPI processes are on the same node? The same two processes on different nodes take much less time for the same message exchange. I am using a supercomputing center and this is what happens there. I was writing an algorithm to reduce across-node communication, but now I find that across-node communication is cheaper than communication within a node (with 8 cores on each node).
>>
>> best regards,
>>
>> Mudassar


--
Jeff Squyres
jsquyres@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/