Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Performance issue of mpirun/mpi_init
From: Ralph Castain (rhc_at_[hidden])
Date: 2014-04-10 10:07:49


Going on the assumption that it was a copy/paste error, the next question is: how many nodes were in your allocation?

I ask because there is a change in the way we launch between 1.8 and 1.6.5. Starting in the 1.7 series, mpirun launches daemons across your allocation at startup so it can collect information on the topology of the nodes in the allocation - this info is then used when mapping the job. In the 1.6 series, we only launched daemons on the nodes actually being used.

So 1.6.5 would indeed be faster IF you have a large allocation, but only launch a small number of procs. What you can do to compensate is add the --novm option to mpirun (or use the "state_novm_select=1" MCA param) which reverts back to the 1.6.5 behavior.
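
For example (a minimal sketch using the same test program from your mail; either form should have the same effect):

    mpirun --novm -np 2 ./a.out

or, setting the MCA param instead:

    mpirun -mca state_novm_select 1 -np 2 ./a.out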

On Apr 10, 2014, at 7:00 AM, Ralph Castain <rhc_at_[hidden]> wrote:

> Just to ensure I understand what you are saying: it appears that 1.8 is much faster than 1.6.5 with the default settings, but slower when you set btl=tcp,self?
>
> This seems rather strange. I note that the 1.8 value is identical in the two cases, but somehow 1.6.5 went much faster in the latter case. Is this a copy/paste error?
>
>
> On Apr 10, 2014, at 2:05 AM, Victor Vysotskiy <Victor.Vysotskiy_at_[hidden]> wrote:
>
>> Dear Developers,
>>
>> I have run into a performance degradation on a multi-core, single-processor machine. Specifically, in the most recent Open MPI v1.8 the initialization and process startup stage became ~10x slower compared to v1.6.5. In order to measure timings I used the following code snippet:
>>
>> /*-------------------------------------------*/
>> #include <mpi.h>
>>
>> int main (int argc, char *argv[]) {
>>     MPI_Init(&argc, &argv);   /* startup cost being measured */
>>     MPI_Finalize();
>>     return 0;
>> }
>> /*-------------------------------------------*/
>>
>> The execution wall time has been measured in a trivial way by using the 'time' command, i.e.:
>>
>> time mpirun -np 2 ./a.out
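>>
>> For the second set of timings below, the btl was restricted via the corresponding environment variable (the exact invocation on our side may differ; this is just the general form):
>>
>> export OMPI_MCA_btl=tcp,self
>> time mpirun -np 2 ./a.out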
>>
>> Below are the averaged timings for both versions on Linux x86_64 (Intel i7-3630):
>>
>> Default settings:
>> 1.8 : 0.679 s
>> 1.6.5: 1.041 s
>>
>> OMPI_MCA_btl=tcp,self:
>> 1.8 : 0.679 s
>> 1.6.5: 0.041 s
>>
>> The same problem has been detected on Mac OS X v10.9.2.
>>
>> Here I should stress that other MPI distributions perform like Open MPI v1.6.5 with the TCP byte transfer layer activated.
>>
>> So, I am wondering whether it is possible to tune v1.8 in order to speed up the startup process? The problem is that during the automatic nightly verification of our program we usually spawn parallel binaries thousands of times.
>>
>> Thank you in advance!
>>
>> Best regards,
>> Victor.
>>
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>