
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] [External] Re: Open MPI 1.6.5 opal_paffinity_base_get_physical_socket_id
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-05-31 10:45:12


On May 31, 2014, at 10:32 AM, "Lecrenski, Stephen K PW" <Stephen.Lecrenski_at_[hidden]> wrote:

> This case was a very simple 6 process test on a single node which ran to completion.
>
> I'm installing Open MPI 1.8.1 now to see if I see the same issue.
>
> I just installed and ran hwloc. What am I looking for? I see basic information: PCI (ib0, ib1, mlx4_0), PCI(eth0), PCI(eth1), PCI(), PCI(sda), and others...

The fact that it ran without hanging for a long time is a good sign; that's really all I was looking for.

> When I launch the MPI job, I'm using mpirun --mca btl self,sm,openib

That should be fine.
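
If you want to double-check which BTLs actually get used at run time, you can bump up the BTL verbosity. Something like this should do it (the process count matches your 6-process test; "./your_app" is just a placeholder for your executable):

  mpirun --mca btl self,sm,openib --mca btl_base_verbose 10 -np 6 ./your_app

That should print some output about which BTL components each process loads and selects.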

> I have not explicitly told mpirun to use processor affinity. When running top(1), I do see that the processes migrate from core to core from time to time.

With 1.6.x, that sounds good. That does make it weirder, though -- you weren't using affinity, but you were spending giant amounts of time in the affinity code. Strange.

With 1.8.x, OMPI enables affinity by default.

Let's see what happens with 1.8.x -- if upgrading solves your problem, that would be best.
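
For reference, once you're on 1.8.x you can see exactly where each process ends up bound with --report-bindings, and you can turn binding back off if you want to compare against your 1.6.x behavior. Something like this (modulo exact option spelling in 1.8.1; "./your_app" is again a placeholder):

  mpirun --report-bindings --mca btl self,sm,openib -np 6 ./your_app
  mpirun --bind-to none --mca btl self,sm,openib -np 6 ./your_app

The first command prints each rank's binding at launch; the second disables the 1.8.x default binding.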

> Am I using processor affinity, and if so, shouldn't the process(es) remain on their individual cores throughout execution? Hyperthreading is off. I am not using a rank file, nor am I telling the mpirun command to explicitly use processor affinity.
>
> skl
> 860-557-2895
>
>
> -----Original Message-----
> From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Jeff Squyres (jsquyres)
> Sent: Saturday, May 31, 2014 8:13 AM
> To: Open MPI Developers
> Subject: [External] Re: [OMPI devel] Open MPI 1.6.5 opal_paffinity_base_get_physical_socket_id
>
> The super short answer is: 1.6.x is old and deprecated; can you upgrade to the 1.8.x series?
>
> The short answer is "no" -- paffinity calls should never block, but it depends on how and what you're measuring.
>
> The more detailed answer is: your trace below looks like it includes a call to MPI_Abort. Did your process hang during the abort, perchance, and (somehow) get stuck in a process affinity call?
>
> Are you able to download and run the lstopo command from the hwloc suite? (http://www.open-mpi.org/software/hwloc/v1.9/)
>
>
>
>
> On May 30, 2014, at 2:47 PM, "Lecrenski, Stephen K PW" <Stephen.Lecrenski_at_[hidden]> wrote:
>
>> I am running some performance tests (Open SpeedShop) with a program that uses Open MPI and InfiniBand.
>>
>> I see a line of code that appears to be taking a considerable amount of CPU time relative to the rest of the code.
>>
>> opal_paffinity_base_get_physical_socket_id (libmpi.so.1.0.8: paffinity_base_wrappers.c,118)
>>
>> Exclusive CPU time (s)   % of CPU time   Statement location (line number)
>> 19031.94                 38.339796       paffinity_base_wrappers.c(118)
>> 14188.66                 28.583021       paffinity_base_wrappers.c(113)
>> 10934.38                 22.027282       paffinity_base_wrappers.c(129)
>>  2185.16                  4.401999       paffinity_base_wrappers.c(121)
>>  1081.96                  2.179606       paffinity_base_wrappers.c(130)
>>   546.93                  1.101789       paffinity_base_wrappers.c(114)
>>   546.17                  1.100258       paffinity_base_wrappers.c(65)
>>   541.67                  1.091193       paffinity_base_wrappers.c(126)
>>   540.52                  1.088876       ompi_mpi_abort.c(80)
>>     2.23                  0.004492       ompi_mpi_abort.c(101)
>>
>>
>> Is this normal behavior?
>>
>> Thanks,
>>
>> Stephen Lecrenski
>> High Performance Technical Computing
>>
>> Pratt & Whitney
>> 400 Main Street
>> East Hartford, CT 06108
>> Telephone: 860 - 557 - 2895
>> Email: Stephen.Lecrenski_at_[hidden]
>> Please consider the environment before printing this e-mail
>>
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
>> Link to this post:
>> http://www.open-mpi.org/community/lists/devel/2014/05/14915.php
>
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14916.php
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14917.php

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/