This case was a very simple 6-process test on a single node, which ran to completion.
I'm installing Open MPI 1.8.1 now to see if I get the same issue.
I just installed hwloc and ran lstopo. What am I looking for? I see basic information: PCI (ib0, ib1, mlx4_0), PCI (eth0), PCI (eth1), PCI (), PCI (sda) and others...
When I launch the mpi process I'm using mpirun --mca btl self,sm,openib
I have not explicitly asked mpirun to use processor affinity, and I am not using a rank file. When running top(1) I do see the processes migrate from core to core from time to time. Am I using processor affinity, and if so, shouldn't each process remain on its own core throughout execution? Hyperthreading is off.
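One way to check whether a process is actually bound is to read its CPU affinity mask. The sketch below is only an illustration, not part of Open MPI: describe_binding is a hypothetical helper name, and os.sched_getaffinity is a Python standard-library call that is available on Linux only. If the mask covers every core, the kernel scheduler is free to move the process between cores, which would match the migration seen in top(1).

```python
import os


def describe_binding(allowed_cpus, online_cpus):
    """Classify an affinity mask (hypothetical helper, for illustration only).

    allowed_cpus: iterable of CPU ids the process may run on.
    online_cpus:  total number of CPUs visible on the node.
    """
    allowed = set(allowed_cpus)
    if not allowed:
        raise ValueError("affinity mask must contain at least one CPU")
    if len(allowed) == 1:
        return "bound to a single core"
    if len(allowed) < online_cpus:
        return "bound to a subset of cores"
    return "not bound"


if __name__ == "__main__":
    # os.sched_getaffinity exists on Linux only; pid 0 means "this process".
    if hasattr(os, "sched_getaffinity"):
        allowed = os.sched_getaffinity(0)
        total = os.cpu_count() or len(allowed)
        print(sorted(allowed), "->", describe_binding(allowed, total))
    else:
        print("sched_getaffinity is not available on this platform")
```

Launching this script under mpirun in place of the real application would print one mask per rank. mpirun also accepts --report-bindings, which prints the bindings that Open MPI itself applied at launch.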
CONFIDENTIALITY WARNING: This email may contain privileged or confidential information and is for the sole use of the intended recipients. Unauthorized disclosure or use of this communication is prohibited. If you believe that you have received this email in error, please notify the sender immediately and delete it from your system.
From: devel [mailto:devel-bounces_at_[hidden]] On Behalf Of Jeff Squyres (jsquyres)
Sent: Saturday, May 31, 2014 8:13 AM
To: Open MPI Developers
Subject: [External] Re: [OMPI devel] Open MPI 1.6.5 opal_paffinity_base_get_physical_socket_id
The super short answer is: 1.6.x is old and deprecated; can you upgrade to the 1.8.x series?
The short answer is "no" -- paffinity calls should never block, but it depends on how and what you're measuring.
The more detailed answer is: your trace below looks like it includes a call to MPI_Abort. Did your process hang during the abort, perchance, and (somehow) get stuck in a process affinity call?
Are you able to download and run the lstopo command from the hwloc suite? (http://www.open-mpi.org/software/hwloc/v1.9/)
On May 30, 2014, at 2:47 PM, "Lecrenski, Stephen K PW" <Stephen.Lecrenski_at_[hidden]> wrote:
> I am running some performance tests (Open SpeedShop) with a program which uses Open MPI and InfiniBand.
> I see a line of code which appears to be taking a considerable amount of CPU time in relation to other pieces of the code.
> Exclusive CPU time in seconds. | % of CPU Time | Statement Location (Line Number)
> opal_paffinity_base_get_physical_socket_id (libmpi.so.1.0.8:
> Is this normal behavior?
> Stephen Lecrenski
> High Performance Technical Computing
> Pratt & Whitney
> 400 Main Street
> East Hartford,CT 06108
> Telephone: 860 - 557 - 2895
> Email: Stephen.Lecrenski_at_[hidden]
> Please consider the environment before printing this e-mail
> devel mailing list
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
Link to this post: http://www.open-mpi.org/community/lists/devel/2014/05/14916.php