I have finally solved the issue, or rather, discovered my oversight. It's a mistake that will have me mad at myself for a while. I'm new to MPI, though, and not versed in the MPP communications of LS-DYNA at all, so it was an oversight easily made.
The key to the entire situation was the test input file I was using. LS-DYNA accepts an input file containing all of the data that tells it what to do with the simulation, so I would invoke MPI like this: mpirun -np 16 mppLSDYNA input=myfile.k . That invocation isn't related to the issue itself, but it's important to distinguish the mpirun options from the LS-DYNA application input.
Anyhow, I made up a simple collision simulation in LS-DYNA to use as a test file (~15 kB), because our typical jobs have very large input files (50-150 MB) with very long run times (often 7+ days). I chose a simple analysis that would execute quickly, so I could see data from all parts of it and observe how OpenMPI behaved during the entire simulation... and that's where the problem was.
(I have read in various places that MPI_Allreduce is the heavy hitter in LS-DYNA's MPI communications, which is why I hypothesize the following.) LS-DYNA's MPP communications perform an MPI_Allreduce to coordinate on every, or very nearly every, iteration of the program. My test file ran so fast that it was completing 5000 iterations per second on a single core (I found this out very recently, minutes ago in fact, while testing mpirun with only two cores locally). That was where my network tie-ups were happening.
I started measuring the network throughput of the 16-core, 8-core, 4-core, and 2-core jobs and was shocked to see that 16 cores was saturating my network at 120 Mbit/s. 8 cores also used 120 Mbit/s, 4 cores used 75 Mbit/s, and 2 cores used around 30-40 Mbit/s.
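To put those numbers in perspective, here is a rough back-of-the-envelope estimate of my own (not from any LS-DYNA documentation, and only approximate, since the 5000 iterations/sec figure came from a single-core run): if every iteration triggers an Allreduce, then at the 120 Mbit/s cap the network budget per iteration is only a few kilobytes, so even tiny per-iteration messages will saturate the link.

```python
# Rough estimate of how much network traffic each iteration can afford
# before the link saturates. Figures are from my measurements; combining
# them is only approximate, since the iteration rate was measured on one core.
link_capacity_bits = 120e6   # observed network cap: 120 Mbit/s
iterations_per_sec = 5000    # observed iteration rate on a single core

bits_per_iteration = link_capacity_bits / iterations_per_sec
bytes_per_iteration = bits_per_iteration / 8
print(f"{bytes_per_iteration:.0f} bytes/iteration")  # -> 3000 bytes/iteration
```

In other words, at that iteration rate the synchronizations don't have to be slow individually; they just happen so often that they fill the pipe.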
Needless to say, it finally clicked in my brain a few minutes ago: the communications were happening too often, not taking a long time. Once I realized that, I started a 16-core job with one of our standard files. I had the right idea initially, because typically the worry is individual subroutines taking a long time, but with very repetitive, iterative programs comes the need to coordinate on a continuous and rapid basis. The job I started typically takes 100-120 hours running on 8 cores (SMP). When I started this OpenMPI job, LS-DYNA gave me an estimate of 43 hours! This earns OpenMPI some great respect; it's quite a powerful program once set up correctly.
As a side note, the network throughput of this job was around 17 Mbit/s.
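For what it's worth, the numbers make for a quick sanity check (my own arithmetic, assuming the 8-core SMP and 16-core MPP runs are otherwise comparable, same model and hardware):

```python
# Rough speedup check: 8-core SMP baseline vs. the 16-core OpenMPI estimate.
# Assumes the two runs are otherwise comparable; treat this as a sketch.
smp_hours_low, smp_hours_high = 100, 120  # typical 8-core SMP run time
mpp_hours = 43                            # LS-DYNA's estimate for 16-core MPP

speedup_low = smp_hours_low / mpp_hours
speedup_high = smp_hours_high / mpp_hours
print(f"speedup: {speedup_low:.1f}x to {speedup_high:.1f}x on 2x the cores")
```

So roughly a 2.3x to 2.8x speedup while doubling the core count, and with the big production file the per-iteration work is large enough that the network is nowhere near saturated.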
All in all, easily fixed, just a few days of frustration. Thank you all again for all of your help. It was paramount in enabling me to discover the issue. Thanks again.
--- On Wed, 7/14/10, Eugene Loh <eugene.loh_at_[hidden]> wrote:
From: Eugene Loh <eugene.loh_at_[hidden]>
Subject: [OMPI users] LS-DYNA profiling [was: OpenMPI Hangs, No Error]
To: "Open MPI Users" <users_at_[hidden]>
Date: Wednesday, July 14, 2010, 2:26 PM