Thank you so much. Well, the memory is enough. As I said, the jobs run
and the whole process actually completes without complaining about memory,
but they do not end correctly. I first tried to solve this using
1. all processes except root wait, before the MPI_Finalize routine is
called, for a message from root
2. when root arrives at this point, it starts sending a message to every
other process to release them from blocking mode
This is effectively a barrier. The solution didn't work at first, but when
I added some "cout" lines to report whether each operation completed
successfully, it worked perfectly. I think writing to the output introduces
a small delay that happens to help here. However, I needed to print these
messages anyway, so the problem is solved in a correct way. ;) It works now,
and I think it will keep working in the future, since the problem I tested
with is gigantic!
Thanks for your help again,
Gus Correa wrote:
> Hi Danesh
> Make sure you have 700 GB of RAM in total across all the nodes you are
> using. Otherwise context switching and memory swapping may be the problem.
> MPI doesn't perform well under these conditions (and may break,
> particularly on large problems, I suppose).
> A good way to go about it is to look at the physical
> "RAM per core" if those are multi-core machines,
> and compare to the actual memory per core your program requires.
> You need to give the system some RAM also, and use no more than 80% or
> so of the memory.
> If you or a system administrator has access to the nodes,
> you can monitor the memory use with "top".
> If you have Ganglia on this cluster, you can use the memory report
> metric also.
> Another possibility is a memory leak, which may be in your program,
> or (less likely) in MPI.
> Note, however, that OpenMPI 1.3.0 and 1.3.1 had this problem (with
> InfiniBand only), which was fixed in 1.3.2:
> If you are using 1.3.0 or 1.3.1, upgrade to 1.3.2.
> I hope this helps.
> Gus Correa
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> Danesh Daroui wrote:
>> Dear all,
>> I am not sure if this is the right forum to ask this question, so sorry
>> if I am wrong. I am using ScaLAPACK in my code, and of course MPI (Open
>> MPI), in an electromagnetic solver program running on a cluster. I see
>> very strange behavior when I use a large number of processors to run my
>> code on very large problems. In these cases the program finishes its
>> work successfully, but it then hangs until the wall time exceeds the
>> limit and the job is terminated by the queue manager (I use qsub to
>> submit jobs). This happens when, for example, I use more than 80
>> processors for a problem which needs more than 700 GB of memory. For
>> smaller problems everything is OK and all output files are generated
>> correctly, while when this happens, the output files are empty. I am
>> almost sure that there is a synchronization problem and some processes
>> fail to reach the finalization point while others are done.
>> My code is written in C++, and in the "main" function I call a routine
>> named "Solver". My "Solver" function looks like below:
>> for (std::vector<double>::iterator ti = times.begin();
>>      ti != times.end(); ++ti) {
>>     Stopwatch iwatch, dwatch, twatch;
>>     // some ScaLAPACK operations
>>     if (iamroot()) {
>>         // some operations only for the root process
>>     }
>> }
>> and my "main" function, which calls "Solver", looks like below:
>> int main()
>> {
>>     // some preparing operations
>>     if (rank == 0)
>>         std::cout << "Total execution time: " << time.tick()
>>                   << " s\n" << std::flush;
>>     int err = MPI_Finalize();
>>     if (MPI_SUCCESS != err) {
>>         std::cerr << "MPI_Finalize failed: " << err << "\n";
>>         return err;
>>     }
>>     return 0;
>> }
>> I did put a "blacs::barrier(ictxt, 'A')" at the end of the "Solver"
>> routine, before calling "blacs::exit(1)", to make sure that all
>> processes arrive there before MPI_Finalize, but that didn't solve the
>> problem. Do you have any idea where the problem is?
>> Thanks in advance,
> users mailing list
Luleå University of Technology