Open MPI User's Mailing List Archives

From: Lydia Heck (lydia.heck_at_[hidden])
Date: 2006-11-23 05:42:49

Gadget2 (which I cannot attach because it is not publicly available)
runs perfectly well on any number of processes on systems such
as Solaris 10 with Sun CT6 and gigabit, Sun CT5 and Myrinet GM, IBM Regatta ...

Sorry to be so expansive ...

When I run the code on 32 CPUs with Open MPI over MX, using the Studio 11
compilers on a Solaris x64 system, the code works fine until near the end,
when it fails to write all the restart files.

When I run the code on 64 CPUs, it fails with the following error message:

Topnodes=218193 costlimit=0.0890015 countlimit=428.229
NTopleaves= 40496 NTopnodes=46281 (space for 347252)
desired memory imbalance=2.83425 (limit=100719, needed=114185)
Note: the domain decomposition is suboptimum because the ceiling for
memory-imbalance is reached
work-load balance=1.28529 memory-balance=1.01948
exchange of 0002589387 particles
Signal:11 info.si_errno:0(Error 0) si_code:1(SEGV_MAPERR)
Failing at addr:5192cbd0
/opt/mx/lib/amd64/ [ Signal 11 (SEGV)]
*** End of error message ***
63 additional processes aborted (not shown)
m2001(26) > /opt/ompi/bin/mpirun -np 32 -machinefile ./myh-all -mca pml cm
./Gadget2 param.txt

As this is one of our main production codes, I need to make sure
that it runs on every system I install. Any ideas would be welcome.


Dr E L Heck

University of Durham
Institute for Computational Cosmology
Ogden Centre
Department of Physics
South Road

United Kingdom

e-mail: lydia.heck_at_[hidden]

Tel.: + 44 191 - 334 3628
Fax.: + 44 191 - 334 3645