On 6/9/07, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Jun 8, 2007, at 9:29 AM, Code Master wrote:
> > I compiled openmpi-1.2.2 with:
> > ./configure CFLAGS="-g -pg -O3" --prefix=/home/foo/490_research/490/src/mpi.optimized_profiling/ \
> > --enable-mpi-threads --enable-progress-threads --enable-static --disable-shared --without-memory-manager \
> > --without-libnuma --disable-mpi-f77 --disable-mpi-f90 --disable-mpi-cxx --disable-mpi-cxx-seek --disable-dlopen
> > (Thanks Jeff, now I know that I have to add --without-memory-manager
> > and --without-libnuma for static linking)
> > make all
> > make install
> > Then I run my client app with:
> > /home/foo/490_research/490/src/mpi.optimized_profiling/bin/mpirun --hostfile ../hostfile -n 32 raytrace -finputs/car.env
> > The program runs well and each process completes successfully. I can
> > tell because all processes have generated gmon.out, and "ps aux" on
> > the slave nodes (every node except the originating one) shows that my
> > program has already exited there. Therefore I think this may have
> > something to do with mpirun, which hangs forever.
> Be aware that you may have problems with multiple processes writing
> to the same gmon.out, unless you're running each instance in a
> different directory (your command line doesn't indicate that you are,
> but that doesn't necessarily prove anything).
I am sure this is not happening: in my program, right after MPI
initialization, main() calls chdir() to change into the process's own
directory (named after its proc_id). Therefore each process has its own
directory to write to.
> > Can you see anything wrong in my ./configure command which explains
> > the mpirun hang at the end of the run? How can I fix it?
> No, everything looks fine.
> So you confirm that all raytrace instances have exited and all orteds
> have exited, leaving *only* mpirun running?
Yes, I am sure that all raytrace instances as well as all MPI-related
processes (including mpirun, orteds, etc.) have exited on all slave
nodes. On the *master* node, all raytrace instances and all orteds
have exited as well, leaving *only* mpirun running on the *master* node:
14818 pts/0 S+ 0:00
--hostfile ../hostfile -n 32 raytrace -finputs/car.env -s
> There was a race condition about this at one point; Ralph -- can you
> comment further?
> Jeff Squyres
> Cisco Systems
> users mailing list