Open MPI User's Mailing List Archives

From: Galen M. Shipman (gshipman_at_[hidden])
Date: 2006-02-02 17:19:58


Hi Jean,

I just noticed that you are running Quad proc nodes and are using:

> bench1 slots=4 max-slots=4

in your machine file and you are running the benchmark using only 2
processes via:

> mpirun -prefix /opt/ompi -wdir `pwd` -machinefile /root/machines -np 2 PMB-MPI1

By using slots=4 you are telling Open MPI to place the first four
processes on the "bench1" host, so with -np 2 both benchmark
processes end up on bench1. Open MPI will therefore use shared
memory to communicate between them, not InfiniBand.
I am still not able to reproduce your problem with the machines
available to me and the 1.0.1 release.
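
To force the two processes onto separate nodes so that the openib
BTL is actually exercised, one option (just a sketch, reusing the
hostnames from your machine file) is to give each host a single
slot, or to schedule by node and restrict the BTLs explicitly:

   # machinefile with one slot per host
   bench1 slots=1 max-slots=4
   bench2 slots=1 max-slots=4

   # map processes round-robin by node, allow only the openib and self BTLs
   mpirun -prefix /opt/ompi -wdir `pwd` -machinefile /root/machines \
       -bynode --mca btl openib,self -np 2 PMB-MPI1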

Is it possible for you to get a stack trace where this is hanging?

You might try:

mpirun -prefix /opt/ompi -wdir `pwd` -machinefile /root/machines -np 2 -d xterm -e gdb PMB-MPI1

This will launch each benchmark process under gdb in its own xterm.
Alternatively, I could debug this on your machine if an account were
available.
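
If launching under xterm is not convenient, another option (a rough
sketch, assuming you can log into one of the nodes while the
benchmark is spinning) is to attach gdb to one of the PMB-MPI1
processes by PID and dump its backtraces:

   ps -C PMB-MPI1 -o pid,cmd      # find the PIDs of the hung processes
   gdb -p <PID>                   # attach to one of them
   (gdb) thread apply all bt      # backtrace for every thread
   (gdb) detach
   (gdb) quit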

Thanks,

Galen

On Jan 18, 2006, at 2:13 PM, Jean-Christophe Hugly wrote:

>
> Hi,
>
> I have been trying for the past few days to get an MPI application
> (the
> pallas bm) to run with ompi and openib.
>
> My environment:
> ===============
> . two quad cpu hosts with one mlx hca each.
> . the hosts are running suse10 (kernel 2.6.13) with the latest (or
> close to it) from OpenIB (rev 4904, specifically)
> . opensm runs on third machine with the same os.
> . openmpi is built from openmpi-1.1a1r8727.tar.bz2
>
> Behaviour:
> ==========
> . openib seems to behave ok (ipoib works, rdma_bw and rdma_lat work,
> osm works)
> . I can mpirun any non-mpi program like ls, hostname, or ompi_info all
> right.
> . I can mpirun the pallas bm on any single host (the local one or the
> other)
> . I can mpirun the pallas bm on the two nodes provided that I disable
> the openib btl
> . If I try to use the openib btl, the bm does not start (at best I get
> the initial banner, sometimes not even that). On both hosts, I see
> that the PMB processes (the correct number for each host) use 99% cpu.
>
> I obtained the exact same behaviour with the following src packages:
> openmpi-1.0.1.tar.bz2
> openmpi-1.0.2a3r8706.tar.bz2
> openmpi-1.1a1r8727.tar.bz2
>
> Earlier on, I also did the same experiment with openmpi-1.0.1 and the
> stock gen2 of the suse kernel; same thing.
>
> Configuration:
> ==============
> For building, I tried the following variants:
>
> ./configure --prefix=/opt/ompi --enable-mpi-threads --enable-progress-thread
> ./configure --prefix=/opt/ompi
> ./configure --prefix=/opt/ompi --disable-smp-locks
>
> I also tried many variations of mca-params.conf. What I normally
> use for trying openib is:
> rmaps_base_schedule_policy = node
> btl = ^tcp
> mpi_paffinity_alone = 1
>
> The mpirun cmd I normally use is:
> mpirun -prefix /opt/ompi -wdir `pwd` -machinefile /root/machines -np 2 PMB-MPI1
>
> My machine file being:
> bench1 slots=4 max-slots=4
> bench2 slots=4 max-slots=4
>
> Am I doing something obviously wrong?
>
> Thanks for any help!
>
> --
> Jean-Christophe Hugly <jice_at_[hidden]>
> PANTA
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users