Open MPI User's Mailing List Archives

From: Daniel Gruner (dgruner_at_[hidden])
Date: 2007-04-26 16:06:46


Hi

I have been testing Open MPI 1.2, and now 1.2.1, on several BProc-
based clusters, and I have found some problems/issues. All my
clusters have standard Ethernet interconnects, either 100Base-T or
Gigabit, on standard switches.

The clusters are all running Clustermatic 5 (BProc 4.x), and range
from 32-bit Athlon, to 32-bit Xeon, to 64-bit Opteron. In all cases
the same problems occur, identically. I attach here the results
from "ompi_info --all" and the config.log, for my latest build on
an Opteron cluster, using the Pathscale compilers. I had exactly
the same problems when using the vanilla GNU compilers.

Now for a description of the problem:

When running an MPI code (cpi.c, from the standard MPI examples, also
attached), using the mpirun defaults (e.g. -byslot), with a single
process:

        sonoma:dgruner{134}> mpirun -n 1 ./cpip
        [n17:30019] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        pi is approximately 3.1415926544231341, Error is 0.0000000008333410
        wall clock time = 0.000199

However, if one tries to run more than one process, this bombs:

        sonoma:dgruner{134}> mpirun -n 2 ./cpip
        .
        .
        .
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        [n21:30029] OOB: Connection to HNP lost
        .
        . ad infinitum

If one uses the option "-bynode", things work:

        sonoma:dgruner{145}> mpirun -bynode -n 2 ./cpip
        [n17:30055] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        Process 1 on n21
        pi is approximately 3.1415926544231318, Error is 0.0000000008333387
        wall clock time = 0.010375

Note that there is always the message about "openpty failed, using pipes instead".

If I run more processes (on my 3-node cluster, with 2 cpus per node), the
openpty message appears repeatedly for the first node:

        sonoma:dgruner{146}> mpirun -bynode -n 6 ./cpip
        [n17:30061] odls_bproc: openpty failed, using pipes instead
        [n17:30061] odls_bproc: openpty failed, using pipes instead
        Process 0 on n17
        Process 2 on n49
        Process 1 on n21
        Process 5 on n49
        Process 3 on n17
        Process 4 on n21
        pi is approximately 3.1415926544231239, Error is 0.0000000008333307
        wall clock time = 0.050332

Should I worry about the openpty failure? I suspect that communications
may be slower this way. Using the -byslot option always fails, so that
looks like a bug. The same occurs for all the codes that I have tried,
both simple and complex.

Thanks for your attention to this.
Regards,
Daniel

-- 
Dr. Daniel Gruner                        dgruner_at_[hidden]
Dept. of Chemistry                       daniel.gruner_at_[hidden]
University of Toronto                    phone:  (416)-978-8689
80 St. George Street                     fax:    (416)-978-5325
Toronto, ON  M5S 3H6, Canada             finger for PGP public key




  • application/x-gzip attachment: cpi.c.gz