Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-08-29 12:43:00


This list is a fine place to report bugs.

The problem that you are running into is that you upgraded Open MPI without
first removing the prior version (it looks like the prior version of OMPI
was in the 1.0.x series -- 1.0.2 by your mail). Since Open MPI is based on
plugins, you must remove all of the old plugins before installing
the new version. In particular, the mca_pml_teg plugin was removed in the
1.1 series, so if you install any 1.1.x version over an old 1.0.x install,
the mca_pml_teg plugin will still be left there and cause the exact problem
that you're seeing. Sorry about that! :-(
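
As a rough sketch of how to check for the leftover plugin: the function below lists any mca_pml_teg files in a plugin directory. The default directory is the one from your error output (/usr/local/lib/openmpi), and the function name list_stale_teg is made up for illustration. Adjust both to suit your installation.

```sh
# Sketch only: prints any leftover mca_pml_teg plugin files.
# The default directory comes from the error message in this thread;
# pass another directory as the first argument if your prefix differs.
list_stale_teg() {
    plugin_dir="${1:-/usr/local/lib/openmpi}"
    for f in "$plugin_dir"/mca_pml_teg.*; do
        # If the glob matched nothing, the pattern is left unexpanded,
        # so check that the file really exists before printing it.
        [ -e "$f" ] && echo "$f"
    done
    return 0
}
```

If this prints anything after a 1.1.x install, the old 1.0.x files are still in place. Removing the old installation completely and then reinstalling 1.1.1 is safer than deleting the single file, since other stale files may have been left behind as well.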

That being said, Los Alamos discovered a problem with the bproc support (see
https://svn.open-mpi.org/trac/ompi/ticket/318) which will probably force a
v1.1.2 in the Very Near Future.

On 8/29/06 10:53 AM, "Daniel Gruner" <dgruner_at_[hidden]> wrote:

> Hi
>
> I am wondering who to report bugs to... In short, the version 1.1.1
> that I downloaded yesterday simply does NOT work. I have tried it on
> two different clusters, both running bproc. One is an i386 cluster
> with GNU compilers, and the other is an x86_64 (Opteron) cluster with
> PathScale compilers.
>
> Here is an example of the problems, from the Opteron cluster. The example
> is in C++, but the failures are the same for Fortran programs.
>
> -------------- hello++.cc --------------------------
> #include <iostream>
> // modified to reference the master mpi.h file, to meet the MPI standard spec.
> #include "mpi.h"
>
> int
> main(int argc, char *argv[])
> {
>     MPI_Init(&argc, &argv);
>     //MPI::Init(argc, argv);
>
>     // int size = MPI::COMM_WORLD.Get_size();
>     // int rank = MPI::COMM_WORLD.Get_rank();
>     int size, rank;
>     int namelen;
>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     MPI_Get_processor_name(processor_name, &namelen);
>
>     std::cerr << "Process " << rank << " on " << processor_name << std::endl;
>
>     std::cout << "Hello World! I am " << rank << " of " << size << std::endl;
>
>     //MPI::Finalize();
>     MPI_Finalize();
> }
> -----------------------------------------------------
>
> Here are 3 attempts at running the program:
>
> sonoma:dgruner{128}> ./hello
> [sonoma.chem.utoronto.ca:31072] mca: base: component_find: unable to open:
> /usr/local/lib/openmpi/mca_pml_teg.so: undefined symbol:
> mca_ptl_base_modules_initialized (ignored)
> Process 0 on sonoma.chem.utoronto.ca
> Hello World! I am 0 of 1
> Signal:11 info.si_errno:0(Success) si_code:128()
> Failing at addr:(nil)
> [0] func:/usr/local/lib/libopal.so.0 [0x2a959f7463]
> *** End of error message ***
> Segmentation fault
>
>
> sonoma:dgruner{129}> mpirun -n 1 ./hello
> [n17:31074] pls_bproc_orted: openpty failed, using pipes instead
> [n17:31075] mca: base: component_find: unable to open:
> /usr/local/lib/openmpi/mca_pml_teg.so: undefined symbol:
> mca_ptl_base_modules_initialized (ignored)
> Process 0 on n17
> Hello World! I am 0 of 1
> Signal:11 info.si_errno:0(Success) si_code:128()
> Failing at addr:(nil)
> [0] func:/usr/local/lib/libopal.so.0 [0x2a95702463]
> *** End of error message ***
> Segmentation fault
>
>
> sonoma:dgruner{130}> mpirun -n 2 ./hello
> [n17:31078] pls_bproc_orted: openpty failed, using pipes instead
> [n17:31080] mca: base: component_find: unable to open:
> /usr/local/lib/openmpi/mca_pml_teg.so: undefined symbol:
> mca_ptl_base_modules_initialized (ignored)
> [n21:31081] mca: base: component_find: unable to open:
> /usr/local/lib/openmpi/mca_pml_teg.so: undefined symbol:
> mca_ptl_base_modules_initialized (ignored)
> mpirun: killing job...
>
> IT HANGS, SO I KILLED IT
>
> mpirun noticed that job rank 0 with PID 31080 on node "17" exited on signal 2.
> Signal:11 info.si_errno:0(Success) si_code:128()
> Failing at addr:(nil)
> [0] func:/usr/local/lib/libopal.so.0 [0x2a95702463]
> *** End of error message ***
> Segmentation fault
>
>
> Similar problems happen on the i386 cluster, and on both clusters with Fortran
> programs as well.
>
> For the record, version 1.0.2 was running OK on both clusters.
>
> Daniel
>
>
> On Mon, Aug 28, 2006 at 03:37:55PM -0400, Jeff Squyres wrote:
>> The Open MPI Team, representing a consortium of research, academic, and
>> industry partners, is pleased to announce the release of Open MPI version
>> 1.1.1. This release is mainly a bug fix release over the v1.1 release,
>> but there are a few minor new features. Version 1.1.1 can be downloaded from
>> the main Open MPI web site or any of its mirrors (mirrors will be updating
>> shortly).
>>
>> We strongly recommend that all users upgrade to version 1.1.1 if possible.
>>
>> Here is a list of changes in v1.1.1 as compared to v1.1:
>>
>> - Fix for Fortran string handling in various MPI API functions.
>> - Fix for Fortran status handling in MPI_WAITSOME and MPI_TESTSOME.
>> - Various fixes for the XL compilers.
>> - Automatically disable using mallopt() on AIX.
>> - Memory fixes for 64 bit platforms with registering MCA parameters in
>> the self and MX BTL components.
>> - Fixes for BProc to support oversubscription and changes to the
>> mapping algorithm so that mapping processes "by slot" works as
>> expected.
>> - Fixes for various abort cases to not hang and clean up nicely.
>> - If using the Intel 9.0 v20051201 compiler on an IA64 platform, the
>> ptmalloc2 memory manager component will automatically disable
>> itself. Other versions of the Intel compiler on this platform seem
>> to work fine (e.g., 9.1).
>> - Added "host" MPI_Info key to MPI_COMM_SPAWN and
>> MPI_COMM_SPAWN_MULTIPLE.
>> - Add missing C++ methods: MPI::Datatype::Create_indexed_block,
>> MPI::Datatype::Create_resized, MPI::Datatype::Get_true_extent.
>> - Fix OSX linker issue with Fortran bindings.
>> - Fixed MPI_COMM_SPAWN to start spawning new processes in slots that
>> (according to Open MPI) are not already in use.
>> - Added capability to "mpirun a.out" (without specifying -np) that
>> will run on all currently-allocated resources (e.g., within a batch
>> job such as SLURM, Torque, etc.).
>> - Fix a bug with one particular case of MPI_BCAST. Thanks to Doug
>> Gregor for identifying the problem.
>> - Ensure that the shared memory mapped file is only created when there
>> is more than one process on a node.
>> - Fixed problems with BProc stdin forwarding.
>> - Fixed problem with MPI_TYPE_INDEXED datatypes. Thanks to Yven
>> Fournier for identifying this problem.
>> - Fix some thread safety issues in MPI attributes and the openib BTL.
>> - Fix the BProc allocator to not potentially use the same resources
>> across multiple ORTE universes.
>> - Fix gm resource leak.
>> - More latency reduction throughout the code base.
>> - Make the TM PLS (PBS Pro, Torque, Open PBS) more scalable, and fix
>> some latent bugs that crept in v1.1. Thanks to the Thunderbird crew
>> at Sandia National Laboratories and Martin Schaffoner for access to
>> testing facilities to make this happen.
>> - Added new command line options to mpirun:
>> --nolocal: Do not run any MPI processes on the same node as mpirun
>> (compatibility with the OSC mpiexec launcher)
>> --nooversubscribe: Abort if the number of processes requested would
>> cause oversubscription
>> --quiet / -q: do not show spurious status messages
>> --version / -V: show the version of Open MPI
>> - Fix bus error in XGrid process starter. Thanks to Frank from the
>> Open MPI user's list for identifying the problem.
>> - Fix data size mismatches that caused memory errors on PPC64
>> platforms during the startup of the openib BTL.
>> - Allow propagation of SIGUSR1 and SIGUSR2 signals from mpirun to
>> back-end MPI processes.
>> - Add missing MPI::Is_finalized() function.
>>
>> --
>> Jeff Squyres
>> Server Virtualization Business Unit
>> Cisco Systems
>> _______________________________________________
>> announce mailing list
>> announce_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/announce

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems