
Subject: [OMPI users] errors trying to run a simple mpi task
From: dani (dani_at_[hidden])
Date: 2013-06-23 05:42:10


I've encountered strange issues when trying to run a simple MPI job on a single host that has IB.
The complete error output:

-> mpirun -n 1 hello
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
[[53031,1],0]: A high-performance Open MPI point-to-point messaging module
was unable to find any relevant network interfaces:

Module: uDAPL
  Host: n01

Another transport will be used instead, although this may result in
lower performance.
WARNING: It appears that your OpenFabrics subsystem is configured to only
allow registering part of your physical memory.  This can cause MPI jobs to
run with erratic performance, hang, and/or crash.

This may be caused by your OpenFabrics vendor limiting the amount of
physical memory that can be registered.  You should investigate the
relevant Linux kernel module parameters that control how much physical
memory can be registered, and increase them to allow registering all
physical memory on your machine.

See this Open MPI FAQ item for more information on these Linux kernel module

  Local host:              n01
  Registerable memory:     32768 MiB
  Total memory:            65503 MiB

Your MPI job will continue, but may be behave poorly and/or hang.
Process 0 on n01 out of 1
[n01:13534] 7 more processes have sent help message help-mpi-btl-udapl.txt / dat_ia_open fail
[n01:13534] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
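(For reference, the MCA parameter from the last line can be passed on the command line, and I believe the uDAPL BTL can also be excluded outright so that openib is tried directly; the exclusion syntax is from the Open MPI docs as I remember them, so treat it as a sketch:

-> mpirun --mca orte_base_help_aggregate 0 -n 1 hello
-> mpirun --mca btl ^udapl -n 1 hello

The first shows every help message instead of the aggregated summary; the second should skip uDAPL entirely.)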
My setup and other info follow:
OS: CentOS 6.3 x86_64
installed OFED 3.5 from source (./ --all)
installed Open MPI 1.6.4 with the following build parameters:
rpmbuild --rebuild openmpi-1.6.4-1.src.rpm --define '_prefix /opt/openmpi/1.6.4/gcc' --define '_defaultdocdir /opt/openmpi/1.6.4/gcc' --define '_mandir %{_prefix}/share/man' --define '_datadir %{_prefix}/share' --define 'configure_options --with-openib=/usr --with-openib-libdir=/usr/lib64 CC=gcc CXX=g++ F77=gfortran FC=gfortran --enable-mpirun-prefix-by-default --target=x86_64-unknown-linux-gnu --with-hwloc=/usr/local --with-libltdl --enable-branch-probabilities --with-udapl --with-sge --disable-vt' --define 'use_default_rpm_opt_flags 1' --define '_name openmpi-1.6.4_gcc' --define 'install_shell_scripts 1' --define 'shell_scripts_basename mpivars' --define '_usr /usr' --define 'ofed 0' 2>&1 | tee
(--disable-vt was used because CUDA is present on the system; VT links against it automatically, making it an RPM dependency with no matching package).
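To double-check which BTL components that build actually produced (I assume ompi_info lists them; usage from the docs, not verified against this exact install):

-> ompi_info | grep "MCA btl"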

max locked memory is unlimited:
->ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 515028
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
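From the Open MPI FAQ on registered (locked) memory, I understand the "Registerable memory: 32768 MiB" figure above is capped by the mlx4 MTT module parameters rather than by ulimit. A sketch of the arithmetic, assuming the defaults log_num_mtt=20 and log_mtts_per_seg=3 (and assuming this driver exposes log_num_mtt at all):

max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * PAGE_SIZE
            = 2^20 * 2^3 * 4096 bytes
            = 32 GiB = 32768 MiB

which matches the warning exactly, so presumably raising log_num_mtt to 21 would cover all 65503 MiB, e.g. (file name is my assumption):

-> echo "options mlx4_core log_num_mtt=21" >> /etc/modprobe.d/mlx4_core.conf

followed by reloading mlx4_core (or restarting openibd).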
IB devices are present:
hca_id:    mlx4_0
    transport:            InfiniBand (0)
    fw_ver:                2.9.1000
    node_guid:            0002:c903:004d:b0e2
    sys_image_guid:            0002:c903:004d:b0e5
    vendor_id:            0x02c9
    vendor_part_id:            26428
    hw_ver:                0xB0
    board_id:            MT_0D90110009
    phys_port_cnt:            1
        port:    1
            state:            PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:        4096 (5)
            sm_lid:            2
            port_lid:        53
            port_lmc:        0x00
            link_layer:        InfiniBand
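Since the first error points at the uDAPL registry, the dat.conf entries can presumably be checked against this device (standard OFED path assumed; I have not verified that this is where my install reads it):

-> grep mlx4 /etc/dat.conf

Each ofa-v2-mlx4_0-1-style entry should reference an adapter/port that actually exists (mlx4_0 port 1 here).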

the hello program source:
->cat hello.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Get_processor_name(processor_name, &namelen);

  printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);

  MPI_Finalize();
  return 0;
}

simply compiled as:
mpicc hello.c -o hello
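If it helps, I believe BTL selection can be traced at run time with the btl_base_verbose MCA parameter (parameter name from the Open MPI FAQ, unverified by me):

-> mpirun --mca btl_base_verbose 30 -n 1 hello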

the IB modules seem to be present:
->service openibd status

  HCA driver loaded

Configured IPoIB devices:

Currently active IPoIB devices:

The following OFED modules are loaded:

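Since the lists above come out empty (possibly just mangled in pasting), a direct module check as a sanity sketch:

-> lsmod | grep -E 'mlx4|^ib_|rdma'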

Can anyone help?