Open MPI Development Mailing List Archives

Subject: [OMPI devel] 1.7.4rc2r30031 - FreeBSD-9 mpirun hangs
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2013-12-20 17:59:58


This case is not quite like my OpenBSD-5 report.
On FreeBSD-9 I *can* run singletons, but "-np 2" hangs.

The following hangs:
$ mpirun -np 2 examples/ring_c

The following complains about the "bogus" btl selection.
So this is not the same as my problem with OpenBSD-5:
$ mpirun -mca btl bogus -np 2 examples/ring_c
[freebsd9-amd64.qemu:05926] mca: base: components_open: component pml / bfo open function failed
[freebsd9-amd64.qemu:05926] mca: base: components_open: component pml / ob1 open function failed
[freebsd9-amd64.qemu:05926] PML ob1 cannot be selected
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: freebsd9-amd64.qemu
Framework: btl
Component: bogus
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No available pml components were found!

This means that there are no components of this type installed on your
system or all the components reported that they could not be used.

This is a fatal error; your MPI process is likely to abort. Check the
output of the "ompi_info" command and ensure that components of this
type are available on your system. You may also wish to check the
value of the "component_path" MCA parameter and ensure that it has at
least one directory that contains valid MCA components.
--------------------------------------------------------------------------

For the non-bogus case, "top" shows one idle and one active ring_c process:
  PID USERNAME  THR PRI NICE    SIZE    RES STATE   C   TIME    WCPU COMMAND
 5933 phargrov    2  29    0     98M  6384K select  1   0:32 100.00% ring_c
 5931 phargrov    2  20    0  77844K  4856K select  0   0:00   0.00% orterun
 5932 phargrov    2  24    0  51652K  4960K select  0   0:00   0.00% ring_c

A backtrace for the 100%-cpu ring_c process:
(gdb) where
#0 0x0000000800d9811c in poll () from /lib/libc.so.7
#1 0x0000000800ae37fe in poll () from /lib/libthr.so.3
#2 0x00000008013259aa in poll_dispatch ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
#3 0x000000080131eb50 in opal_libevent2021_event_base_loop ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
#4 0x000000080106395d in orte_progress_thread_engine ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-rte.so.7
#5 0x0000000800ae10a4 in pthread_getprio () from /lib/libthr.so.3
#6 0x0000000000000000 in ?? ()
Error accessing memory address 0x7fffffbfe000: Bad address.

And for the idle ring_c process:
(gdb) where
#0 0x0000000800d9811c in poll () from /lib/libc.so.7
#1 0x0000000800ae37fe in poll () from /lib/libthr.so.3
#2 0x00000008013259aa in poll_dispatch ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
#3 0x000000080131eb50 in opal_libevent2021_event_base_loop ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-pal.so.7
#4 0x000000080106395d in orte_progress_thread_engine ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-freebsd9-amd64/INST/lib/libopen-rte.so.7
#5 0x0000000800ae10a4 in pthread_getprio () from /lib/libthr.so.3
#6 0x0000000000000000 in ?? ()
Error accessing memory address 0x7fffffbfe000: Bad address.

The two backtraces look identical, but I double-checked that they are correct.

-Paul

-- 
Paul H. Hargrove                          PHHargrove_at_[hidden]
Future Technologies Group
Computer and Data Sciences Department     Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900