Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] [EXTERNAL] 1.7.4rc2r30031 - OpenBSD-5 mpirun hangs
From: Paul Hargrove (phhargrove_at_[hidden])
Date: 2013-12-20 17:48:03


Brian,

Of course, I should have thought of that myself.
See below for backtrace from a singleton run.

I'm starting an --enable-debug build to maybe get some line number info too.

-Paul

(gdb) where
#0 0x00000406457a9e3a in nanosleep () at <stdin>:2
#1 0x000004063947e2d4 in nanosleep (rqtp=0x7f7ffffeca30, rmtp=0x0)
    at /usr/src/lib/librthread/rthread_cancel.c:274
#2 0x0000040644a5a89b in orte_routed_base_register_sync ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-openbsd5-amd64/INST/lib/libopen-rte.so.7.0
#3 0x00000406490d943c in init_routes ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-openbsd5-amd64/INST/lib/openmpi/mca_routed_binomial.so
#4 0x0000040644a3c37f in orte_ess_base_app_setup ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-openbsd5-amd64/INST/lib/libopen-rte.so.7.0
#5 0x000004063eb1797d in rte_init ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-openbsd5-amd64/INST/lib/openmpi/mca_ess_env.so
#6 0x0000040644a1a3fe in orte_init ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-openbsd5-amd64/INST/lib/libopen-rte.so.7.0
#7 0x00000406482c7976 in ompi_mpi_init ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-openbsd5-amd64/INST/lib/libmpi.so.4.0
#8 0x00000406482eac92 in PMPI_Init ()
   from /home/phargrov/OMPI/openmpi-1.7-latest-openbsd5-amd64/INST/lib/libmpi.so.4.0
#9 0x0000040438c01093 in main (argc=1, argv=0x7f7ffffece60) at ring_c.c:19
Current language: auto; currently asm

On Fri, Dec 20, 2013 at 2:38 PM, Barrett, Brian W <bwbarre_at_[hidden]> wrote:

> Paul -
>
> Any chance you could grab a stack trace from the MPI app? That's probably
> the fastest next step.
>
> Brian
>
> -----Original Message-----
> *From: *Paul Hargrove [phhargrove_at_[hidden]]
> *Sent: *Friday, December 20, 2013 03:33 PM Mountain Standard Time
> *To: *Open MPI Developers
> *Subject: *[EXTERNAL] [OMPI devel] 1.7.4rc2r30031 - OpenBSD-5 mpirun hangs
>
> With plenty of help from Jeff's and Ralph's bug fixes in the past 24 hours,
> I can now build OMPI for OpenBSD. However, running even a simple example
> fails:
>
> Having set PATH and LD_LIBRARY_PATH:
> $ mpirun -np 1 examples/ring_c
> just hangs
>
> Output from "top" shows idle procs:
> PID USERNAME PRI NICE SIZE RES STATE WAIT TIME CPU COMMAND
> 31841 phargrov 10 0 2140K 3960K sleep/1 nanosle 0:00 0.00% ring_c
> 13490 phargrov 2 0 2540K 4892K sleep/1 poll 0:00 0.00% orterun
>
> Distrusting the env vars and relying instead on the auto-prefix
> behavior:
> $ /home/phargrov/OMPI/openmpi-1.7-latest-openbsd5-amd64/INST/bin/mpirun -np 1 examples/ring_c
> also hangs
>
> Not sure exactly what to infer from this, but a "bogus" btl doesn't
> produce any complaint, which may indicate how far startup got:
> $ mpirun -mca btl bogus -np 1 examples/ring_c
> Still hangs, and no complaint about the btl selection
>
> All three cases above are singleton (-np 1) runs, but the behavior with
> "-np 2" is the same.
>
> This does NOT appear to be an ORTE problem:
> -bash-4.2$ orterun -np 1 date
> Fri Dec 20 14:11:42 PST 2013
> -bash-4.2$ orterun -np 2 date
> Fri Dec 20 14:11:45 PST 2013
> Fri Dec 20 14:11:45 PST 2013
>
> Let me know what sort of verbose MCA parameters to set and I'll collect
> the info.
> Compressed output of "ompi_info --all" is attached.
>
> -Paul
>
> --
> Paul H. Hargrove PHHargrove_at_[hidden]
> Future Technologies Group
> Computer and Data Sciences Department Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
