Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] pure static "mpirun" launcher (Jeff Squyres) - now testing
From: Ilias Miroslav (Miroslav.Ilias_at_[hidden])
Date: 2012-01-30 14:40:16


Hi,

What segfaulted? I am not sure... maybe it is a bug in the application that only shows up with the static Open MPI build.

I will try to compile and run the simplest possible MPI example and shall let you know.
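
For reference, the kind of minimal test I have in mind is roughly the following (only a sketch; the file name, array size and tolerance are arbitrary). I would build it with the same mpif90 wrapper and flags as dirac.x and run it with "mpirun -np 2 ./bcast_test.x":

   ! bcast_test.F90 -- minimal MPI broadcast check
   program bcast_test
     implicit none
     include 'mpif.h'
     integer, parameter :: n = 1000
     double precision   :: buf(n)
     integer            :: ierr, myrank, i

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)

     ! rank 0 fills the buffer, the other ranks start from zero
     if (myrank == 0) then
        do i = 1, n
           buf(i) = dble(i)
        end do
     else
        buf = 0.0d0
     end if

     ! same kind of broadcast of double precision data as the call
     ! that crashes inside dirac.x
     call MPI_Bcast(buf, n, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

     ! every rank checks what it received
     if (abs(buf(n) - dble(n)) > 1.0d-12) then
        print *, 'rank', myrank, ': broadcast FAILED'
     else
        print *, 'rank', myrank, ': broadcast OK'
     end if

     call MPI_Finalize(ierr)
   end program bcast_test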

In the meantime I am attaching debugger output that may help to track down this bug:

Backtrace for this error:
  + function __restore_rt (0x255B110)
    from file sigaction.c

Slave process (frames #0-#10 are inside the MPI library; the application frames start at #11):
(gdb) where
#0 0x00000000023622db in sm_fifo_read (fifo=0x7f77cc908300) at btl_sm.h:324
#1 0x000000000236309b in mca_btl_sm_component_progress () at btl_sm_component.c:612
#2 0x0000000002304f26 in opal_progress () at runtime/opal_progress.c:207
#3 0x00000000023c8a77 in opal_condition_wait (c=0xf78bf80, m=0xf78c000) at ../../../../opal/threads/condition.h:100
#4 0x00000000023c8eb7 in ompi_request_wait_completion (req=0x10602f00) at ../../../../ompi/request/request.h:378
#5 0x00000000023ca661 in mca_pml_ob1_send (buf=0xefbb2a0, count=1000, datatype=0x2901180, dst=1, tag=-17,
    sendmode=MCA_PML_BASE_SEND_STANDARD, comm=0xf772b20) at pml_ob1_isend.c:125
#6 0x000000000236e978 in ompi_coll_tuned_bcast_intra_split_bintree (buffer=0xefb9360, count=2000, datatype=0x2901180, root=0,
    comm=0xf772b20, module=0x1060b7f0, segsize=1024) at coll_tuned_bcast.c:590
#7 0x0000000002370834 in ompi_coll_tuned_bcast_intra_dec_fixed (buff=0xefb9360, count=2000, datatype=0x2901180, root=0,
    comm=0xf772b20, module=0x1060b7f0) at coll_tuned_decision_fixed.c:262
#8 0x0000000002371c52 in mca_coll_sync_bcast (buff=0xefb9360, count=2000, datatype=0x2901180, root=0, comm=0xf772b20,
    module=0x1060b590) at coll_sync_bcast.c:44
#9 0x0000000002249662 in PMPI_Bcast (buffer=0xefb9360, count=2000, datatype=0x2901180, root=0, comm=0xf772b20) at pbcast.c:110
#10 0x000000000221744a in mpi_bcast_f (
    buffer=0xefb9360 "\026\372`\031\033\336O@\005\031\001\025\216\260&@\301\343Û»\006\375\003@\251L1\aAG\344?\301\343Û»\006\375\003@\251L1\aAG\344?HN&n\025\233\\@\252\325WW\005^4_at_8\333ܘ\236`\027@\025\253\006an\267\367?Ih˹\024W\324?8\021\375\332\372\351\273?8\333ܘ\236`\027@\025\253\006an\267\367?Ih˹\024W\324?8\021\375\332\372\351\273?\301\343Û»\006\375\003@\251L1\aAG\344?\026\372`\031\033\336O@\005\031\001\025\216\260&@\301\343Û»\006\375\003@\251L1\aAG\344?\301\343Û»\006\375\003@\251L1\aAG\344?8\333ܘ\236`\027@"...,
    count=0x2624818, datatype=0x26037a0, root=0xf73ba90, comm=0x26247a0, ierr=0x7fffbb34ec78) at pbcast_f.c:70
#11 0x000000000041ab68 in interface_to_mpi::interface_mpi_bcast_r1 (x=<value optimized out>, ndim=2000, root_proc=0, communicator=0)
    at /home/ilias/qch_work/qch_software/dirac_git/dirac-git-repo/interface_mpi/interface_to_mpi.F90:446
#12 0x0000000000e95a37 in get_primitf () at /home/ilias/qch_work/qch_software/dirac_git/dirac-git-repo/abacus/herpar.F:1464
#13 0x0000000000e99ff7 in sdinit (dmat=..., ndmat=2, irepdm=..., ifctyp=..., itype=9, maxdif=<value optimized out>, iatom=0,
    nodv=.TRUE., nopv=.TRUE., nocont=.FALSE., tktime=.FALSE., retur=.FALSE., i2typ=1, icedif=3, screen=9.9999999999999998e-13,
    gabrao=..., dmrao=..., dmrso=...) at /home/ilias/qch_work/qch_software/dirac_git/dirac-git-repo/abacus/herpar.F:566
#14 0x0000000000e9af35 in her_pardrv (work=..., lwork=<value optimized out>, fmat=..., dmat=..., ndmat=2, irepdm=..., ifctyp=...,
.
.
.
And the master process (mpirun):
(gdb) where
#0 0x000000000058de18 in poll ()
#1 0x0000000000496f58 in poll_dispatch ()
#2 0x0000000000471649 in opal_libevent2013_event_base_loop ()
#3 0x00000000004016ea in orterun (argc=4, argv=0x7fff484b6478) at orterun.c:866
#4 0x00000000004005d4 in main (argc=4, argv=0x7fff484b6478) at main.c:13

________________________________________
From: Ilias Miroslav
Sent: Monday, January 30, 2012 7:24 PM
To: users_at_[hidden]
Subject: Re: pure static "mpirun" launcher (Jeff Squyres) - now testing

Hi Jeff,

thanks for the fix;

I downloaded the Open MPI trunk and built it.

The (most recent) revision 25818 gives this error and then hangs:

/home/ilias/bin/ompi_ilp64_static/bin/mpirun -np 2 ./dirac.x
.
.
Program received signal 11 (SIGSEGV): Segmentation fault.

Backtrace for this error:
  + function __restore_rt (0x255B110)
    from file sigaction.c

The configuration:
  $ ./configure --prefix=/home/ilias/bin/ompi_ilp64_static --without-memory-manager LDFLAGS=--static --disable-shared --enable-static CXX=g++ CC=gcc F77=gfortran FC=gfortran FFLAGS=-m64 -fdefault-integer-8 FCFLAGS=-m64 -fdefault-integer-8 CFLAGS=-m64 CXXFLAGS=-m64 --enable-ltdl-convenience --no-create --no-recursion

The "dirac.x" static executable was obtained with this static openmpi:
    write(lupri, '(a)') ' System | Linux-2.6.30-1-amd64'
    write(lupri, '(a)') ' Processor | x86_64'
    write(lupri, '(a)') ' Internal math | ON'
    write(lupri, '(a)') ' 64-bit integers | ON'
    write(lupri, '(a)') ' MPI | ON'
    write(lupri, '(a)') ' Fortran compiler | /home/ilias/bin/ompi_ilp64_static/bin/mpif90'
    write(lupri, '(a)') ' Fortran compiler version | GNU Fortran (Debian 4.6.2-9) 4.6.2'
    write(lupri, '(a)') ' Fortran flags | -g -fcray-pointer -fbacktrace -DVAR_GFORTRAN -DVAR'
    write(lupri, '(a)') ' | _MFDS -fno-range-check -static -fdefault-integer-8'
    write(lupri, '(a)') ' | -O3 -funroll-all-loops'
    write(lupri, '(a)') ' C compiler | /home/ilias/bin/ompi_ilp64_static/bin/mpicc'
    write(lupri, '(a)') ' C compiler version | gcc (Debian 4.6.2-9) 4.6.2'
    write(lupri, '(a)') ' C flags | -g -static -fpic -O2 -Wno-unused'
    write(lupri, '(a)') ' static libraries linking | ON'

ldd dirac.x
        not a dynamic executable

Any help, please? How do I enable MPI debug output?
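
(Would rebuilding Open MPI with --enable-debug and then running with something like

mpirun --mca btl_base_verbose 100 -np 2 ./dirac.x

be the right way to get more diagnostic output from the MPI layer? I am only guessing at the verbosity parameter here.)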

   1. Re: pure static "mpirun" launcher (Jeff Squyres)
 ----------------------------------------------------------------------
Message: 1
Date: Fri, 27 Jan 2012 13:44:49 -0500
From: Jeff Squyres <jsquyres_at_[hidden]>
Subject: Re: [OMPI users] pure static "mpirun" launcher
To: Open MPI Users <users_at_[hidden]>

Ah ha, I think I got it. There was actually a bug about disabling the memory manager in trunk/v1.5.x/v1.4.x. I fixed it on the trunk and scheduled it for v1.6 (since we're trying very hard to get v1.5.5 out the door) and v1.4.5.

On the OMPI trunk on RHEL 5 with gcc 4.4.6, I can do this:

./configure --without-memory-manager LDFLAGS=--static --disable-shared --enable-static

And get a fully static set of OMPI executables. For example:

-----
[10:41] svbu-mpi:~ % cd $prefix/bin
[10:41] svbu-mpi:/home/jsquyres/bogus/bin % ldd *
mpic++:
        not a dynamic executable
mpicc:
        not a dynamic executable
mpiCC:
        not a dynamic executable
mpicxx:
        not a dynamic executable
mpiexec:
        not a dynamic executable
mpif77:
        not a dynamic executable
mpif90:
        not a dynamic executable
mpirun:
        not a dynamic executable
ompi-clean:
        not a dynamic executable
ompi_info:
        not a dynamic executable
ompi-ps:
        not a dynamic executable
ompi-server:
        not a dynamic executable
ompi-top:
        not a dynamic executable
opal_wrapper:
        not a dynamic executable
ortec++:
        not a dynamic executable
ortecc:
        not a dynamic executable
orteCC:
        not a dynamic executable
orte-clean:
        not a dynamic executable
orted:
        not a dynamic executable
orte-info:
        not a dynamic executable
orte-ps:
        not a dynamic executable
orterun:
        not a dynamic executable
orte-top:
        not a dynamic executable
-----

So I think the answer here is: it depends on a few factors:

1. Need that bug fix that I just committed.
2. Libtool is stripping out -static (and/or --static?). So you have to find some other flags to make your compiler/linker do static.
3. Your OS has to support static builds. For example, RHEL6 doesn't install libc.a by default (it's apparently on the optional DVD, which I don't have). My RHEL 5.5 install does have it, though.
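
As a quick sanity check for point 3 (just a sketch; exact paths and package names vary by distro), you can look for the static C library before attempting the build:

find /usr/lib* -name 'libc.a'

On RHEL it typically comes from the glibc-static package; on Debian/Ubuntu it is shipped with libc6-dev.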

On Jan 27, 2012, at 11:16 AM, Jeff Squyres wrote:

> I've tried a bunch of variations on this, but I'm actually getting stymied by my underlying OS not supporting static linking properly. :-\
>
> I do see that Libtool is stripping out the "-static" standalone flag that you passed into LDFLAGS. Yuck. What's -Wl,-E? Can you try "-Wl,-static" instead?
>
>
> On Jan 25, 2012, at 1:24 AM, Ilias Miroslav wrote:
>
>> Hello again,
>>
>> I need my own static "mpirun" for porting (together with the static executable) onto various (unknown) grid servers. In grid computing one cannot expect an OpenMPI-ILP64 installation on each computing element.
>>
>> Jeff: I tried LDFLAGS in configure
>>
>> ilias_at_194.160.135.47:~/bin/ompi-ilp64_full_static/openmpi-1.4.4/../configure --prefix=/home/ilias/bin/ompi-ilp64_full_static -without-memory-manager --without-libnuma --enable-static --disable-shared CXX=g++ CC=gcc F77=gfortran FC=gfortran FFLAGS="-m64 -fdefault-integer-8 -static" FCFLAGS="-m64 -fdefault-integer-8 -static" CFLAGS="-m64 -static" CXXFLAGS="-m64 -static" LDFLAGS="-static -Wl,-E"
>>
>> but I still get a dynamic, not a static "mpirun":
>> ilias_at_194.160.135.47:~/bin/ompi-ilp64_full_static/bin/. ldd ./mpirun
>> linux-vdso.so.1 => (0x00007fff6090c000)
>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fd7277cf000)
>> libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00007fd7275b7000)
>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007fd7273b3000)
>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fd727131000)
>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fd726f15000)
>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fd726b90000)
>> /lib64/ld-linux-x86-64.so.2 (0x00007fd7279ef000)
>>
>> Any help, please? The config.log is here:
>>
>> https://docs.google.com/open?id=0B8qBHKNhZAipNTNkMzUxZDEtNjJmZi00YzY3LWI4MmYtY2RkZDVkMjhiOTM1
>>
>> Best, Miro
>> ------------------------------
>> Message: 10
>> Date: Tue, 24 Jan 2012 11:55:21 -0500
>> From: Jeff Squyres <jsquyres_at_[hidden]>
>> Subject: Re: [OMPI users] pure static "mpirun" launcher
>> To: Open MPI Users <users_at_[hidden]>
>>
>> Ilias: Have you simply tried building Open MPI with flags that force static linking? E.g., something like this:
>>
>> ./configure --enable-static --disable-shared LDFLAGS=-Wl,-static
>>
>> I.e., put in LDFLAGS whatever flags your compiler/linker needs to force static linking. These LDFLAGS will be applied to all of Open MPI's executables, including mpirun.
>>
>>
>> On Jan 24, 2012, at 10:28 AM, Ralph Castain wrote:
>>
>>> Good point! I'm traveling this week with limited resources, but will try to address when able.
>>>
>>> Sent from my iPad
>>>
>>> On Jan 24, 2012, at 7:07 AM, Reuti <reuti_at_[hidden]> wrote:
>>>
>>>> Am 24.01.2012 um 15:49 schrieb Ralph Castain:
>>>>
>>>>> I'm a little confused. Building procs static makes sense as libraries may not be available on compute nodes. However, mpirun is only executed in one place, usually the head node where it was built. So there is less reason to build it purely static.
>>>>>
>>>>> Are you trying to move mpirun somewhere? Or is it the daemons that mpirun launches that are the real problem?
>>>>
>>>> This depends: with a queuing system, the master node of a parallel job may itself already be one of the slave nodes where the jobscript runs. I keep my nodes uniform, but I have seen sites where that was not the case.
>>>>
>>>> An option would be to have a special queue that always executes the jobscript on the headnode (i.e. without generating any load there) and uses only the non-locally granted slots for mpirun. For this it might be necessary to give the headnode a high number of slots in this queue and to always request one slot on this machine in addition to the ones needed on the compute nodes.
>>>>
>>>> -- Reuti
>>>>
>>>>
>>>>> Sent from my iPad
>>>>>
>>>>> On Jan 24, 2012, at 5:54 AM, Ilias Miroslav <Miroslav.Ilias_at_[hidden]> wrote:
>>>>>
>>>>>> Dear experts,
>>>>>>
>>>>>> Following http://www.open-mpi.org/faq/?category=building#static-build I successfully built a static Open MPI library.
>>>>>> Using this library I succeeded in building a static parallel executable, dirac.x ("ldd dirac.x" reports "not a dynamic executable").
>>>>>>
>>>>>> The problem remains, however, with the mpirun (orterun) launcher.
>>>>>> On the local machine, where I compiled both the static Open MPI and the static dirac.x, I can launch a parallel job with
>>>>>> <OpenMPI_static>/mpirun -np 2 dirac.x ,
>>>>>> but I cannot launch it elsewhere, because "mpirun" is dynamically linked and thus machine dependent:
>>>>>>
>>>>>> ldd mpirun:
>>>>>> linux-vdso.so.1 => (0x00007fff13792000)
>>>>>> libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f40f8cab000)
>>>>>> libnsl.so.1 => /lib/x86_64-linux-gnu/libnsl.so.1 (0x00007f40f8a93000)
>>>>>> libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f40f888f000)
>>>>>> libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f40f860d000)
>>>>>> libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f40f83f1000)
>>>>>> libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f40f806c000)
>>>>>> /lib64/ld-linux-x86-64.so.2 (0x00007f40f8ecb000)
>>>>>>
>>>>>> Please, how do I build a "pure" static mpirun launcher, usable (in my case together with the static dirac.x) on other computers as well?
>>>>>>
>>>>>> Thanks, Miro
>>>>>>
>>>>>> --
>>>>>> RNDr. Miroslav Iliaš, PhD.
>>>>>>
>>>>>> Katedra chémie
>>>>>> Fakulta prírodných vied
>>>>>> Univerzita Mateja Bela
>>>>>> Tajovského 40
>>>>>> 97400 Banská Bystrica
>>>>>> tel: +421 48 446 7351
>>>>>> email : Miroslav.Ilias_at_[hidden]
>>>>>>
>>>>>> Department of Chemistry
>>>>>> Faculty of Natural Sciences
>>>>>> Matej Bel University
>>>>>> Tajovského 40
>>>>>> 97400 Banska Bystrica
>>>>>> Slovakia
>>>>>> tel: +421 48 446 7351
>>>>>> email : Miroslav.Ilias_at_[hidden]