Open MPI Development Mailing List Archives

From: Nathan DeBardeleben (ndebard_at_[hidden])
Date: 2005-08-17 09:59:09


So I dropped an .ompi_ignore into that directory, reconfigured, and the
compile worked (yay!).
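(For anyone else hitting this, a minimal sketch of the workaround; the
component path is taken from the make output quoted at the bottom of this
thread, and the configure line is the one from my earlier mail:)

    # An empty .ompi_ignore file makes autogen.sh skip the component,
    # so it never gets configured or built.
    touch orte/mca/ras/lsf_bproc/.ompi_ignore
    ./autogen.sh
    ./configure --enable-static --disable-shared --without-threads \
        --prefix=/home/ndebard/local/ompi --with-devel-headers
    make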
However, not a lot of progress: mpirun locks up, and all my registry test
programs lock up as well. If I start the orted by hand, then any of my
registry-calling programs segfault:

> [sparkplug]~/ptp > gdb sub_test
> GNU gdb 6.1
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and
> you are welcome to change it and/or distribute copies of it under
> certain conditions. Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for
> details.
> This GDB was configured as "x86_64-suse-linux"...Using host
> libthread_db library "/lib64/tls/libthread_db.so.1".
>
> (gdb) run
> Starting program: /home/ndebard/ptp/sub_test
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000000000 in ?? ()
> (gdb) where
> #0 0x0000000000000000 in ?? ()
> #1 0x00000000004598a5 in orte_init_stage1 () at orte_init_stage1.c:419
> #2 0x00000000004155cf in orte_system_init () at orte_system_init.c:38
> #3 0x00000000004150ef in orte_init () at orte_init.c:46
> #4 0x00000000004148a1 in main (argc=1, argv=0x7fbffff178) at
> sub_test.c:60
> (gdb)

Yes, I recompiled everything.
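In case someone wants to poke at this, a hedged sketch of how I'd pin down
which call jumps to 0x0 (the file and line come straight from frame #1 of
the backtrace above; nothing else here is assumed):

    [sparkplug]~/ptp > gdb sub_test
    (gdb) break orte_init_stage1.c:419
    (gdb) run
    (gdb) info locals
    (gdb) stepi

Breaking at the caller's line and single-stepping should fault right at
the indirect call, and 'info locals' should show which function pointer is
NULL; presumably something in the registry/component setup that never got
filled in.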

Here's an example of me trying something a little more complicated
(which I believe locks up for the same reason: something is borked in
the registry interaction).
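(One thing worth ruling out with lockups like this is stale ORTE session
directories; a hedged cleanup sketch, with the path pattern matching the
ls check in the transcript below:)

    # Remove session directories left behind by a crashed run before
    # retrying; the naming follows /tmp/openmpi-sessions-<user>*.
    rm -rf /tmp/openmpi-sessions-ndebard*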

>> [sparkplug]~/ompi-test > bjssub -s 10000 -n 10 -i bash
>> Waiting for interactive job nodes.
>> (nodes 18 16 17 18 19 20 21 22 23 24 25)
>> Starting interactive job.
>> NODES=16,17,18,19,20,21,22,23,24,25
>> JOBID=18
>
> So I got my nodes.
>
>> ndebard_at_sparkplug:~/ompi-test> export OMPI_MCA_ptl_base_exclude=sm
>> ndebard_at_sparkplug:~/ompi-test> export OMPI_MCA_pls_bproc_seed_priority=101
>
> And set these envvars as we need in order to use Greg's bproc; without
> the 2nd export, the machine's load maxes out and it locks up.
>
>> ndebard_at_sparkplug:~/ompi-test> bpstat
>> Node(s)   Status   Mode         User      Group
>> 100-128   down     ----------   root      root
>> 0-15      up       ---x------   vchandu   vchandu
>> 16-25     up       ---x------   ndebard   ndebard
>> 26-27     up       ---x------   root      root
>> 28-30     up       ---x--x--x   root      root
>> ndebard_at_sparkplug:~/ompi-test> env | grep NODES
>> NODES=16,17,18,19,20,21,22,23,24,25
>
> Yes, I really have the nodes.
>
>> ndebard_at_sparkplug:~/ompi-test> mpicc -o test-mpi test-mpi.c
>> ndebard_at_sparkplug:~/ompi-test>
>
> Recompiled for good measure.
>
>> ndebard_at_sparkplug:~/ompi-test> ls /tmp/openmpi-sessions-ndebard*
>> /bin/ls: /tmp/openmpi-sessions-ndebard*: No such file or directory
>
> Proof that there's no leftover session directory from an old run.
>
>> ndebard_at_sparkplug:~/ompi-test> mpirun -np 1 test-mpi
>
> It never responds at this point, but I can kill it with ^C.
>
>> mpirun: killing job...
>> Killed
>> ndebard_at_sparkplug:~/ompi-test>
>
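For reference, the same MCA parameters can be passed on the mpirun command
line instead of being exported first; a hedged equivalent of the run above:

    # Equivalent to exporting OMPI_MCA_ptl_base_exclude and
    # OMPI_MCA_pls_bproc_seed_priority before the run.
    mpirun -mca ptl_base_exclude sm \
           -mca pls_bproc_seed_priority 101 \
           -np 1 test-mpi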

-- Nathan
Correspondence
---------------------------------------------------------------------
Nathan DeBardeleben, Ph.D.
Los Alamos National Laboratory
Parallel Tools Team
High Performance Computing Environments
phone: 505-667-3428
email: ndebard_at_[hidden]
---------------------------------------------------------------------

Jeff Squyres wrote:

>Is this what Tim Prins was working on?
>
>
>On Aug 16, 2005, at 5:21 PM, Tim S. Woodall wrote:
>
>>I'm not sure why this is even building... Is someone working on this?
>>I thought we had .ompi_ignore files in this directory.
>>
>>Tim
>>
>>
>>Nathan DeBardeleben wrote:
>>
>>
>>>So I'm seeing all these nice emails about people developing on OMPI
>>>today, yet I can't get it to compile. Am I out here in limbo on this,
>>>or are others in the same boat? The errors I'm seeing are about some
>>>bproc code calling undefined functions; they are included again below.
>>>
>>>
>>>Nathan DeBardeleben wrote:
>>>
>>>>Back from training and trying to test this, but now OMPI doesn't
>>>>compile at all:
>>>>
>>>>>gcc -DHAVE_CONFIG_H -I. -I. -I../../../../include
>>>>>-I../../../../include -I../../../.. -I../../../..
>>>>>-I../../../../include -I../../../../opal -I../../../../orte
>>>>>-I../../../../ompi -g -Wall -Wundef -Wno-long-long -Wsign-compare
>>>>>-Wmissing-prototypes -Wstrict-prototypes -Wcomment -pedantic
>>>>>-Werror-implicit-function-declaration -fno-strict-aliasing -MT
>>>>>ras_lsf_bproc.lo -MD -MP -MF .deps/ras_lsf_bproc.Tpo -c
>>>>>ras_lsf_bproc.c -o ras_lsf_bproc.o
>>>>>ras_lsf_bproc.c: In function `orte_ras_lsf_bproc_node_insert':
>>>>>ras_lsf_bproc.c:32: error: implicit declaration of function
>>>>>`orte_ras_base_node_insert'
>>>>>ras_lsf_bproc.c: In function `orte_ras_lsf_bproc_node_query':
>>>>>ras_lsf_bproc.c:37: error: implicit declaration of function
>>>>>`orte_ras_base_node_query'
>>>>>make[4]: *** [ras_lsf_bproc.lo] Error 1
>>>>>make[4]: Leaving directory
>>>>>`/home/ndebard/ompi/orte/mca/ras/lsf_bproc'
>>>>>make[3]: *** [all-recursive] Error 1
>>>>>make[3]: Leaving directory `/home/ndebard/ompi/orte/mca/ras'
>>>>>make[2]: *** [all-recursive] Error 1
>>>>>make[2]: Leaving directory `/home/ndebard/ompi/orte/mca'
>>>>>make[1]: *** [all-recursive] Error 1
>>>>>make[1]: Leaving directory `/home/ndebard/ompi/orte'
>>>>>make: *** [all-recursive] Error 1
>>>>>[sparkplug]~/ompi >
>>>>>
>>>>Clean SVN checkout this morning with configure:
>>>>
>>>>>[sparkplug]~/ompi > ./configure --enable-static --disable-shared
>>>>>--without-threads --prefix=/home/ndebard/local/ompi
>>>>>--with-devel-headers
>>>>>
>>>>
>>>>Brian Barrett wrote:
>>>>
>>>>>This is now fixed in SVN. You should no longer need the
>>>>>--build=i586... hack to compile 32 bit code on Opterons.
>>>>>
>>>>>Brian
>>>>>
>>>>>On Aug 12, 2005, at 3:17 PM, Brian Barrett wrote:
>>>>>
>>>>>>On Aug 12, 2005, at 3:13 PM, Nathan DeBardeleben wrote:
>>>>>>
>>>>>>>We've got a 64bit Linux (SUSE) box here. For a variety of reasons
>>>>>>>(Java, JNI, linking in with OMPI libraries, etc., which I won't get
>>>>>>>into) I need to compile OMPI 32 bit (or get 64bit versions of a lot
>>>>>>>of other libraries). I get various compile errors when I try
>>>>>>>different things, but first let me explain the system we have:
>>>>>>>
>>>>>><snip>
>>>>>>
>>>>>>>This goes on and on and on, actually. And the 'is incompatible with
>>>>>>>i386:x86-64' output looks to be repeated for every line before the
>>>>>>>error that actually caused the make to bomb.
>>>>>>>
>>>>>>>Any suggestions at all? Surely someone must have tried to force
>>>>>>>OMPI to build in 32bit mode on a 64bit machine.
>>>>>>>
>>>>>>I don't think anyone has tried to build 32 bit on an Opteron, which
>>>>>>is the cause of the problems...
>>>>>>
>>>>>>I think I know how to fix this, but it won't happen until later in
>>>>>>the weekend. I can't think of a good workaround until then. Well,
>>>>>>one possibility is to set the target like you were doing and disable
>>>>>>ROMIO. Actually, you'll also need to disable Fortran 77. So
>>>>>>something like:
>>>>>>
>>>>>>./configure [usual options] --build=i586-suse-linux \
>>>>>>    --disable-io-romio --disable-f77
>>>>>>
>>>>>>might just do the trick.
>>>>>>
>>>>>>Brian
>>>>>>
>>>>>>
>>>>>>--
>>>>>>Brian Barrett
>>>>>>Open MPI developer
>>>>>>http://www.open-mpi.org/