
Open MPI Development Mailing List Archives


From: Tim S. Woodall (twoodall_at_[hidden])
Date: 2005-08-19 08:15:48


Josh,

I believe that although the prior code called ras routines,
they were simple library routines in the ras base that didn't
require the ras to be initialized (they just accessed the registry).

So, with the new code, both the ras and rds components must be
initialized/selected.

My opinion would be to add the appropriate interface to the rmgr,
move the code to rmgr/urm, and have rmgr/proxy simply forward the
request to the seed.

Note that the intent of the rmgr was to abstract the services provided
by rds/ras/pls - such that you could potentially drop in a new rmgr
that didn't use any of these.
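
To sketch what I mean (the names here are hypothetical - this is just the
shape of the interface, not a committed API):

    /* hypothetical new entry in the rmgr module interface */
    typedef int (*orte_rmgr_base_module_setup_singleton_fn_t)(orte_jobid_t jobid);

    /* rmgr/urm: implement this locally - urm runs on the seed, so it
     * can call rds/ras directly to register the singleton's host.
     * rmgr/proxy: pack the jobid into a command and send it to the
     * seed's rmgr, the same way the proxy forwards its other requests;
     * the urm on the seed then does the actual rds/ras updates. */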

Thanks,
Tim

Josh Hursey wrote:
> Hey all,
>
> Sorry for my lag on this thread, I'm still settling back into
> Bloomington and catching up on email traffic.
>
> This is certainly my fault WRT the addition of the RDS call to
> orte_init_stage1(). I never tested the case where a process is a
> singleton and not the seed. :(
>
> Since the RAS (or the functionality represented by that subsystem) was
> exposed at this level, it was assumed that the RDS was also active at
> that time. The addition in orte_init_stage1 was to add host entries to
> both the RAS and RDS (instead of just the RAS) when we start a
> singleton process.
>
> A quick repair would be to protect the RDS section from all non-seed
> processes. E.g.:
>
>     if (orte_process_info.seed) {
>         ret = orte_rds.store_resource(&rds_single_host);
>         if (ORTE_SUCCESS != ret) {
>             ORTE_ERROR_LOG(ret);
>             return ret;
>         }
>     }
>
> An additional fix would be to add a call to the rmgr to set up singleton
> processes, thus pulling the 'singleton process only' chunk of code out
> of orte_init_stage1() and into the rmgr. Something like:
>
>     if (orte_process_info.singleton) {
>         if (ORTE_SUCCESS !=
>             (ret = orte_rmgr_base_setup_singleton(my_jobid, ...))) {
>             ORTE_ERROR_LOG(ret);
>             return ret;
>         }
>     }
>
> Currently this would only contain the addition of the singleton process
> to the RDS and RAS, but Ralph mentioned last week that he ran across
> some other 'singleton only' stuff that might be needed.
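>
> To make that concrete, here is a rough sketch of what the helper might
> contain (hypothetical - it just collects the existing singleton-only
> code in one place, with the RDS store guarded so that only the seed
> touches the RDS):
>
>     int orte_rmgr_base_setup_singleton(orte_jobid_t jobid)
>     {
>         int ret;
>
>         /* only the seed has the RDS initialized, so guard the store;
>          * rds_single_host would be built here from the local host
>          * info, as orte_init_stage1 does today */
>         if (orte_process_info.seed) {
>             ret = orte_rds.store_resource(&rds_single_host);
>             if (ORTE_SUCCESS != ret) {
>                 ORTE_ERROR_LOG(ret);
>                 return ret;
>             }
>         }
>
>         /* ... plus the RAS host entry for 'jobid', and whatever other
>          * 'singleton only' setup Ralph ran across ... */
>
>         return ORTE_SUCCESS;
>     }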
>
> Is there a design issue in adding this functionality to the rmgr, with
> the proper protection around access to the RDS?
>
> I guess my overall argument is that the RDS should be called in the
> singleton+seed case, since we are adding resources to the allocation
> [RAS], and thus to the set of globally available resources [RDS]. Do we
> assume that if the process is a singleton and not the seed then it has
> already been placed in the RDS, and only needs to confirm its allocation
> in the RAS? Shouldn't that registry handling only happen at the seed
> level, if we assume the seed has launched the singleton process?
>
> It is likely that I have things a bit confused about how we define a
> singleton process, and how singletons are created in relation to the
> seed.
>
> As a general bug notice in ORTE: there is an outstanding bug in the
> proxy/replica NS components when creating new cellids, which I ran
> across last Friday before I had to stop. Something is getting mangled
> in the packing of the command sent to the seed. I had to wrap up before
> I could find a good fix - I only had time to characterize the problem.
>
> Thoughts?
>
> Sorry for causing trouble,
>
> Josh
>
> On Aug 18, 2005, at 3:33 PM, Tim S. Woodall wrote:
>
>
>>I'm seeing a problem in orte_init_stage1 when running w/ a persistent
>>daemon.
>>The problem is that the orte_init call attempts to call the rds subsystem
>>directly, which is not supposed to be exposed at that level. rds is used
>>internally by the rmgr - and is only initialized on the seed. The proxy
>>rmgr is loaded when a persistent daemon is available - and therefore the
>>rds is not loaded.
>>
>>So... orte_init_stage1 shouldn't be calling rds directly...
>>
>>Tim
>>
>>
>>Brian Barrett wrote:
>>
>>
>>>Yeah, although there really shouldn't be a way for the pointer to be
>>>NULL. Was this a static build? I was seeing some weird memory
>>>issues on static builds last night... I'll take a look on odin and
>>>see what I can find.
>>>
>>>Brian
>>>
>>>On Aug 18, 2005, at 11:18 AM, Tim S. Woodall wrote:
>>>
>>>
>>>
>>>
>>>>Brian,
>>>>
>>>>Wasn't the introduction of sds part of your changes for redstorm?
>>>>Any ideas
>>>>why it would be NULL here?
>>>>
>>>>Thanks,
>>>>Tim
>>>>
>>>>Rainer Keller wrote:
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>>Hello,
>>>>>I see the "same" (well, probably not exactly the same) thing here on
>>>>>an Opteron with 64bit (-g and so on); I get:
>>>>>
>>>>>#0 0x0000000040085160 in orte_sds_base_contact_universe ()
>>>>>at ../../../../../orte/mca/sds/base/sds_base_interface.c:29
>>>>>29 return orte_sds_base_module->contact_universe();
>>>>>(gdb) where
>>>>>#0 0x0000000040085160 in orte_sds_base_contact_universe ()
>>>>>at ../../../../../orte/mca/sds/base/sds_base_interface.c:29
>>>>>#1 0x0000000040063e95 in orte_init_stage1 ()
>>>>>at ../../../orte/runtime/orte_init_stage1.c:185
>>>>>#2 0x0000000040017e7d in orte_system_init ()
>>>>>at ../../../orte/runtime/orte_system_init.c:38
>>>>>#3 0x00000000400148f5 in orte_init () at ../../../orte/runtime/orte_init.c:46
>>>>>#4 0x000000004000dfc7 in main (argc=4, argv=0x7fbfffe8a8)
>>>>>at ../../../../orte/tools/orterun/orterun.c:291
>>>>>#5 0x0000002a95c0c017 in __libc_start_main () from /lib64/libc.so.6
>>>>>#6 0x000000004000bf2a in _start ()
>>>>>(gdb)
>>>>>within mpirun
>>>>>
>>>>>orte_sds_base_module here is NULL...
>>>>>This is without a persistent orted; just mpirun...
>>>>>
>>>>>CU,
>>>>>ray
>>>>>
>>>>>
>>>>>On Thursday 18 August 2005 16:57, Nathan DeBardeleben wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>FYI, this only happens when I let OMPI compile 64bit on Linux. When I
>>>>>>throw in CFLAGS=FFLAGS=CXXFLAGS=-m32, orted, my myriad of test codes,
>>>>>>mpirun, my registry subscription codes, and JNI all work like a champ.
>>>>>>Something's wrong with the 64bit build, it appears to me.
>>>>>>
>>>>>>-- Nathan
>>>>>>Correspondence
>>>>>>----------------------------------------------------------------------
>>>>>>Nathan DeBardeleben, Ph.D.
>>>>>>Los Alamos National Laboratory
>>>>>>Parallel Tools Team
>>>>>>High Performance Computing Environments
>>>>>>phone: 505-667-3428
>>>>>>email: ndebard_at_[hidden]
>>>>>>----------------------------------------------------------------------
>>>>>>
>>>>>>Tim S. Woodall wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>Nathan,
>>>>>>>
>>>>>>>I'll try to reproduce this sometime this week - but I'm pretty
>>>>>>>swamped.
>>>>>>>Is Greg also seeing the same behavior?
>>>>>>>
>>>>>>>Thanks,
>>>>>>>Tim
>>>>>>>
>>>>>>>Nathan DeBardeleben wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>To expand on this further, orte_init() seg faults on both
>>>>>>>>bluesteel
>>>>>>>>(32bit linux) and sparkplug (64bit linux) equally. The required
>>>>>>>>condition is that orted must be running first (which of course we
>>>>>>>>require for our work - a persistent orte daemon and registry).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>[bluesteel]~/ptp > ./dump_info
>>>>>>>>>Segmentation fault
>>>>>>>>>[bluesteel]~/ptp > gdb dump_info
>>>>>>>>>GNU gdb 6.1
>>>>>>>>>Copyright 2004 Free Software Foundation, Inc.
>>>>>>>>>GDB is free software, covered by the GNU General Public License,
>>>>>>>>>and you are welcome to change it and/or distribute copies of it
>>>>>>>>>under certain conditions.
>>>>>>>>>Type "show copying" to see the conditions.
>>>>>>>>>There is absolutely no warranty for GDB. Type "show warranty" for
>>>>>>>>>details.
>>>>>>>>>This GDB was configured as "x86_64-suse-linux"...Using host
>>>>>>>>>libthread_db library "/lib64/tls/libthread_db.so.1".
>>>>>>>>>
>>>>>>>>>(gdb) run
>>>>>>>>>Starting program: /home/ndebard/ptp/dump_info
>>>>>>>>>
>>>>>>>>>Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>0x0000000000000000 in ?? ()
>>>>>>>>>(gdb) where
>>>>>>>>>#0 0x0000000000000000 in ?? ()
>>>>>>>>>#1 0x000000000045997d in orte_init_stage1 () at orte_init_stage1.c:419
>>>>>>>>>#2 0x00000000004156a7 in orte_system_init () at orte_system_init.c:38
>>>>>>>>>#3 0x00000000004151c7 in orte_init () at orte_init.c:46
>>>>>>>>>#4 0x0000000000414cbb in main (argc=1, argv=0x7fbffff298) at dump_info.c:185
>>>>>>>>>(gdb)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>-- Nathan
>>>>>>>>Correspondence
>>>>>>>>----------------------------------------------------------------------
>>>>>>>>Nathan DeBardeleben, Ph.D.
>>>>>>>>Los Alamos National Laboratory
>>>>>>>>Parallel Tools Team
>>>>>>>>High Performance Computing Environments
>>>>>>>>phone: 505-667-3428
>>>>>>>>email: ndebard_at_[hidden]
>>>>>>>>----------------------------------------------------------------------
>>>>>>>>
>>>>>>>>Nathan DeBardeleben wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>Just to clarify:
>>>>>>>>>1: no orted started (meaning the mpirun or registry programs will
>>>>>>>>>start one by themselves) causes those programs to lock up.
>>>>>>>>>2: starting orted by hand (trying to get these programs to connect
>>>>>>>>>to a centralized one) causes the connecting programs to seg fault.
>>>>>>>>>
>>>>>>>>>-- Nathan
>>>>>>>>>Correspondence
>>>>>>>>>----------------------------------------------------------------------
>>>>>>>>>Nathan DeBardeleben, Ph.D.
>>>>>>>>>Los Alamos National Laboratory
>>>>>>>>>Parallel Tools Team
>>>>>>>>>High Performance Computing Environments
>>>>>>>>>phone: 505-667-3428
>>>>>>>>>email: ndebard_at_[hidden]
>>>>>>>>>----------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>>Nathan DeBardeleben wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>So I dropped an .ompi_ignore into that directory, reconfigured,
>>>>>>>>>>and the compile worked (yay!).
>>>>>>>>>>However, not a lot of progress: mpirun locks up, and all my
>>>>>>>>>>registry test programs lock up as well. If I start the orted by
>>>>>>>>>>hand, then any of my registry-calling programs segfault:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>[sparkplug]~/ptp > gdb sub_test
>>>>>>>>>>>GNU gdb 6.1
>>>>>>>>>>>Copyright 2004 Free Software Foundation, Inc.
>>>>>>>>>>>GDB is free software, covered by the GNU General Public License,
>>>>>>>>>>>and you are welcome to change it and/or distribute copies of it
>>>>>>>>>>>under certain conditions.
>>>>>>>>>>>Type "show copying" to see the conditions.
>>>>>>>>>>>There is absolutely no warranty for GDB. Type "show warranty" for
>>>>>>>>>>>details.
>>>>>>>>>>>This GDB was configured as "x86_64-suse-linux"...Using host
>>>>>>>>>>>libthread_db library "/lib64/tls/libthread_db.so.1".
>>>>>>>>>>>
>>>>>>>>>>>(gdb) run
>>>>>>>>>>>Starting program: /home/ndebard/ptp/sub_test
>>>>>>>>>>>
>>>>>>>>>>>Program received signal SIGSEGV, Segmentation fault.
>>>>>>>>>>>0x0000000000000000 in ?? ()
>>>>>>>>>>>(gdb) where
>>>>>>>>>>>#0 0x0000000000000000 in ?? ()
>>>>>>>>>>>#1 0x00000000004598a5 in orte_init_stage1 () at orte_init_stage1.c:419
>>>>>>>>>>>#2 0x00000000004155cf in orte_system_init () at orte_system_init.c:38
>>>>>>>>>>>#3 0x00000000004150ef in orte_init () at orte_init.c:46
>>>>>>>>>>>#4 0x00000000004148a1 in main (argc=1, argv=0x7fbffff178) at sub_test.c:60
>>>>>>>>>>>(gdb)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>Yes, I recompiled everything.
>>>>>>>>>>
>>>>>>>>>>Here's an example of me trying something a little more complicated
>>>>>>>>>>(which I believe locks up for the same reason - something borked
>>>>>>>>>>with the registry interaction).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>[sparkplug]~/ompi-test > bjssub -s 10000 -n 10 -i bash
>>>>>>>>>>>>Waiting for interactive job nodes.
>>>>>>>>>>>>(nodes 18 16 17 18 19 20 21 22 23 24 25)
>>>>>>>>>>>>Starting interactive job.
>>>>>>>>>>>>NODES=16,17,18,19,20,21,22,23,24,25
>>>>>>>>>>>>JOBID=18
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>so i got my nodes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test> export OMPI_MCA_ptl_base_exclude=sm
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test> export OMPI_MCA_pls_bproc_seed_priority=101
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>and set these envvars as we need to for Greg's bproc; without
>>>>>>>>>>>the 2nd export, the machine's load maxes out and it locks up.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test> bpstat
>>>>>>>>>>>>Node(s)  Status  Mode        User     Group
>>>>>>>>>>>>100-128  down    ----------  root     root
>>>>>>>>>>>>0-15     up      ---x------  vchandu  vchandu
>>>>>>>>>>>>16-25    up      ---x------  ndebard  ndebard
>>>>>>>>>>>>26-27    up      ---x------  root     root
>>>>>>>>>>>>28-30    up      ---x--x--x  root     root
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test> env | grep NODES
>>>>>>>>>>>>NODES=16,17,18,19,20,21,22,23,24,25
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>yes, i really have the nodes
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test> mpicc -o test-mpi test-mpi.c
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>recompile for good measure
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test> ls /tmp/openmpi-sessions-ndebard*
>>>>>>>>>>>>/bin/ls: /tmp/openmpi-sessions-ndebard*: No such file or directory
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>proof that there's no left over old directory
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test> mpirun -np 1 test-mpi
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>it never responds at this point - but I can kill it with ^C.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>mpirun: killing job...
>>>>>>>>>>>>Killed
>>>>>>>>>>>>ndebard_at_sparkplug:~/ompi-test>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>-- Nathan
>>>>>>>>>>Correspondence
>>>>>>>>>>----------------------------------------------------------------------
>>>>>>>>>>Nathan DeBardeleben, Ph.D.
>>>>>>>>>>Los Alamos National Laboratory
>>>>>>>>>>Parallel Tools Team
>>>>>>>>>>High Performance Computing Environments
>>>>>>>>>>phone: 505-667-3428
>>>>>>>>>>email: ndebard_at_[hidden]
>>>>>>>>>>----------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>>Jeff Squyres wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>Is this what Tim Prins was working on?
>>>>>>>>>>>
>>>>>>>>>>>On Aug 16, 2005, at 5:21 PM, Tim S. Woodall wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>I'm not sure why this is even building... Is someone
>>>>>>>>>>>>working on this?
>>>>>>>>>>>>I thought we had .ompi_ignore files in this directory.
>>>>>>>>>>>>
>>>>>>>>>>>>Tim
>>>>>>>>>>>>
>>>>>>>>>>>>Nathan DeBardeleben wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>So I'm seeing all these nice emails about people developing on
>>>>>>>>>>>>>OMPI today, yet I can't get it to compile. Am I out here in
>>>>>>>>>>>>>limbo on this, or are others in the same boat? The errors I'm
>>>>>>>>>>>>>seeing are about some bproc code calling undefined functions,
>>>>>>>>>>>>>and they are included again below.
>>>>>>>>>>>>>
>>>>>>>>>>>>>-- Nathan
>>>>>>>>>>>>>Correspondence
>>>>>>>>>>>>>----------------------------------------------------------------------
>>>>>>>>>>>>>Nathan DeBardeleben, Ph.D.
>>>>>>>>>>>>>Los Alamos National Laboratory
>>>>>>>>>>>>>Parallel Tools Team
>>>>>>>>>>>>>High Performance Computing Environments
>>>>>>>>>>>>>phone: 505-667-3428
>>>>>>>>>>>>>email: ndebard_at_[hidden]
>>>>>>>>>>>>>----------------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>>Nathan DeBardeleben wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>>Back from training and trying to test this, but now OMPI
>>>>>>>>>>>>>>doesn't compile at all:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>gcc -DHAVE_CONFIG_H -I. -I. -I../../../../include
>>>>>>>>>>>>>>>-I../../../../include -I../../../.. -I../../../..
>>>>>>>>>>>>>>>-I../../../../include -I../../../../opal -I../../../../orte
>>>>>>>>>>>>>>>-I../../../../ompi -g -Wall -Wundef -Wno-long-long
>>>>>>>>>>>>>>>-Wsign-compare -Wmissing-prototypes -Wstrict-prototypes
>>>>>>>>>>>>>>>-Wcomment -pedantic -Werror-implicit-function-declaration
>>>>>>>>>>>>>>>-fno-strict-aliasing -MT ras_lsf_bproc.lo -MD -MP
>>>>>>>>>>>>>>>-MF .deps/ras_lsf_bproc.Tpo -c ras_lsf_bproc.c -o ras_lsf_bproc.o
>>>>>>>>>>>>>>>ras_lsf_bproc.c: In function `orte_ras_lsf_bproc_node_insert':
>>>>>>>>>>>>>>>ras_lsf_bproc.c:32: error: implicit declaration of function `orte_ras_base_node_insert'
>>>>>>>>>>>>>>>ras_lsf_bproc.c: In function `orte_ras_lsf_bproc_node_query':
>>>>>>>>>>>>>>>ras_lsf_bproc.c:37: error: implicit declaration of function `orte_ras_base_node_query'
>>>>>>>>>>>>>>>make[4]: *** [ras_lsf_bproc.lo] Error 1
>>>>>>>>>>>>>>>make[4]: Leaving directory `/home/ndebard/ompi/orte/mca/ras/lsf_bproc'
>>>>>>>>>>>>>>>make[3]: *** [all-recursive] Error 1
>>>>>>>>>>>>>>>make[3]: Leaving directory `/home/ndebard/ompi/orte/mca/ras'
>>>>>>>>>>>>>>>make[2]: *** [all-recursive] Error 1
>>>>>>>>>>>>>>>make[2]: Leaving directory `/home/ndebard/ompi/orte/mca'
>>>>>>>>>>>>>>>make[1]: *** [all-recursive] Error 1
>>>>>>>>>>>>>>>make[1]: Leaving directory `/home/ndebard/ompi/orte'
>>>>>>>>>>>>>>>make: *** [all-recursive] Error 1
>>>>>>>>>>>>>>>[sparkplug]~/ompi >
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Clean SVN checkout this morning with configure:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>[sparkplug]~/ompi > ./configure --enable-static --disable-shared
>>>>>>>>>>>>>>>--without-threads --prefix=/home/ndebard/local/ompi
>>>>>>>>>>>>>>>--with-devel-headers
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>-- Nathan
>>>>>>>>>>>>>>Correspondence
>>>>>>>>>>>>>>----------------------------------------------------------------------
>>>>>>>>>>>>>>Nathan DeBardeleben, Ph.D.
>>>>>>>>>>>>>>Los Alamos National Laboratory
>>>>>>>>>>>>>>Parallel Tools Team
>>>>>>>>>>>>>>High Performance Computing Environments
>>>>>>>>>>>>>>phone: 505-667-3428
>>>>>>>>>>>>>>email: ndebard_at_[hidden]
>>>>>>>>>>>>>>----------------------------------------------------------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>Brian Barrett wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>This is now fixed in SVN. You should no longer need the
>>>>>>>>>>>>>>>--build=i586... hack to compile 32 bit code on Opterons.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>Brian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>On Aug 12, 2005, at 3:17 PM, Brian Barrett wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>On Aug 12, 2005, at 3:13 PM, Nathan DeBardeleben wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>We've got a 64bit Linux (SUSE) box here. For a variety of
>>>>>>>>>>>>>>>>>reasons (Java, JNI, linking in with OMPI libraries, etc.,
>>>>>>>>>>>>>>>>>which I won't get into) I need to compile OMPI 32 bit (or
>>>>>>>>>>>>>>>>>get 64bit versions of a lot of other libraries).
>>>>>>>>>>>>>>>>>I get various compile errors when I try different things,
>>>>>>>>>>>>>>>>>but first let me explain the system we have:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>><snip>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>This goes on and on and on, actually. And the 'is
>>>>>>>>>>>>>>>>>incompatible with i386:x86-64 output' message looks to be
>>>>>>>>>>>>>>>>>repeated for every line before this error, which is what
>>>>>>>>>>>>>>>>>actually caused the make to bomb.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>Any suggestions at all? Surely someone must have tried to
>>>>>>>>>>>>>>>>>force OMPI to build in 32bit mode on a 64bit machine.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>I don't think anyone has tried to build 32 bit on an
>>>>>>>>>>>>>>>>Opteron,
>>>>>>>>>>>>>>>>which is the cause of the problems...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>I think I know how to fix this, but that won't happen until
>>>>>>>>>>>>>>>>later in the weekend, and I can't think of a good workaround
>>>>>>>>>>>>>>>>until then. Well, one possibility is to set the target like
>>>>>>>>>>>>>>>>you were doing and disable ROMIO. Actually, you'll also need
>>>>>>>>>>>>>>>>to disable Fortran 77. So something like:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>./configure [usual options] --build=i586-suse-linux --disable-io-romio --disable-f77
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>might just do the trick.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>Brian
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>--
>>>>>>>>>>>>>>>>Brian Barrett
>>>>>>>>>>>>>>>>Open MPI developer
>>>>>>>>>>>>>>>>http://www.open-mpi.org/
>
> ----
> Josh Hursey
> jjhursey_at_[hidden]
> http://www.open-mpi.org/
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>