Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Migrate OpenMPI to the VxWorks
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-06-07 08:42:31


Sure - just configure with --enable-mca-no-build=filem-rsh,ess-singleton

That will avoid building either of those.

On Jun 6, 2010, at 9:46 PM, 张晶 wrote:

> I found the calls to fork/exec in orte/mca/ess/singleton and
> orte/mca/filem/rsh. Since rsh is the only component for filem,
> I wonder whether I can also omit orte/mca/filem/rsh?
>
> 2010/6/4 Ralph Castain <rhc_at_[hidden]>:
>> Jeff is correct - create an orte/odls/vxworks and do whatever you need for
>> that platform to launch a local child process.
>>
>> I believe you will also find calls to fork/exec in the
>> orte/mca/ess/singleton area. You may want to add a configure.m4 to that
>> component to tell it not to build for vxworks.
>>
>>
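As a rough illustration of the platform-specific launch step described above, the sketch below hides process creation behind one hypothetical helper, sketch_launch_child(). It is not the actual odls component code. The VxWorks branch calls rtpSpawn(); the priority, stack size, option values and the RTP_ID_ERROR check are assumptions that should be verified against the rtpLib reference for the VxWorks release in use. The other branch uses posix_spawn() so the sketch can be tried on a POSIX host.

```c
#define _POSIX_C_SOURCE 200809L

#ifdef MCS_VXWORKS
#include <rtpLib.h>        /* rtpSpawn() */
#else
#include <spawn.h>         /* posix_spawn() */
#include <sys/types.h>
#include <sys/wait.h>      /* waitpid(), for callers that reap the child */
extern char **environ;
#endif

/*
 * Hypothetical helper (not part of Open MPI): start `path` with `argv`
 * as a new process and store its id in *child_id.
 * Returns 0 on success, -1 on failure.
 */
static int sketch_launch_child(const char *path, char *const argv[],
                               long *child_id)
{
#ifdef MCS_VXWORKS
    /* Assumed values: priority 100, 64 KiB stack, no RTP/task options. */
    RTP_ID id = rtpSpawn(path, (const char **)argv, NULL,
                         100, 0x10000, 0, 0);
    if (id == RTP_ID_ERROR) {   /* assumed failure value */
        return -1;
    }
    *child_id = (long)id;
    return 0;
#else
    pid_t pid;
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0) {
        return -1;
    }
    *child_id = (long)pid;
    return 0;
#endif
}
```

On a POSIX host the caller would later reap the child with waitpid((pid_t)id, &status, 0); how a VxWorks component would track RTP exit is left open here.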
>> 2010/6/4 Jeff Squyres <jsquyres_at_[hidden]>
>>>
>>> Maybe gettimeofday() could be replaced with opal_gettimeofday(), which could do
>>> the Right Thing on different platforms...?
>>>
>>> Also, for fork/exec, I think that should be mostly limited to
>>> orte/odls/default, right? If so, perhaps the right thing to do is to clone
>>> that plugin and adapt it for your platform.
>>>
>>>
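A minimal sketch of what such a wrapper might look like, assuming the platform provides the POSIX clock_gettime() call. The name sketch_gettimeofday() is made up for illustration; no opal_gettimeofday() exists at the time of this thread. The header that supplies struct timeval may also differ on VxWorks.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime() under strict -std=c99 */
#include <stddef.h>
#include <time.h>
#include <sys/time.h>   /* struct timeval; the header may differ on VxWorks */

/*
 * Hypothetical wrapper: fill *tv with the current wall-clock time.
 * Returns 0 on success, -1 on failure.
 */
static int sketch_gettimeofday(struct timeval *tv)
{
    struct timespec ts;

    if (tv == NULL) {
        return -1;
    }
    if (clock_gettime(CLOCK_REALTIME, &ts) != 0) {
        return -1;
    }
    tv->tv_sec = ts.tv_sec;
    tv->tv_usec = (long)(ts.tv_nsec / 1000);   /* nanoseconds to microseconds */
    return 0;
}
```

Callers would use the wrapper wherever they currently call gettimeofday(tv, NULL), so only one place needs a platform-specific implementation.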
>>> On Jun 4, 2010, at 1:43 AM, 张晶 wrote:
>>>
>>>> Hi Castain,
>>>>
>>>> Your last mail to me was really helpful. I met most of the issues
>>>> listed and fixed them using either the off-list solution or my own.
>>>> Also, as the Open MPI code has changed, there are some other issues (mostly
>>>> missing functions) that were not reported. For example, the
>>>> gettimeofday POSIX function is not implemented by the VxWorks library, so I
>>>> just wrote a small library for those functions. So far I have
>>>> successfully compiled libopen-rte.a and libopen-pal.a, but now
>>>> I am stuck at the problem of fork and exec, which are not available in
>>>> VxWorks. It is not possible to implement fork and exec myself. I
>>>> have to read through the code that uses fork, then substitute those calls
>>>> with rtpSpawn(). It is challenging work. I really want to know how
>>>> Brian Barrett dealt with fork() and exec().
>>>>
>>>> Thanks
>>>>
>>>> Jing
>>>>
>>>> 2010/3/18 Ralph Castain <rhc_at_[hidden]>:
>>>>> Hi Jing
>>>>> Someone else took a look at this off-list a few years ago. It was mostly a
>>>>> problem with the build system (some flags are different) and header file
>>>>> names. I don't believe the port was ever completed though.
>>>>> I have appended the results of that conversation - the last message
>>>>> contained a list of the issues. You would need to update that to the trunk
>>>>> of course as the code has changed considerably since that discussion took
>>>>> place. Brian Barrett subsequently created a first-cut at fixing some of
>>>>> these, but that appears to have been lost in the years since it was done -
>>>>> and wouldn't really be current anyway.
>>>>> I would be happy to assist as I can.
>>>>> Ralph
>>>>>
>>>>> 1. configure issues with "checking prefix for global symbol labels"
>>>>>
>>>>> 1a. VxWorks assembler (CCAS=asppc) generates a.out by default (vs. the
>>>>> conftest.o that we need subsequently)
>>>>>
>>>>> there is this fragment to determine the way to assemble conftest.s:
>>>>>
>>>>> if test "$CC" = "$CCAS" ; then
>>>>>     ompi_assemble="$CCAS $CCASFLAGS -c conftest.s >conftest.out 2>&1"
>>>>> else
>>>>>     ompi_assemble="$CCAS $CCASFLAGS conftest.s >conftest.out 2>&1"
>>>>> fi
>>>>>
>>>>> The subsequent link fails because conftest.o does not exist:
>>>>>
>>>>> ompi_link="$CC $CFLAGS conftest_c.$OBJEXT conftest.$OBJEXT -o conftest > conftest.link 2>&1"
>>>>>
>>>>> To work around the problem, I did not set CCAS. This gives me the first
>>>>> invocation that includes the -c argument to CC=ccppc, generating
>>>>> conftest.o output.
>>>>>
>>>>>
>>>>> 1b. linker fails because LDFLAGS are not passed
>>>>>
>>>>> The same linker command line caused problems because $CFLAGS were passed
>>>>> to the linker
>>>>>
>>>>> ompi_link="$CC $CFLAGS conftest_c.$OBJEXT conftest.$OBJEXT -o conftest > conftest.link 2>&1"
>>>>>
>>>>> In my environment, I set CC/CFLAGS/LDFLAGS as follows:
>>>>>
>>>>> CC=ccppc
>>>>> CFLAGS='-ggdb3 -std=c99 -pedantic -mrtp -msoft-float -mstrict-align
>>>>>         -mregnames -fno-builtin -fexceptions'
>>>>> LDFLAGS='-mrtp -msoft-float -Wl,--start-group -Wl,--end-group
>>>>>          -L/amd/raptor/root/opt/WindRiver/vxworks-6.3/target/usr/lib/ppc/PPC32/sfcommon'
>>>>>
>>>>> The linker flags are not passed because the ompi_link command line does
>>>>> not include $LDFLAGS:
>>>>>
>>>>> [xp-kcain1:build_vxworks] ccppc -ggdb3 -std=c99 -pedantic -mrtp
>>>>> -msoft-float -mstrict-align -mregnames -fno-builtin -fexceptions -o
>>>>> hello hello.c
>>>>> /amd/raptor/root/opt/WindRiver/gnu/3.4.4-vxworks-6.3/x86-linux2/bin/../lib/gcc/powerpc-wrs-vxworks/3.4.4/../../../../powerpc-wrs-vxworks/bin/ld:
>>>>> cannot find -lc_internal
>>>>> collect2: ld returned 1 exit status
>>>>>
>>>>>
>>>>> 2. OPAL atomics asm.c:
>>>>>
>>>>> int versus int32_t (refer to email with Brian Barrett)
>>>>>
>>>>> 3. OPAL event/event.c: sys/time.h and timercmp() macros not defined by
>>>>> VxWorks
>>>>>
>>>>> refer to workaround in event.c using #ifdef MCS_VXWORKS
>>>>>
>>>>> 4. OPAL event/event.c: pipe() syscall not found
>>>>>
>>>>> workaround:
>>>>>
>>>>> #ifdef HAVE_UNISTD_H
>>>>> #include <unistd.h>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <ioLib.h> /* for pipe() */
>>>>> #endif
>>>>> #endif
>>>>>
>>>>> 5. OPAL event/signal.c
>>>>>
>>>>> static sig_atomic_t opal_evsigcaught[NSIG];
>>>>>
>>>>> NSIG is not defined, but _NSIGS is.
>>>>> In Linux, NSIG is defined with -D__USE_MISC
>>>>>
>>>>> So I added this code fragment to signal.c:
>>>>>
>>>>> /* VxWorks signal.h defines _NSIGS, not NSIG */
>>>>> #ifdef MCS_VXWORKS
>>>>> #define NSIG (_NSIGS+1)
>>>>> #endif
>>>>>
>>>>>
>>>>> 6. OPAL event/signal.c: no socketpair()
>>>>>
>>>>> workaround: use pipe():
>>>>>
>>>>> #ifdef HAVE_UNISTD_H
>>>>> #include <unistd.h>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <ioLib.h> /* for pipe() */
>>>>> #endif
>>>>> #endif
>>>>>
>>>>> and later in void opal_evsignal_init(sigset_t *evsigmask)
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>>     if (pipe(ev_signal_pair) == -1)
>>>>>         event_err(1, "%s: pipe", __func__);
>>>>> #else
>>>>>     if (socketpair(AF_UNIX, SOCK_STREAM, 0, ev_signal_pair) == -1)
>>>>>         event_err(1, "%s: socketpair", __func__);
>>>>> #endif
>>>>>
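The pipe()/socketpair() substitution in item 6 can be isolated in one small helper, sketched below under the name sketch_make_signal_pair() (a hypothetical name, not an OPAL function). One caveat worth keeping in mind: pipe() is one-directional, with index 0 as the read end and index 1 as the write end, whereas either end of a socketpair() can be read or written. Code that writes to index 0 would therefore need its indices swapped when built on top of pipe().

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>
#ifdef MCS_VXWORKS
#include <ioLib.h>          /* pipe() on VxWorks, per the workaround above */
#else
#include <sys/socket.h>     /* socketpair() */
#endif

/*
 * Hypothetical helper: fill fds[] with a connected descriptor pair.
 * On both platforms it is safe to read from fds[0] and write to fds[1].
 * Returns 0 on success, -1 on failure.
 */
static int sketch_make_signal_pair(int fds[2])
{
#ifdef MCS_VXWORKS
    return pipe(fds);
#else
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
#endif
}
```

A signal handler would then write one byte to fds[1], and the event loop would watch fds[0] for readability.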
>>>>> 7. OPAL util/basename.c: #if HAVE_DIRNAME problem
>>>>>
>>>>> ../../../opal/util/basename.c:23:5: warning: "HAVE_DIRNAME" is not defined
>>>>> ../../../opal/util/basename.c: In function `opal_dirname':
>>>>>
>>>>> problem: HAVE_DIRNAME is not defined in opal_config.h so the #if
>>>>> HAVE_DIRNAME will fail at preprocessor/compile time
>>>>>
>>>>> workaround:
>>>>>
>>>>> change #if HAVE_DIRNAME to #if defined(HAVE_DIRNAME)
>>>>>
>>>>>
>>>>> 8. OPAL util/basename.c: strncpy_s and _strdup
>>>>>
>>>>> ../../../opal/util/basename.c: In function `opal_dirname':
>>>>> ../../../opal/util/basename.c:153: error: implicit declaration of
>>>>> function `strncpy_s'
>>>>> ../../../opal/util/basename.c:160: error: implicit declaration of
>>>>> function `_strdup'
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>>     strncpy( ret, filename, p - filename);
>>>>> #else
>>>>>     strncpy_s( ret, (p - filename + 1), filename, p - filename );
>>>>> #endif
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>>     return strdup(".");
>>>>> #else
>>>>>     return _strdup(".");
>>>>> #endif
>>>>>
>>>>>
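One thing to watch in the strncpy() branch of item 8: unlike strncpy_s(), plain strncpy() does not append a terminating NUL when the source is at least as long as the count, so the result is only a valid string if the buffer is terminated some other way (the surrounding code is not shown here). A minimal sketch of a prefix copy that always terminates, using a hypothetical helper name:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'ed copy of at most the first
 * `len` bytes of `src`, always NUL-terminated. The caller frees it.
 * Returns NULL if allocation fails.
 */
static char *sketch_copy_prefix(const char *src, size_t len)
{
    char *ret = malloc(len + 1);

    if (ret == NULL) {
        return NULL;
    }
    strncpy(ret, src, len);   /* copies up to len bytes, no NUL guaranteed */
    ret[len] = '\0';          /* so terminate explicitly */
    return ret;
}
```

For a directory-name computation this would be called as sketch_copy_prefix(filename, p - filename), where p points at the last path separator.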
>>>>>
>>>>> 9. opal/util/if.c: socket() prototype not found in vxworks headers
>>>>>
>>>>> #ifdef HAVE_SYS_SOCKET_H
>>>>> #include <sys/socket.h>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <sockLib.h>
>>>>> #endif
>>>>> #endif
>>>>>
>>>>> 10. opal/util/if.c: ioctl()
>>>>>
>>>>> #ifdef HAVE_SYS_IOCTL_H
>>>>> #include <sys/ioctl.h>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <ioLib.h>
>>>>> #endif
>>>>> #endif
>>>>>
>>>>> 11. opal/util/os_path.c: MAXPATHLEN change to PATH_MAX
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>>     if (total_length > PATH_MAX) { /* path length is too long - reject it */
>>>>>         return(NULL);
>>>>> #else
>>>>>     if (total_length > MAXPATHLEN) { /* path length is too long - reject it */
>>>>>         return(NULL);
>>>>> #endif
>>>>>
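An alternative to duplicating the length check under #ifdef, sketched here as one possibility rather than what was actually done, is to define MAXPATHLEN once when the platform headers do not provide it. The fallback value of 1024 and the helper name are assumptions for illustration only.

```c
#include <limits.h>   /* PATH_MAX, where the platform defines it */
#include <stddef.h>

/* Map MAXPATHLEN onto PATH_MAX when only the latter is available. */
#ifndef MAXPATHLEN
#  ifdef PATH_MAX
#    define MAXPATHLEN PATH_MAX
#  else
#    define MAXPATHLEN 1024   /* assumed fallback, not taken from any header */
#  endif
#endif

/* Hypothetical helper: nonzero if a path of this length must be rejected. */
static int sketch_path_too_long(size_t total_length)
{
    return total_length > (size_t)MAXPATHLEN;
}
```

With the macro defined in one shared header, call sites such as os_path.c could keep their original MAXPATHLEN test unchanged.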
>>>>>
>>>>> 12. opal/util/output.c: gethostname()
>>>>>
>>>>> include <hostLib.h>
>>>>>
>>>>> 13. opal/util/output.c: MAXPATHLEN
>>>>>
>>>>> same fix as os_path.c above
>>>>>
>>>>> 14. opal/util/output.c: closelog/openlog/syslog
>>>>>
>>>>> manually turned off HAVE_SYSLOG_H in opal_config.h
>>>>>
>>>>> then got a patch from Jeff Squyres that avoids syslog
>>>>>
>>>>> 15. opal/util/opal_pty.c
>>>>>
>>>>> complains about mismatched prototype of opal_openpty() between this
>>>>> source file and opal_pty.h
>>>>>
>>>>> workaround: manually edit build_vxworks_ppc/opal/include/opal_config.h,
>>>>> use the following line (change 1 to 0):
>>>>>
>>>>> #define OMPI_ENABLE_PTY_SUPPORT 0
>>>>>
>>>>> 16. opal/util/stacktrace.c
>>>>>
>>>>> FPE_FLTINV not present in signal.h
>>>>>
>>>>> workaround: edit opal_config.h to turn off
>>>>> OMPI_WANT_PRETTY_PRINT_STACKTRACE (this can be explicitly configured out
>>>>> but I don't want to reconfigure because I hacked #15 above)
>>>>>
>>>>> 17. opal/mca/base/mca_base_open.c
>>>>>
>>>>> gethostname() -- same as opal/util/output.c, must include hostLib.h
>>>>>
>>>>> 18. opal_progress.c
>>>>>
>>>>> from opal/event/event.h (that I modified earlier)
>>>>> cannot find #include <sys/_timeradd.h>
>>>>> It is in opal/event/compat/sys
>>>>>
>>>>> workaround: change event.h to include the definitions that are present
>>>>> in _timeradd.h instead of including it.
>>>>>
>>>>> 19. Link errors for opal_wrapper
>>>>>
>>>>> strcasecmp
>>>>> strncasecmp
>>>>>
>>>>> I rolled my own in mca_base_open.c (temporary fix, since we may come across
>>>>> this problem elsewhere in the code).
>>>>>
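The hand-rolled replacements mentioned in item 19 are not shown in the thread. A minimal sketch of what such functions could look like, using only ISO C and hypothetical sketch_ names to avoid clashing with a platform that does provide them:

```c
#include <ctype.h>
#include <stddef.h>

/* Compare at most n characters of a and b, ignoring case. */
static int sketch_strncasecmp(const char *a, const char *b, size_t n)
{
    for (; n > 0; a++, b++, n--) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);

        if (ca != cb) {
            return ca - cb;
        }
        if (ca == '\0') {
            return 0;      /* both strings ended together */
        }
    }
    return 0;              /* first n characters matched */
}

/* Compare a and b in full, ignoring case. */
static int sketch_strcasecmp(const char *a, const char *b)
{
    return sketch_strncasecmp(a, b, (size_t)-1);
}
```

Placing these in one shared utility file, rather than in mca_base_open.c, would address the concern that the same link error may show up elsewhere.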
>>>>> 20. dss_internal.h uses a type 'uint'
>>>>>
>>>>> Not sure if it's depending on something in the headers, or something it
>>>>> defined on its own.
>>>>>
>>>>> I changed it to be just like the header I found somewhere under Linux
>>>>> /usr/include:
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>> typedef unsigned int uint;
>>>>> #endif
>>>>>
>>>>> 21. struct iovec definition needed
>>>>>
>>>>> orte/mca/iof/base/iof_base_fragment.h:45: warning: array type has
>>>>> incomplete element type
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <net/uio.h>
>>>>> #endif
>>>>>
>>>>> not sure if this is right, or if I should include something like
>>>>> <netBufLib.h> or <ioLib.h>
>>>>>
>>>>>
>>>>> 22. iof_base_setup.c
>>>>>
>>>>> struct termios not understood
>>>>>
>>>>> can only find termios.h header in 'diab' area and I'm not using that
>>>>> compiler.
>>>>>
>>>>> a variable usepty is set to 0 already when OMPI_ENABLE_PTY_SUPPORT is 0.
>>>>> So, why are we compiling this fragment of code at all? I hacked the file
>>>>> so that the struct termios code will not get compiled.
>>>>>
>>>>> 23. oob_base_send/recv.c, oob_base_send/recv_nb.c. struct iovec not known.
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <net/uio.h>
>>>>> #endif
>>>>>
>>>>> 24. orte/mca/rmgr/base/rmgr_base_check_context.c:58: error:
>>>>> `MAXHOSTNAMELEN' undeclared (first use in this function)
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>> #define MAXHOSTNAMELEN 64
>>>>> #endif
>>>>>
>>>>> 25. orte/mca/rmgr/base/rmgr_base_check_context.c:58:
>>>>> gethostname()
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <hostLib.h>
>>>>> #endif
>>>>>
>>>>> 26. orte/mca/iof/proxy/iof_proxy.h:135: warning: array type has
>>>>> incomplete element type
>>>>> ../../../../../orte/mca/iof/proxy/iof_proxy.h:135: error: field
>>>>> `proxy_iov' has incomplete type
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <net/uio.h>
>>>>> #endif
>>>>>
>>>>> 27. /orte/mca/iof/svc/iof_svc.h:147: warning: array type has incomplete
>>>>> element type
>>>>> ../../../../../orte/mca/iof/svc/iof_svc.h:147: error: field `svc_iov'
>>>>> has incomplete type
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <net/uio.h>
>>>>> #endif
>>>>>
>>>>> 28. ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:66: warning: array
>>>>> type has incomplete element type
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:66: error: field `msg_iov'
>>>>> has incomplete type
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h: In function
>>>>> `mca_oob_tcp_msg_iov_alloc':
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:196: error: invalid
>>>>> application of `sizeof' to incomplete type `iovec'
>>>>>
>>>>>
>>>>> 29. ../../../../../orte/mca/oob/tcp/oob_tcp.c:344: error: implicit
>>>>> declaration of function `accept'
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>>>>> `mca_oob_tcp_create_listen':
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:383: error: implicit
>>>>> declaration of function `socket'
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:399: error: implicit
>>>>> declaration of function `bind'
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:407: error: implicit
>>>>> declaration of function `getsockname'
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:415: error: implicit
>>>>> declaration of function `listen'
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>>>>> `mca_oob_tcp_listen_thread':
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:459: error: implicit
>>>>> declaration of function `bzero'
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>>>>> `mca_oob_tcp_recv_probe':
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:696: error: implicit
>>>>> declaration of function `send'
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>>>>> `mca_oob_tcp_recv_handler':
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:795: error: implicit
>>>>> declaration of function `recv'
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>>>>> `mca_oob_tcp_init':
>>>>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:1087: error: implicit
>>>>> declaration of function `usleep'
>>>>>
>>>>> This gets rid of most (except bzero and usleep)
>>>>>
>>>>> #ifdef MCS_VXWORKS
>>>>> #include <sockLib.h>
>>>>> #endif
>>>>>
>>>>> Trying to reconfigure the package so CFLAGS will not include -pedantic.
>>>>> This is because $WIND_HOME/vxworks-6.3/target/h/string.h has protos for
>>>>> bzero, but only when #if _EXTENSION_WRS is true. So turn off
>>>>> -ansi/-pedantic gets this? In my dreams?
>>>>>
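If dropping -pedantic does not expose bzero() and usleep(), another option is to express both through standard calls. The sketch below is one such approach and was not part of the original port: memset() is ISO C, and nanosleep() is POSIX and assumed here to be available on the target. The sketch_ names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std=c99 */
#include <string.h>
#include <time.h>

/* Zero len bytes at buf, as bzero() would. */
static void sketch_bzero(void *buf, size_t len)
{
    memset(buf, 0, len);
}

/*
 * Sleep for usec microseconds, as usleep() would.
 * Returns 0 on success, -1 if the sleep failed or was interrupted.
 */
static int sketch_usleep(unsigned long usec)
{
    struct timespec ts;

    ts.tv_sec = (time_t)(usec / 1000000UL);
    ts.tv_nsec = (long)(usec % 1000000UL) * 1000L;  /* remainder in ns */
    return nanosleep(&ts, NULL);
}
```

These could be mapped onto the original names with #define under #ifdef MCS_VXWORKS, leaving oob_tcp.c itself untouched.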
>>>>> On Mar 17, 2010, at 9:54 PM, 张晶 wrote:
>>>>>
>>>>> Hello all,
>>>>>
>>>>> In order to add some real-time features to Open MPI for some research, I
>>>>> need an Open MPI version running on VxWorks. But after going through the
>>>>> Open MPI website, I could not find any indication that it supports
>>>>> VxWorks.
>>>>>
>>>>> Following the thread posted by Ralph Castain,
>>>>> http://www.open-mpi.org/community/lists/users/2006/06/1371.php ,
>>>>> I read some papers about OpenRTE, such as "Creating a transparent,
>>>>> distributed, and resilient computing environment: the OpenRTE project"
>>>>> and "The Open Run-Time Environment (OpenRTE): A Transparent Multi-cluster
>>>>> Environment for High-Performance Computing", which were written by
>>>>> Ralph H. Castain, Jeffrey M. Squyres and others.
>>>>>
>>>>> Now I have a basic understanding of OpenRTE; however, there is too
>>>>> little documentation describing the implementation of OpenRTE. I don't
>>>>> know where and how to begin the migration. Any advice will be appreciated.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Jing Zhang
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> devel mailing list
>>>>> devel_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> 张晶
>>>>
>>>>
>>>
>>>
>>> --
>>> Jeff Squyres
>>> jsquyres_at_[hidden]
>>> For corporate legal information go to:
>>> http://www.cisco.com/web/about/doing_business/legal/cri/
>>>
>>>
>>>
>>
>>
>>
>
>
>
> --
> 张晶
>