Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Migrate OpenMPI to the VxWorks
From: Ralph Castain (rhc_at_[hidden])
Date: 2010-03-18 02:39:14


Hi Jing

Someone else took a look at this off-list a few years ago. It was mostly a problem with the build system (some flags are different) and header file names. I don't believe the port was ever completed though.

I have appended the results of that conversation - the last message contained a list of the issues. You would need to update that to the trunk of course as the code has changed considerably since that discussion took place. Brian Barrett subsequently created a first-cut at fixing some of these, but that appears to have been lost in the years since it was done - and wouldn't really be current anyway.

I would be happy to assist as I can.
Ralph

>> 1. configure issues with "checking prefix for global symbol labels"
>>
>> 1a. VxWorks assembler (CCAS=asppc) generates a.out by default (vs.
>> conftest.o that we need subsequently)
>>
>> there is this fragment to determine the way to assemble conftest.s:
>>
>>> if test "$CC" = "$CCAS" ; then
>>> ompi_assemble="$CCAS $CCASFLAGS -c conftest.s >conftest.out 2>&1"
>>> else
>>> ompi_assemble="$CCAS $CCASFLAGS conftest.s >conftest.out 2>&1"
>>> fi
>>
>> The subsequent link fails because conftest.o does not exist:
>>
>>> ompi_link="$CC $CFLAGS conftest_c.$OBJEXT conftest.$OBJEXT -o conftest > conftest.link 2>&1"
>>
>> To work around the problem, I did not set CCAS. This gives me the first
>> invocation that includes the -c argument to CC=ccppc, generating
>> conftest.o output.
>>
>>
>> 1b. linker fails because LDFLAGS are not passed
>>
>> The same linker command line caused problems because only $CFLAGS, not
>> $LDFLAGS, were passed to the linker
>>
>>> ompi_link="$CC $CFLAGS conftest_c.$OBJEXT conftest.$OBJEXT -o conftest > conftest.link 2>&1"
>>
>> In my environment, I set CC/CFLAGS/LDFLAGS as follows:
>> CC=ccppc
>>
>> CFLAGS=-ggdb3 -std=c99 -pedantic -mrtp -msoft-float -mstrict-align
>> -mregnames -fno-builtin -fexceptions
>>
>> LDFLAGS=-mrtp -msoft-float -Wl,--start-group -Wl,--end-group
>> -L/amd/raptor/root/opt/WindRiver/vxworks-6.3/target/usr/lib/ppc/PPC32/sfcommon
>>
>> The linker flags are not passed because the ompi_link command line does
>> not include $LDFLAGS. For example:
>>
>> [xp-kcain1:build_vxworks] ccppc -ggdb3 -std=c99 -pedantic -mrtp
>> -msoft-float -mstrict-align -mregnames -fno-builtin -fexceptions -o
>> hello hello.c
>> /amd/raptor/root/opt/WindRiver/gnu/3.4.4-vxworks-6.3/x86-linux2/bin/../lib/gcc/powerpc-wrs-vxworks/3.4.4/../../../../powerpc-wrs-vxworks/bin/ld: cannot find -lc_internal
>> collect2: ld returned 1 exit status
>>
>>
>> 2. OPAL atomics asm.c:
>> int versus int32_t (refer to email with Brian Barrett)
>>
>> 3. OPAL event/event.c: sys/time.h and timercmp() macros not defined by
>> VxWorks
>> refer to workaround in event.c using #ifdef MCS_VXWORKS
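
One way to supply the missing macros is a guarded fallback that only takes effect when the platform headers do not provide them. This is a minimal sketch, not the workaround that was actually applied in event.c. It assumes `struct timeval` is still available from some header on the target (here `<sys/time.h>`, which may be a different header on VxWorks), and the helper name `tv_is_before` is hypothetical.

```c
#include <sys/time.h>   /* struct timeval; the header name may differ on VxWorks */

/* Fallback definition, used only when the platform headers lack timercmp. */
#ifndef timercmp
#define timercmp(a, b, CMP)                     \
    (((a)->tv_sec == (b)->tv_sec)               \
         ? ((a)->tv_usec CMP (b)->tv_usec)      \
         : ((a)->tv_sec CMP (b)->tv_sec))
#endif

/* Hypothetical helper: returns 1 if *a is strictly earlier than *b, else 0. */
int tv_is_before(const struct timeval *a, const struct timeval *b)
{
    return timercmp(a, b, <) ? 1 : 0;
}
```

Guarding with `#ifndef` keeps the native definition on platforms that already have one, so the same source builds unchanged on Linux.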
>>
>> 4. OPAL event/event.c: pipe() syscall not found
>> workaround:
>>
>>> #ifdef HAVE_UNISTD_H
>>> #include <unistd.h>
>>> #ifdef MCS_VXWORKS
>>> #include <ioLib.h> /* for pipe() */
>>> #endif
>>> #endif
>>
>> 5. OPAL event/signal.c
>> static sig_atomic_t opal_evsigcaught[NSIG];
>> NSIG is not defined
>> but _NSIGS is
>>
>> In Linux, NSIG is defined with -D__USE_MISC
>>
>> So I added this code fragment to signal.c:
>>
>>> /* VxWorks signal.h defines _NSIGS, not NSIG */
>>> #ifdef MCS_VXWORKS
>>> #define NSIG (_NSIGS+1)
>>> #endif
>>
>>
>> 6. OPAL event/signal.c: no socketpair()
>>
>> workaround: use pipe():
>>
>>> #ifdef HAVE_UNISTD_H
>>> #include <unistd.h>
>>> #ifdef MCS_VXWORKS
>>> #include <ioLib.h> /* for pipe() */
>>> #endif
>>> #endif
>>
>> and later in void opal_evsignal_init(sigset_t *evsigmask)
>>
>>> #ifdef MCS_VXWORKS
>>> if (pipe(ev_signal_pair) == -1)
>>> event_err(1, "%s: pipe", __func__);
>>> #else
>>> if (socketpair(AF_UNIX, SOCK_STREAM, 0, ev_signal_pair) == -1)
>>> event_err(1, "%s: socketpair", __func__);
>>> #endif
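
A caveat with this substitution: socketpair() returns two bidirectional descriptors, while pipe() is one-way (index 0 is the read end, index 1 is the write end). If the surrounding code writes to index 0 of the pair, the indices would need swapping in the pipe() case. The sketch below makes the roles explicit. The helper name `make_notify_channel` and the switch `USE_PIPE_FALLBACK` are hypothetical (the thread uses `MCS_VXWORKS`).

```c
#include <unistd.h>        /* pipe(), read(), write(), close() */
#ifndef USE_PIPE_FALLBACK  /* hypothetical switch; the thread uses MCS_VXWORKS */
#include <sys/socket.h>    /* socketpair() */
#endif

/*
 * Hypothetical helper: create a notification channel.
 * On success returns 0 and stores the read end in *rd and the write end in *wr.
 */
int make_notify_channel(int *rd, int *wr)
{
    int fds[2];
#ifdef USE_PIPE_FALLBACK
    /* pipe() is one-way: fds[0] is readable, fds[1] is writable. */
    if (pipe(fds) == -1)
        return -1;
#else
    /* socketpair() is two-way, so either end could serve either role. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == -1)
        return -1;
#endif
    *rd = fds[0];
    *wr = fds[1];
    return 0;
}
```

Callers that only ever write to `*wr` and read from `*rd` behave the same with either backend.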
>>
>> 7. OPAL util/basename.c: #if HAVE_DIRNAME problem
>>
>> ../../../opal/util/basename.c:23:5: warning: "HAVE_DIRNAME" is not defined
>> ../../../opal/util/basename.c: In function `opal_dirname':
>>
>> problem: HAVE_DIRNAME is not defined in opal_config.h so the #if
>> HAVE_DIRNAME will fail at preprocessor/compile time
>>
>> workaround:
>> change #if HAVE_DIRNAME to #if defined(HAVE_DIRNAME)
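
For reference, an undefined identifier inside `#if` evaluates to 0, so `#if HAVE_DIRNAME` still compiles but draws the warning shown above when the compiler is asked to flag undefined macros. `#if defined(...)` tests the same condition silently. The sketch below simulates a config header that leaves the macro undefined; `USING_SYSTEM_DIRNAME` and `using_system_dirname` are hypothetical names.

```c
/* Simulate a config header that leaves HAVE_DIRNAME undefined. */
#undef HAVE_DIRNAME

#if defined(HAVE_DIRNAME)
#define USING_SYSTEM_DIRNAME 1   /* hypothetical name */
#else
#define USING_SYSTEM_DIRNAME 0
#endif

/* Returns 1 when the system dirname() would be used, 0 otherwise. */
int using_system_dirname(void)
{
    return USING_SYSTEM_DIRNAME;
}
```

The two forms are not equivalent if a config header ever defines the macro to 0, since `defined()` is then true. That case would need `#if defined(HAVE_DIRNAME) && HAVE_DIRNAME`.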
>>
>>
>> 8. OPAL util/basename.c: strncpy_s and _strdup
>> ../../../opal/util/basename.c: In function `opal_dirname':
>> ../../../opal/util/basename.c:153: error: implicit declaration of
>> function `strncpy_s'
>> ../../../opal/util/basename.c:160: error: implicit declaration of
>> function `_strdup'
>>
>>> #ifdef MCS_VXWORKS
>>> strncpy( ret, filename, p - filename);
>>> #else
>>> strncpy_s( ret, (p - filename + 1), filename, p - filename );
>>> #endif
>>> #ifdef MCS_VXWORKS
>>> return strdup(".");
>>> #else
>>> return _strdup(".");
>>> #endif
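
One thing to watch with the strncpy() replacement: unlike strncpy_s(), strncpy() does not append a terminating NUL when the source is at least as long as the count. Unless the destination buffer was zeroed beforehand, the result may be unterminated. A minimal sketch of a terminated prefix copy follows; the helper name `copy_prefix` is hypothetical and is not part of basename.c.

```c
#include <stdlib.h>  /* malloc() */
#include <string.h>  /* strncpy() */

/*
 * Hypothetical helper: return a newly allocated copy of the first `len`
 * bytes of `src`, always NUL-terminated. The caller frees the result.
 */
char *copy_prefix(const char *src, size_t len)
{
    char *ret = malloc(len + 1);
    if (ret == NULL)
        return NULL;
    strncpy(ret, src, len);  /* copies at most len bytes, may not terminate */
    ret[len] = '\0';         /* terminate explicitly */
    return ret;
}
```

In the fragment above, the equivalent would be setting `ret[p - filename] = '\0'` after the strncpy() call, assuming `ret` has room for it.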
>>
>>
>>
>> 9. opal/util/if.c: socket() prototype not found in vxworks headers
>>
>>> #ifdef HAVE_SYS_SOCKET_H
>>> #include <sys/socket.h>
>>> #ifdef MCS_VXWORKS
>>> #include <sockLib.h>
>>> #endif
>>> #endif
>>
>> 10. opal/util/if.c: ioctl()
>>
>>> #ifdef HAVE_SYS_IOCTL_H
>>> #include <sys/ioctl.h>
>>> #ifdef MCS_VXWORKS
>>> #include <ioLib.h>
>>> #endif
>>> #endif
>>
>> 11. opal/util/os_path.c: MAXPATHLEN change to PATH_MAX
>>
>> #ifdef MCS_VXWORKS
>> if (total_length > PATH_MAX) { /* path length is too long - reject it */
>> return(NULL);
>> #else
>> if (total_length > MAXPATHLEN) { /* path length is too long - reject it */
>> return(NULL);
>> #endif
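
Since the two branches differ only in the name of the constant, an alternative is to pick the limit once and keep the call sites free of #ifdef. This is a sketch under assumptions: the names `MY_PATH_MAX` and `path_too_long` are hypothetical, and the final fallback value of 1024 is illustrative rather than taken from the thread.

```c
#include <limits.h>   /* PATH_MAX on POSIX systems */
#include <stddef.h>   /* size_t */

/* Hypothetical project-wide limit, so call sites need no #ifdef. */
#if defined(PATH_MAX)
#define MY_PATH_MAX PATH_MAX
#elif defined(MAXPATHLEN)        /* only visible if <sys/param.h> was included */
#define MY_PATH_MAX MAXPATHLEN
#else
#define MY_PATH_MAX 1024         /* assumed fallback, not taken from the thread */
#endif

/* Returns 1 if a path of total_length bytes should be rejected, else 0. */
int path_too_long(size_t total_length)
{
    return total_length > (size_t)MY_PATH_MAX;
}
```

The same definition would also cover the MAXPATHLEN use in output.c (item 13).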
>>
>>
>> 12. opal/util/output.c: gethostname()
>> include <hostLib.h>
>>
>> 13. opal/util/output.c: MAXPATHLEN
>> same fix as os_path.c above
>>
>> 14. opal/util/output.c: closelog/openlog/syslog
>> manually turned off HAVE_SYSLOG_H in opal_config.h
>> then got a patch from Jeff Squyres that avoids syslog
>>
>> 15. opal/util/opal_pty.c
>> complains about mismatched prototype of opal_openpty() between this
>> source file and opal_pty.h
>>
>> workaround: manually edit build_vxworks_ppc/opal/include/opal_config.h,
>> use the following line (change 1 to 0):
>> #define OMPI_ENABLE_PTY_SUPPORT 0
>>
>> 16. opal/util/stacktrace.c
>> FPE_FLTINV not present in signal.h
>>
>> workaround: edit opal_config.h to turn off
>> OMPI_WANT_PRETTY_PRINT_STACKTRACE (this can be explicitly configured out
>> but I don't want to reconfigure because I hacked #15 above)
>>
>> 17. opal/mca/base/mca_base_open.c
>> gethostname() -- same as opal/util/output.c, must include hostLib.h
>>
>> 18. opal_progress.c
>> from opal/event/event.h (that I modified earlier)
>> cannot find #include <sys/_timeradd.h>
>> It is in opal/event/compat/sys
>>
>> workaround: change event.h to include the definitions that are present
>> in _timeradd.h instead of including it.
>>
>> 19. Link errors for opal_wrapper
>> strcasecmp
>> strncasecmp
>>
>> I rolled my own in mca_base_open.c (temporary fix, since we may come across this problem elsewhere in the code).
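
A portable replacement is short enough to keep in one shared compatibility file. The sketch below is not the code that was added to mca_base_open.c. The `my_` prefix is a hypothetical naming choice to avoid clashing with a system strcasecmp() where one exists.

```c
#include <ctype.h>   /* tolower() */
#include <stddef.h>  /* size_t */

/* Case-insensitive comparison of at most n characters. */
int my_strncasecmp(const char *a, const char *b, size_t n)
{
    for (; n > 0; a++, b++, n--) {
        /* Cast through unsigned char: tolower() is undefined for negative values. */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb)
            return ca - cb;
        if (ca == '\0')
            return 0;   /* both strings ended together */
    }
    return 0;           /* first n characters matched */
}

/* Case-insensitive comparison of whole strings. */
int my_strcasecmp(const char *a, const char *b)
{
    return my_strncasecmp(a, b, (size_t)-1);
}
```

Placing these in a single file behind a configure check for strcasecmp would avoid repeating the fix wherever the problem recurs.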
>>
>> 20. dss_internal.h uses a type 'uint'
>> Not sure if it's depending on something in the headers, or something it
>> defined on its own.
>>
>> I changed it to be just like the header I found somewhere under Linux /usr/include:
>> #ifdef MCS_VXWORKS
>> typedef unsigned int uint;
>> #endif
>>
>> 21. struct iovec definition needed
>> orte/mca/iof/base/iof_base_fragment.h:45: warning: array type has
>> incomplete element type
>>
>> #ifdef MCS_VXWORKS
>> #include <net/uio.h>
>> #endif
>>
>> not sure if this is right, or if I should include something like
>> <netBufLib.h> or <ioLib.h>
>>
>>
>> 22. iof_base_setup.c
>> struct termios not understood
>> can only find termios.h header in 'diab' area and I'm not using that
>> compiler.
>>
>> a variable usepty is set to 0 already when OMPI_ENABLE_PTY_SUPPORT is 0.
>> So, why are we compiling this fragment of code at all? I hacked the file
>> so that the struct termios code will not get compiled.
>>
>> 23. oob_base_send/recv.c, oob_base_send/recv_nb.c. struct iovec not known.
>>
>> #ifdef MCS_VXWORKS
>> #include <net/uio.h>
>> #endif
>>
>> 24. orte/mca/rmgr/base/rmgr_base_check_context.c:58: error:
>> `MAXHOSTNAMELEN' undeclared (first use in this function)
>>
>> #ifdef MCS_VXWORKS
>> #define MAXHOSTNAMELEN 64
>> #endif
>>
>> 25. orte/mca/rmgr/base/rmgr_base_check_context.c:58:
>> gethostname()
>>
>> #ifdef MCS_VXWORKS
>> #include <hostLib.h>
>> #endif
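
Items 24 and 25 can be combined into one small wrapper. This sketch assumes a POSIX host, where gethostname() comes from `<unistd.h>`. On VxWorks the thread includes `<hostLib.h>` instead. The helper name `get_host_name` is hypothetical.

```c
#include <string.h>   /* strlen(), used by callers */
#include <unistd.h>   /* gethostname() on POSIX; <hostLib.h> on VxWorks per the thread */

#ifndef MAXHOSTNAMELEN
#define MAXHOSTNAMELEN 64   /* same fallback value as in the thread */
#endif

/* Hypothetical helper: fill buf (MAXHOSTNAMELEN + 1 bytes) with the host name. */
int get_host_name(char buf[MAXHOSTNAMELEN + 1])
{
    if (gethostname(buf, MAXHOSTNAMELEN) != 0)
        return -1;
    buf[MAXHOSTNAMELEN] = '\0';  /* gethostname() may not terminate on truncation */
    return 0;
}
```

Using `#ifndef` rather than `#ifdef MCS_VXWORKS` for the constant means the fallback also applies on any other platform that lacks it.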
>>
>> 26. orte/mca/iof/proxy/iof_proxy.h:135: warning: array type has
>> incomplete element type
>> ../../../../../orte/mca/iof/proxy/iof_proxy.h:135: error: field
>> `proxy_iov' has incomplete type
>>
>> #ifdef MCS_VXWORKS
>> #include <net/uio.h>
>> #endif
>>
>> 27. /orte/mca/iof/svc/iof_svc.h:147: warning: array type has incomplete
>> element type
>> ../../../../../orte/mca/iof/svc/iof_svc.h:147: error: field `svc_iov'
>> has incomplete type
>>
>> #ifdef MCS_VXWORKS
>> #include <net/uio.h>
>> #endif
>>
>> 28. ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:66: warning: array
>> type has incomplete element type
>> ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:66: error: field `msg_iov'
>> has incomplete type
>> ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h: In function
>> `mca_oob_tcp_msg_iov_alloc':
>> ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:196: error: invalid
>> application of `sizeof' to incomplete type `iovec'
>>
>>
>> 29. ../../../../../orte/mca/oob/tcp/oob_tcp.c:344: error: implicit
>> declaration of function `accept'
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>> `mca_oob_tcp_create_listen':
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:383: error: implicit
>> declaration of function `socket'
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:399: error: implicit
>> declaration of function `bind'
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:407: error: implicit
>> declaration of function `getsockname'
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:415: error: implicit
>> declaration of function `listen'
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>> `mca_oob_tcp_listen_thread':
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:459: error: implicit
>> declaration of function `bzero'
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>> `mca_oob_tcp_recv_probe':
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:696: error: implicit
>> declaration of function `send'
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function
>> `mca_oob_tcp_recv_handler':
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:795: error: implicit
>> declaration of function `recv'
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function `mca_oob_tcp_init':
>> ../../../../../orte/mca/oob/tcp/oob_tcp.c:1087: error: implicit
>> declaration of function `usleep'
>>
>> This gets rid of most (except bzero and usleep)
>>> #ifdef MCS_VXWORKS
>>> #include <sockLib.h>
>>> #endif
>>
>> I am trying to reconfigure the package so that CFLAGS does not include
>> -pedantic. This is because $WIND_HOME/vxworks-6.3/target/h/string.h has
>> prototypes for bzero, but only when #if _EXTENSION_WRS is true. So maybe
>> turning off -ansi/-pedantic enables this? In my dreams?
On Mar 17, 2010, at 9:54 PM, 张晶 wrote:

> Hello all,
>
> In order to add some real-time features to Open MPI for research, I need a version of Open MPI that runs on VxWorks. But after going through the Open MPI website, I couldn't find any indication that it supports VxWorks.
>
> I followed the thread posted by Ralph Castain, http://www.open-mpi.org/community/lists/users/2006/06/1371.php .
> I also read some papers about OpenRTE, such as “Creating a transparent, distributed, and resilient computing environment: the OpenRTE project” and “The Open Run-Time Environment (OpenRTE): A Transparent Multi-cluster Environment for High-Performance Computing”, written by Ralph H. Castain, Jeffrey M. Squyres, and others.
>
> Now I have a basic understanding of OpenRTE. However, there is too little documentation describing how OpenRTE is implemented. I don't know
> where or how to begin the migration. Any advice will be appreciated.
>
>
> Thanks
>
> Jing Zhang