
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] Migrate OpenMPI to the VxWorks
From: Jing Zhang (iam.chilli_at_[hidden])
Date: 2010-06-04 01:43:10


Hi Castain,

Your last mail to me was really helpful. I ran into most of the issues
listed and fixed them using either the off-list solutions or my own.
Because the Open MPI code has changed, there are also some other issues
(mostly missing functions) that were not reported. For example, the
POSIX function gettimeofday() is not implemented by the VxWorks library,
so I wrote a small library for such functions. So far I have
successfully compiled libopen-rte.a and libopen-pal.a, but now I am
stuck on fork() and exec(), which are not available in VxWorks. I
cannot implement fork and exec myself, so I have to read through the
code that uses fork() and substitute rtpSpawn() for it. It is
challenging work. I would really like to know how Brian Barrett dealt
with fork() and exec().
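A minimal sketch of the kind of gettimeofday() shim described above, assuming the target provides POSIX clock_gettime() with CLOCK_REALTIME. The names vx_timeval and vx_gettimeofday are hypothetical; the layout of the small library mentioned in the mail is not known.

```c
#define _POSIX_C_SOURCE 200112L  /* expose clock_gettime() on strict C99 hosts */
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for struct timeval, so the sketch does not
 * depend on <sys/time.h>, which VxWorks lacks. */
struct vx_timeval {
    time_t tv_sec;   /* seconds since the Epoch */
    long   tv_usec;  /* microseconds, 0..999999 */
};

/* Hypothetical gettimeofday() replacement built on clock_gettime().
 * The obsolete timezone argument is ignored.
 * Returns 0 on success, -1 on failure. */
int vx_gettimeofday(struct vx_timeval *tv, void *tz)
{
    struct timespec ts;

    (void)tz;
    if (tv == NULL || clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1;
    tv->tv_sec  = ts.tv_sec;
    tv->tv_usec = ts.tv_nsec / 1000;  /* nanoseconds -> microseconds */
    return 0;
}
```

Using a private struct keeps the shim independent of whichever header (if any) declares struct timeval on the target; callers that need the real type can copy the two fields across.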

Thanks

Jing

2010/3/18 Ralph Castain <rhc_at_[hidden]>:
> Hi Jing
> Someone else took a look at this off-list a few years ago. It was mostly a
> problem with the build system (some flags are different) and header file
> names. I don't believe the port was ever completed though.
> I have appended the results of that conversation - the last message
> contained a list of the issues. You would need to update that to the trunk
> of course as the code has changed considerably since that discussion took
> place. Brian Barrett subsequently created a first-cut at fixing some of
> these, but that appears to have been lost in the years since it was done -
> and wouldn't really be current anyway.
> I would be happy to assist as I can.
> Ralph
>
> 1. configure issues with "checking prefix for global symbol labels"
>
> 1a. The VxWorks assembler (CCAS=asppc) generates a.out by default (rather
> than the conftest.o that we need subsequently). This fragment determines
> how to assemble conftest.s:
>
>   if test "$CC" = "$CCAS" ; then
>     ompi_assemble="$CCAS $CCASFLAGS -c conftest.s >conftest.out 2>&1"
>   else
>     ompi_assemble="$CCAS $CCASFLAGS conftest.s >conftest.out 2>&1"
>   fi
>
> The subsequent link fails because conftest.o does not exist:
>
>   ompi_link="$CC $CFLAGS conftest_c.$OBJEXT conftest.$OBJEXT -o conftest > conftest.link 2>&1"
>
> To work around the problem, I did not set CCAS. That selects the first
> branch, which passes -c to CC=ccppc and so produces conftest.o.
>
>
> 1b. The linker fails because LDFLAGS are not passed
>
> The same link command line caused problems because only $CFLAGS, not
> $LDFLAGS, are passed to the linker:
>
>   ompi_link="$CC $CFLAGS conftest_c.$OBJEXT conftest.$OBJEXT -o conftest > conftest.link 2>&1"
>
> In my environment, I set CC/CFLAGS/LDFLAGS as follows:
>
>   CC=ccppc
>   CFLAGS='-ggdb3 -std=c99 -pedantic -mrtp -msoft-float -mstrict-align -mregnames -fno-builtin -fexceptions'
>   LDFLAGS='-mrtp -msoft-float -Wl,--start-group -Wl,--end-group -L/amd/raptor/root/opt/WindRiver/vxworks-6.3/target/usr/lib/ppc/PPC32/sfcommon'
>
> The linker flags are not passed because the ompi_link command does not
> include $LDFLAGS. Compiling by hand with only the CFLAGS reproduces the
> failure:
>
>   [xp-kcain1:build_vxworks] ccppc -ggdb3 -std=c99 -pedantic -mrtp -msoft-float -mstrict-align -mregnames -fno-builtin -fexceptions -o hello hello.c
>   /amd/raptor/root/opt/WindRiver/gnu/3.4.4-vxworks-6.3/x86-linux2/bin/../lib/gcc/powerpc-wrs-vxworks/3.4.4/../../../../powerpc-wrs-vxworks/bin/ld: cannot find -lc_internal
>   collect2: ld returned 1 exit status
>
>
> 2. OPAL atomics asm.c:
>
> int versus int32_t (refer to email with Brian Barrett)
>
> 3. OPAL event/event.c: sys/time.h and the timercmp() macros are not
> provided by VxWorks
>
> Refer to the workaround in event.c using #ifdef MCS_VXWORKS.
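For reference, a sketch of fallback definitions of the BSD timer macros that can sit behind such an #ifdef. It assumes struct timeval is declared somewhere on the target (the sketch includes <sys/time.h> only when MCS_VXWORKS is not defined) and defines each macro only if the system did not already provide it. This is not a copy of the event.c workaround, which is not shown in the thread.

```c
#ifndef MCS_VXWORKS
#include <sys/time.h>   /* struct timeval (and often these macros) on POSIX hosts */
#endif
/* On VxWorks, struct timeval is assumed to be visible via the target's own headers. */

#ifndef timerisset
#define timerisset(tvp) ((tvp)->tv_sec || (tvp)->tv_usec)
#endif

#ifndef timerclear
#define timerclear(tvp) ((tvp)->tv_sec = (tvp)->tv_usec = 0)
#endif

/* Compare two timevals lexicographically (seconds, then microseconds)
 * using the comparison operator cmp, e.g. timercmp(&a, &b, <). */
#ifndef timercmp
#define timercmp(tvp, uvp, cmp)                    \
    (((tvp)->tv_sec == (uvp)->tv_sec)              \
         ? ((tvp)->tv_usec cmp (uvp)->tv_usec)     \
         : ((tvp)->tv_sec cmp (uvp)->tv_sec))
#endif
```

Guarding each macro with #ifndef lets the same header compile unchanged on hosts whose <sys/time.h> already defines them.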
>
> 4. OPAL event/event.c: pipe() syscall not found
>
> workaround:
>
>   #ifdef HAVE_UNISTD_H
>   #include <unistd.h>
>   #ifdef MCS_VXWORKS
>   #include <ioLib.h> /* for pipe() */
>   #endif
>   #endif
>
> 5. OPAL event/signal.c
>
>   static sig_atomic_t opal_evsigcaught[NSIG];
>
> NSIG is not defined, but _NSIGS is. (On Linux, NSIG is defined with
> -D__USE_MISC.) So I added this code fragment to signal.c:
>
>   /* VxWorks signal.h defines _NSIGS, not NSIG */
>   #ifdef MCS_VXWORKS
>   #define NSIG (_NSIGS+1)
>   #endif
>
>
> 6. OPAL event/signal.c: no socketpair()
>
> workaround: use pipe():
>
>   #ifdef HAVE_UNISTD_H
>   #include <unistd.h>
>   #ifdef MCS_VXWORKS
>   #include <ioLib.h> /* for pipe() */
>   #endif
>   #endif
>
> and later in void opal_evsignal_init(sigset_t *evsigmask):
>
>   #ifdef MCS_VXWORKS
>   if (pipe(ev_signal_pair) == -1)
>       event_err(1, "%s: pipe", __func__);
>   #else
>   if (socketpair(AF_UNIX, SOCK_STREAM, 0, ev_signal_pair) == -1)
>       event_err(1, "%s: socketpair", __func__);
>   #endif
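One caveat worth checking with the pipe() substitution: a socketpair is bidirectional, but pipe() returns a one-way channel in which fds[0] is the read end and fds[1] the write end. If the signal handler writes to ev_signal_pair[0] (worth verifying in signal.c; older libevent code did this), the indices must be swapped on the pipe path, and any send()/recv() calls must become write()/read(), since the socket calls fail on a pipe descriptor. The sketch below uses hypothetical helpers (make_notify_pair, notify, drain) that fix one convention for both paths.

```c
#include <unistd.h>
#ifndef MCS_VXWORKS
#include <sys/socket.h>
#endif

/* Hypothetical helper: on success, fds[0] is always the read end and
 * fds[1] the write end, whichever primitive is used underneath. */
int make_notify_pair(int fds[2])
{
#ifdef MCS_VXWORKS
    return pipe(fds);  /* pipe() already uses fds[0]=read, fds[1]=write */
#else
    return socketpair(AF_UNIX, SOCK_STREAM, 0, fds);  /* bidirectional */
#endif
}

/* Called from the signal handler. write() is async-signal-safe and works
 * on both pipes and sockets, whereas send() fails on a pipe. */
void notify(int fds[2])
{
    char c = 'a';
    if (write(fds[1], &c, 1) < 0) {
        /* nothing safe to report from a signal handler; drop the wakeup */
    }
}

/* Called from the event loop when fds[0] is readable.
 * Returns the number of bytes consumed by a single read(). */
ssize_t drain(int fds[2])
{
    char buf[64];
    return read(fds[0], buf, sizeof buf);
}
```

Keeping the read and write ends fixed means the rest of signal.c does not need its own #ifdef MCS_VXWORKS branches.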
>
> 7. OPAL util/basename.c: #if HAVE_DIRNAME problem
>
>   ../../../opal/util/basename.c:23:5: warning: "HAVE_DIRNAME" is not defined
>   ../../../opal/util/basename.c: In function `opal_dirname':
>
> problem: HAVE_DIRNAME is not defined in opal_config.h, so #if HAVE_DIRNAME
> tests an undefined macro; the preprocessor treats it as 0 and emits the
> warning above.
>
> workaround: change #if HAVE_DIRNAME to #if defined(HAVE_DIRNAME)
>
>
> 8. OPAL util/basename.c: strncpy_s and _strdup
>
>   ../../../opal/util/basename.c: In function `opal_dirname':
>   ../../../opal/util/basename.c:153: error: implicit declaration of function `strncpy_s'
>   ../../../opal/util/basename.c:160: error: implicit declaration of function `_strdup'
>
>   #ifdef MCS_VXWORKS
>   strncpy( ret, filename, p - filename);
>   ret[p - filename] = '\0';  /* strncpy() does not terminate here; strncpy_s() did */
>   #else
>   strncpy_s( ret, (p - filename + 1), filename, p - filename );
>   #endif
>
>   #ifdef MCS_VXWORKS
>   return strdup(".");
>   #else
>   return _strdup(".");
>   #endif
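A point to watch in the strncpy() branch: strncpy_s() always NUL-terminates, but strncpy() does not when the source has n or more characters, which is always the case here because p points inside filename. A minimal sketch of a terminating copy; copy_prefix() is a hypothetical helper, not a function in the Open MPI tree.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of the first len bytes of
 * src, always NUL-terminated (what strncpy_s(dst, len + 1, src, len)
 * guarantees). Returns NULL if allocation fails. */
char *copy_prefix(const char *src, size_t len)
{
    char *ret = malloc(len + 1);
    if (ret == NULL)
        return NULL;
    strncpy(ret, src, len);  /* adds no NUL if src has len or more chars */
    ret[len] = '\0';         /* terminate explicitly */
    return ret;
}
```

For a dirname-style use, len is p - filename, where p is the last '/' found with strrchr().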
>
>
>
> 9. opal/util/if.c: socket() prototype not found in VxWorks headers
>
>   #ifdef HAVE_SYS_SOCKET_H
>   #include <sys/socket.h>
>   #ifdef MCS_VXWORKS
>   #include <sockLib.h>
>   #endif
>   #endif
>
> 10. opal/util/if.c: ioctl()
>
>   #ifdef HAVE_SYS_IOCTL_H
>   #include <sys/ioctl.h>
>   #ifdef MCS_VXWORKS
>   #include <ioLib.h>
>   #endif
>   #endif
>
> 11. opal/util/os_path.c: change MAXPATHLEN to PATH_MAX
>
>   #ifdef MCS_VXWORKS
>   if (total_length > PATH_MAX) {   /* path length is too long - reject it */
>   #else
>   if (total_length > MAXPATHLEN) { /* path length is too long - reject it */
>   #endif
>       return(NULL);
>   }
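An alternative that avoids wrapping every comparison in #ifdef (items 11 and 13) is to define MAXPATHLEN once from PATH_MAX in a shared header. A sketch, assuming <limits.h> on the target provides PATH_MAX; the 1024 fallback and the path_too_long() helper are illustrative assumptions, not Open MPI code.

```c
#include <limits.h>   /* PATH_MAX, where the platform defines it */
#include <stddef.h>

#ifndef MAXPATHLEN
#  ifdef PATH_MAX
#    define MAXPATHLEN PATH_MAX
#  else
#    define MAXPATHLEN 1024  /* assumed fallback; adjust for the target */
#  endif
#endif

/* Hypothetical helper mirroring the os_path.c check:
 * nonzero if a path of total_length bytes exceeds the limit. */
int path_too_long(size_t total_length)
{
    return total_length > (size_t)MAXPATHLEN;
}
```

With this in place, os_path.c and output.c can keep their existing MAXPATHLEN comparisons unchanged.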
>
>
> 12. opal/util/output.c: gethostname()
>
> #include <hostLib.h>
>
> 13. opal/util/output.c: MAXPATHLEN
>
> same fix as os_path.c above
>
> 14. opal/util/output.c: closelog/openlog/syslog
>
> manually turned off HAVE_SYSLOG_H in opal_config.h
>
> then got a patch from Jeff Squyres that avoids syslog
>
> 15. opal/util/opal_pty.c
>
> The compiler complains about a mismatched prototype of opal_openpty()
> between this source file and opal_pty.h.
>
> workaround: manually edit build_vxworks_ppc/opal/include/opal_config.h
> and use the following line (change 1 to 0):
>
>   #define OMPI_ENABLE_PTY_SUPPORT 0
>
> 16. opal/util/stacktrace.c
>
> FPE_FLTINV is not present in signal.h
>
> workaround: edit opal_config.h to turn off
> OMPI_WANT_PRETTY_PRINT_STACKTRACE (this can be explicitly configured out,
> but I don't want to reconfigure because I hacked #15 above)
>
> 17. opal/mca/base/mca_base_open.c
>
> gethostname() -- same as opal/util/output.c, must include hostLib.h
>
> 18. opal_progress.c
>
> opal/event/event.h (which I modified earlier) cannot find
> #include <sys/_timeradd.h>; the file is in opal/event/compat/sys.
>
> workaround: change event.h to contain the definitions from
> _timeradd.h instead of including it.
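For reference, the usual BSD definitions of timeradd() and timersub(), sketched with guards so they apply only where the system lacks them. This assumes struct timeval is declared (here via <sys/time.h> when MCS_VXWORKS is not defined) and is not a copy of the file in opal/event/compat/sys, whose exact contents are not shown in the thread.

```c
#ifndef MCS_VXWORKS
#include <sys/time.h>   /* struct timeval on POSIX hosts */
#endif

/* vvp = tvp + uvp, carrying microsecond overflow into seconds. */
#ifndef timeradd
#define timeradd(tvp, uvp, vvp)                            \
    do {                                                   \
        (vvp)->tv_sec = (tvp)->tv_sec + (uvp)->tv_sec;     \
        (vvp)->tv_usec = (tvp)->tv_usec + (uvp)->tv_usec;  \
        if ((vvp)->tv_usec >= 1000000) {                   \
            (vvp)->tv_sec++;                               \
            (vvp)->tv_usec -= 1000000;                     \
        }                                                  \
    } while (0)
#endif

/* vvp = tvp - uvp, borrowing a second when microseconds go negative. */
#ifndef timersub
#define timersub(tvp, uvp, vvp)                            \
    do {                                                   \
        (vvp)->tv_sec = (tvp)->tv_sec - (uvp)->tv_sec;     \
        (vvp)->tv_usec = (tvp)->tv_usec - (uvp)->tv_usec;  \
        if ((vvp)->tv_usec < 0) {                          \
            (vvp)->tv_sec--;                               \
            (vvp)->tv_usec += 1000000;                     \
        }                                                  \
    } while (0)
#endif
```

The do { ... } while (0) wrapper lets each macro be used as a single statement, for example inside an unbraced if.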
>
> 19. Link errors for opal_wrapper
>
>   strcasecmp
>   strncasecmp
>
> I rolled my own in mca_base_open.c (a temporary fix, since we may come
> across this problem elsewhere in the code).
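A sketch of what such stand-ins can look like, using only <ctype.h>. The names vx_strcasecmp and vx_strncasecmp are hypothetical; placing them in a shared utility file rather than mca_base_open.c would let other callers reuse them.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive compare of at most n bytes; returns <0, 0 or >0
 * like strncasecmp(). */
int vx_strncasecmp(const char *s1, const char *s2, size_t n)
{
    for (; n > 0; s1++, s2++, n--) {
        /* Cast to unsigned char: tolower() is undefined for negative values. */
        int c1 = tolower((unsigned char)*s1);
        int c2 = tolower((unsigned char)*s2);
        if (c1 != c2)
            return c1 - c2;
        if (c1 == '\0')
            return 0;  /* both strings ended together */
    }
    return 0;
}

/* Case-insensitive compare of whole strings, like strcasecmp(). */
int vx_strcasecmp(const char *s1, const char *s2)
{
    for (;; s1++, s2++) {
        int c1 = tolower((unsigned char)*s1);
        int c2 = tolower((unsigned char)*s2);
        if (c1 != c2 || c1 == '\0')
            return c1 - c2;
    }
}
```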
>
> 20. dss_internal.h uses a type 'uint'
>
> Not sure whether it depends on something in the headers or on something
> it defines on its own. I defined it the same way as a header I found
> somewhere under /usr/include on Linux:
>
>   #ifdef MCS_VXWORKS
>   typedef unsigned int uint;
>   #endif
>
> 21. struct iovec definition needed
>
>   orte/mca/iof/base/iof_base_fragment.h:45: warning: array type has incomplete element type
>
>   #ifdef MCS_VXWORKS
>   #include <net/uio.h>
>   #endif
>
> Not sure if this is right, or if I should include something like
> <netBufLib.h> or <ioLib.h>.
>
>
> 22. iof_base_setup.c
>
> struct termios is not understood. I can only find the termios.h header
> in the 'diab' area, and I'm not using that compiler.
>
> The variable usepty is already set to 0 when OMPI_ENABLE_PTY_SUPPORT is 0,
> so why are we compiling this fragment of code at all? I hacked the file
> so that the struct termios code will not get compiled.
>
> 23. oob_base_send/recv.c, oob_base_send/recv_nb.c: struct iovec not known
>
>   #ifdef MCS_VXWORKS
>   #include <net/uio.h>
>   #endif
>
> 24. orte/mca/rmgr/base/rmgr_base_check_context.c:58: error:
> `MAXHOSTNAMELEN' undeclared (first use in this function)
>
>   #ifdef MCS_VXWORKS
>   #define MAXHOSTNAMELEN 64
>   #endif
>
> 25. orte/mca/rmgr/base/rmgr_base_check_context.c:58: gethostname()
>
>   #ifdef MCS_VXWORKS
>   #include <hostLib.h>
>   #endif
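Items 12, 17, 24 and 25 all come down to gethostname(). One way to keep the fix in one place is a small wrapper, sketched here under the assumptions that VxWorks declares gethostname() in <hostLib.h> (as the notes above indicate) and that 64 bytes is enough, matching the MAXHOSTNAMELEN value chosen in item 24. The name get_host_name() is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* declares gethostname() on strict C99 POSIX hosts */
#include <stddef.h>
#ifdef MCS_VXWORKS
#include <hostLib.h>   /* VxWorks declares gethostname() here */
#else
#include <unistd.h>    /* POSIX location */
#endif

#ifndef MAXHOSTNAMELEN
#define MAXHOSTNAMELEN 64
#endif

/* Hypothetical wrapper: fill buf (len bytes) with the host name and
 * always NUL-terminate it, since POSIX leaves termination unspecified
 * when the name is truncated. Returns 0 on success, -1 on failure. */
int get_host_name(char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;
    if (gethostname(buf, len) != 0)
        return -1;
    buf[len - 1] = '\0';
    return 0;
}
```

Callers would then allocate MAXHOSTNAMELEN + 1 bytes and include only this header, instead of repeating the hostLib.h #ifdef in each file.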
>
> 26. orte/mca/iof/proxy/iof_proxy.h:135: warning: array type has
> incomplete element type
>
>   ../../../../../orte/mca/iof/proxy/iof_proxy.h:135: error: field `proxy_iov' has incomplete type
>
>   #ifdef MCS_VXWORKS
>   #include <net/uio.h>
>   #endif
>
> 27. /orte/mca/iof/svc/iof_svc.h:147: warning: array type has incomplete
> element type
>
>   ../../../../../orte/mca/iof/svc/iof_svc.h:147: error: field `svc_iov' has incomplete type
>
>   #ifdef MCS_VXWORKS
>   #include <net/uio.h>
>   #endif
>
> 28. ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:66: warning: array
> type has incomplete element type
>
>   ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:66: error: field `msg_iov' has incomplete type
>   ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h: In function `mca_oob_tcp_msg_iov_alloc':
>   ../../../../../orte/mca/oob/tcp/oob_tcp_msg.h:196: error: invalid application of `sizeof' to incomplete type `iovec'
>
>
> 29. ../../../../../orte/mca/oob/tcp/oob_tcp.c:344: error: implicit
> declaration of function `accept'
>
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function `mca_oob_tcp_create_listen':
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c:383: error: implicit declaration of function `socket'
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c:399: error: implicit declaration of function `bind'
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c:407: error: implicit declaration of function `getsockname'
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c:415: error: implicit declaration of function `listen'
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function `mca_oob_tcp_listen_thread':
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c:459: error: implicit declaration of function `bzero'
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function `mca_oob_tcp_recv_probe':
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c:696: error: implicit declaration of function `send'
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function `mca_oob_tcp_recv_handler':
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c:795: error: implicit declaration of function `recv'
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c: In function `mca_oob_tcp_init':
>   ../../../../../orte/mca/oob/tcp/oob_tcp.c:1087: error: implicit declaration of function `usleep'
>
> This gets rid of most of them (all except bzero and usleep):
>
>   #ifdef MCS_VXWORKS
>   #include <sockLib.h>
>   #endif
>
> I am trying to reconfigure the package so that CFLAGS will not include
> -pedantic, because $WIND_HOME/vxworks-6.3/target/h/string.h has
> prototypes for bzero, but only when _EXTENSION_WRS is true. So will
> turning off -ansi/-pedantic get me this? In my dreams?
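Rather than dropping -pedantic, the two leftovers can also be replaced with standard calls: bzero(p, n) is equivalent to memset(p, 0, n), and a usleep() substitute can be built on POSIX nanosleep(). A sketch, assuming the target provides nanosleep(); the names vx_bzero and vx_usleep are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* declares nanosleep() on strict C99 hosts */
#include <errno.h>
#include <string.h>
#include <time.h>

/* bzero() replacement: plain ISO C, no _EXTENSION_WRS needed. */
#define vx_bzero(p, n) memset((p), 0, (n))

/* Hypothetical usleep() replacement: sleeps at least usec microseconds,
 * resuming after signal interruptions. Returns 0, or -1 on error. */
int vx_usleep(unsigned long usec)
{
    struct timespec req, rem;

    req.tv_sec = (time_t)(usec / 1000000UL);
    req.tv_nsec = (long)(usec % 1000000UL) * 1000L;
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;  /* interrupted: sleep for the remaining time */
    }
    return 0;
}
```

Because both replacements rely only on ISO C and POSIX declarations, they keep compiling under -std=c99 -pedantic.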
>
> On Mar 17, 2010, at 9:54 PM, Jing Zhang wrote:
>
> Hello all,
>
> In order to add some real-time features to Open MPI for some research, I
> need an Open MPI version that runs on VxWorks. But after going through
> the Open MPI website, I couldn't find any indication that it supports
> VxWorks.
>
> Following the thread posted by Ralph Castain,
> http://www.open-mpi.org/community/lists/users/2006/06/1371.php ,
> I read some papers about OpenRTE, such as "Creating a transparent,
> distributed, and resilient computing environment: the OpenRTE project"
> and "The Open Run-Time Environment (OpenRTE): A Transparent Multi-cluster
> Environment for High-Performance Computing", written by Ralph H.
> Castain, Jeffrey M. Squyres and others.
>
> Now I have a basic understanding of OpenRTE; however, there is too
> little documentation describing how OpenRTE is implemented. I don't know
> where and how to begin the migration. Any advice will be appreciated.
>
> Thanks
>
> Jing Zhang
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>

-- 
Jing Zhang