
Open MPI User's Mailing List Archives


From: Ken Mighell (mighell_at_[hidden])
Date: 2005-10-21 17:08:44


Dear George,

The patch got malformed when posted. But I did figure out what was
meant.

It turns out that 3 files had to be fixed:

opal/runtime/opal_init.c
orte/runtime/orte_init_stage1.c
orte/runtime/orte_init_stage2.c

in the same way:

[mighell_at_asterix openmpi-1.0rc4]$ diff -u opal/runtime/opal_init.c_original opal/runtime/opal_init.c
--- opal/runtime/opal_init.c_original Fri Oct 21 13:25:52 2005
+++ opal/runtime/opal_init.c Fri Oct 21 13:48:51 2005
@@ -123,7 +123,7 @@
  error:
      if (ret != OPAL_SUCCESS) {
          opal_show_help("help-opal-runtime",
-                        "opal_init:startup:internal-failure",
+                        "opal_init:startup:internal-failure", true,
                         error, ret);
      }

[mighell_at_asterix openmpi-1.0rc4]$ diff -u orte/runtime/orte_init_stage1.c_original orte/runtime/orte_init_stage1.c
--- orte/runtime/orte_init_stage1.c_original Fri Oct 21 13:51:41 2005
+++ orte/runtime/orte_init_stage1.c Fri Oct 21 13:52:08 2005
@@ -536,7 +536,7 @@
  error:
      if (ret != ORTE_SUCCESS) {
          opal_show_help("help-orte-runtime",
-                        "orte_init:startup:internal-failure",
+                        "orte_init:startup:internal-failure", true,
                         error, ret);
      }

[mighell_at_asterix openmpi-1.0rc4]$ diff -u orte/runtime/orte_init_stage2.c_original orte/runtime/orte_init_stage2.c
--- orte/runtime/orte_init_stage2.c_original Fri Oct 21 13:53:15 2005
+++ orte/runtime/orte_init_stage2.c Fri Oct 21 13:53:32 2005
@@ -81,7 +81,7 @@
  error:
      if (ret != ORTE_SUCCESS) {
          opal_show_help("help-orte-runtime",
-                        "orte_init:startup:internal-failure",
+                        "orte_init:startup:internal-failure", true,
                         error, ret);
      }

The system seems to build.

However, my qlwfpc2 job now runs very slowly. Jobs end
with messages like

mpirun noticed that job rank 0 with PID 10837 on node "localhost"
exited on signal 25.
3 processes killed (possibly by Open MPI)

-Ken

> Ken,
>
> Please apply the following patch (from your
> /home/mighell/pkg/ompi/openmpi-1.0rc4/ base directory).
>
> Index: opal/runtime/opal_init.c
> ===================================================================
> --- opal/runtime/opal_init.c (revision 7831)
> +++ opal/runtime/opal_init.c (working copy)
> @@ -123,7 +123,7 @@
> error:
> if (ret != OPAL_SUCCESS) {
> opal_show_help("help-opal-runtime",
> - "opal_init:startup:internal-failure",
> + "opal_init:startup:internal-failure", true,
> error, ret);
> }
>
> It should solve this issue. I don't know which compiler you use, but
> mine never catches this, as it thinks that an int is a bool, so
> it manages to match the show_help prototype.
>
> Thanks,
> george.
>
> On Oct 21, 2005, at 3:37 PM, Ken Mighell wrote:
>
> > Dear OpenMPI,
> >
> > I tried to build 1.0rc4 on a 3 year old 5-node Beowulf cluster
> > running RedHat Linux 7.3. The build failed during
> > make all; the last few lines of the log file are:
> >
> > mkdir .libs
> > gcc -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include
> > -I../../src/event -I../../include -I../.. -I../.. -I../../include
> > -I../../opal -I../../orte -I../../ompi -O3 -DNDEBUG
> > -fno-strict-aliasing -pthread -MT opal_progress.lo -MD -MP -MF
> > .deps/opal_progress.Tpo -c opal_progress.c -fPIC -DPIC -o
> > .libs/opal_progress.o
> > depbase=`echo opal_finalize.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`; \
> > if /bin/sh ../../libtool --tag=CC --mode=compile gcc
> > -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include
> > -I../../src/event -I../../include -I../.. -I../.. -I../../include
> > -I../../opal -I../../orte -I../../ompi -O3 -DNDEBUG
> > -fno-strict-aliasing -pthread -MT opal_finalize.lo -MD -MP -MF
> > "$depbase.Tpo" -c -o opal_finalize.lo opal_finalize.c; \
> > then mv -f "$depbase.Tpo" "$depbase.Plo"; else rm -f
> > "$depbase.Tpo"; exit 1; fi
> > gcc -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include
> > -I../../src/event -I../../include -I../.. -I../.. -I../../include
> > -I../../opal -I../../orte -I../../ompi -O3 -DNDEBUG
> > -fno-strict-aliasing -pthread -MT opal_finalize.lo -MD -MP -MF
> > .deps/opal_finalize.Tpo -c opal_finalize.c -fPIC -DPIC -o
> > .libs/opal_finalize.o
> > depbase=`echo opal_init.lo | sed 's|[^/]*$|.deps/&|;s|\.lo$||'`; \
> > if /bin/sh ../../libtool --tag=CC --mode=compile gcc
> > -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include
> > -I../../src/event -I../../include -I../.. -I../.. -I../../include
> > -I../../opal -I../../orte -I../../ompi -O3 -DNDEBUG
> > -fno-strict-aliasing -pthread -MT opal_init.lo -MD -MP -MF
> > "$depbase.Tpo" -c -o opal_init.lo opal_init.c; \
> > then mv -f "$depbase.Tpo" "$depbase.Plo"; else rm -f
> > "$depbase.Tpo"; exit 1; fi
> > gcc -DHAVE_CONFIG_H -I. -I. -I../../include -I../../include
> > -I../../src/event -I../../include -I../.. -I../.. -I../../include
> > -I../../opal -I../../orte -I../../ompi -O3 -DNDEBUG
> > -fno-strict-aliasing -pthread -MT opal_init.lo -MD -MP -MF
> > .deps/opal_init.Tpo -c opal_init.c -fPIC -DPIC -o
> > .libs/opal_init.o
> > opal_init.c: In function `opal_init':
> > opal_init.c:127: incompatible type for argument 3 of `opal_show_help'
> > make[2]: *** [opal_init.lo] Error 1
> > make[1]: *** [all-recursive] Error 1
> > make: *** [all-recursive] Error 1
> > make[2]: Leaving directory `/home/mighell/pkg/ompi/openmpi-1.0rc4/opal/runtime'
> > make[1]: Leaving directory `/home/mighell/pkg/ompi/openmpi-1.0rc4/opal'
> >
> > I have included gzipped versions of config.log and the result of
> > make all:
> >
> > <config.log.gz>
> > <make_all.log.gz>
> >
> > I was able to build this same package on my Apple dual G5 tower
> > today without any problems.
> >
> > Keep up the good work!
> >
> > Best regards,
> >
> > -Ken Mighell
> >
> >
> > -------------------------------------------------------------------------------
> > Kenneth Mighell, Associate Scientist     E-mail: mighell_at_[hidden]
> > Kitt Peak National Observatory           Phone: 520-318-8391
> > National Optical Astronomy Observatory   Fax: 520-318-8360
> > P.O. Box 26732, Tucson, AZ 85726-6732    URL: http://www.noao.edu/staff/mighell
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> "Half of what I say is meaningless; but I say it so that the other
> half may reach you"
> Kahlil Gibran