Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-01-02 10:50:46


Yikes - that's not a good error. :-(

We don't regularly build / test on AIX, so I don't have much
immediate guidance for you. My best suggestion at this point would
be to try the latest 1.2 beta or a nightly snapshot. We recently
updated the event engine (the portion of the code that your error
is coming from), which *may* alleviate the problem...? (I have no
idea, actually -- I'm just kinda hoping that the new version of the
event engine will fix your problem :-\ )
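
For example, a snapshot build might look like the following (the
tarball name is illustrative -- pick the current one from the
nightly area at http://www.open-mpi.org/nightly/ -- and reuse the
same environment and configure arguments as your 1.1.2 build, just
with a separate --prefix so the two installs don't collide):

   # hypothetical snapshot name; check the nightly page for the real one
   wget http://www.open-mpi.org/nightly/v1.2/openmpi-1.2b2r12941.tar.gz
   gunzip -c openmpi-1.2b2r12941.tar.gz | tar -xf -
   cd openmpi-1.2b2r12941
   ./configure --prefix=/ist/openmpi-1.2-test [same options as before]
   make all install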

On Dec 27, 2006, at 10:29 AM, Michael Marti wrote:

> Dear All
>
> I am trying to get openmpi-1.1.2 to work on AIX 5.3 / power5.
>
> :: Compilation seems to have worked with the following sequence:
> ====================================================================
> setenv OBJECT_MODE 64
>
> setenv CC xlc
> setenv CXX xlC
> setenv F77 xlf
> setenv FC xlf90
>
> setenv CFLAGS "-qthreaded -O3 -qmaxmem=-1 -qarch=pwr5x -qtune=pwr5 -q64"
> setenv CXXFLAGS "-qthreaded -O3 -qmaxmem=-1 -qarch=pwr5x -qtune=pwr5 -q64"
> setenv FFLAGS "-qthreaded -O3 -qmaxmem=-1 -qarch=pwr5x -qtune=pwr5 -q64"
> setenv FCFLAGS "-qthreaded -O3 -qmaxmem=-1 -qarch=pwr5x -qtune=pwr5 -q64"
> setenv LDFLAGS "-Wl,-brtl"
>
> ./configure --prefix=/ist/openmpi-1.1.2 \
> --disable-mpi-cxx \
> --disable-mpi-cxx-seek \
> --enable-mpi-threads \
> --enable-progress-threads \
> --enable-static \
> --disable-shared \
> --disable-io-romio
> ====================================================================
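
A quick sanity check after installing (exact wording varies between
Open MPI versions) is to ask ompi_info whether the thread options
actually took effect:

   /ist/openmpi-1.1.2/bin/ompi_info | grep -i thread
   # expect a line roughly like:
   #   Thread support: posix (mpi: yes, progress: yes)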
>
> :: After the compilation I ran make check and all 11 tests passed
> successfully.
>
> :: Now I'm trying to run the following command just for test:
> # mpirun -hostfile /gpfs/MICHAEL/MPI_hostfiles/mpinodes_b41-b44_1.asc -np 2 /usr/bin/hostname
> - The file /gpfs/MICHAEL/MPI_hostfiles/mpinodes_b41-b44_1.asc
> contains 4 hosts:
> r1blade041 slots=1
> r1blade042 slots=1
> r1blade043 slots=1
> r1blade044 slots=1
> - The mpirun command eventually hangs with the following messages:
> [r1blade041:418014] poll failed with errno=25
> [r1blade041:418014] opal_event_loop: ompi_evesel->dispatch() failed.
> - In this state mpirun cannot be killed by hitting <ctrl-c>; only a
> kill -9 will do the trick.
> - While mpirun still hangs I can see that "orted" has been launched
> on both requested hosts.
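
For reference: errno 25 on AIX (as on most Unix systems) should map
to ENOTTY, "inappropriate ioctl for device". The mapping can be
confirmed on the target machine with something like:

   grep -w 25 /usr/include/sys/errno.h
   # expect something like:  #define ENOTTY  25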
>
> :: I turned on all debug options in openmpi-mca-params.conf. The
> output for the same call of mpirun is in the file mpirun-debug.txt.gz.
> <mpirun-debug.txt.gz>
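
For context, per-user MCA settings live in
$HOME/.openmpi/mca-params.conf and system-wide ones in
<prefix>/etc/openmpi-mca-params.conf; a minimal debug setup might
look like this (the parameter shown is illustrative):

   # enable developer-level debugging output from the run-time
   orte_debug = 1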
>
> :: As suggested in the mailing list rules I include config.log
> (config.log.gz) and the output of ompi_info (ompi_info.txt.gz).
> <config.log.gz>
>
> <ompi_info.txt.gz>
>
>
> :: As I am completely new to Open MPI (I have some experience with
> LAM) I am lost at this stage. I would really appreciate it if
> someone could give me some hints as to what is going wrong and
> where I could get more info.
>
> Best regards,
>
> Michael Marti.
>
>
> --
> ----------------------------------------------------------------------------
> Michael Marti
> Centro de Fisica dos Plasmas
> Instituto Superior Tecnico
> Av. Rovisco Pais
> 1049-001 Lisboa
> Portugal
>
> Tel: +351 218 419 379
> Fax: +351 218 464 455
> Mobile: +351 968 434 327
> ----------------------------------------------------------------------------
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems