Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Spawn problem
From: Joao Vicente Lima (joao.lima.mail_at_[hidden])
Date: 2008-03-31 11:15:59


Hi again,
When I call MPI_Init_thread in the same program, the error is:

spawning ...
opal_mutex_lock(): Resource deadlock avoided
[localhost:07566] *** Process received signal ***
[localhost:07566] Signal: Aborted (6)
[localhost:07566] Signal code: (-6)
[localhost:07566] [ 0] /lib/libpthread.so.0 [0x2abe5630ded0]
[localhost:07566] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2abe5654c3c5]
[localhost:07566] [ 2] /lib/libc.so.6(abort+0x10e) [0x2abe5654d73e]
[localhost:07566] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe5528063b]
[localhost:07566] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280559]
[localhost:07566] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe552805e8]
[localhost:07566] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280fff]
[localhost:07566] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55280f3d]
[localhost:07566] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2abe55281f59]
[localhost:07566] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x204) [0x2abe552823cd]
[localhost:07566] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2abe58efb5f7]
[localhost:07566] [11] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(MPI_Comm_spawn+0x465) [0x2abe552b55cd]
[localhost:07566] [12] ./spawn1(main+0x9d) [0x400b05]
[localhost:07566] [13] /lib/libc.so.6(__libc_start_main+0xf4) [0x2abe56539b74]
[localhost:07566] [14] ./spawn1 [0x4009d9]
[localhost:07566] *** End of error message ***
opal_mutex_lock(): Resource deadlock avoided
[localhost:07567] *** Process received signal ***
[localhost:07567] Signal: Aborted (6)
[localhost:07567] Signal code: (-6)
[localhost:07567] [ 0] /lib/libpthread.so.0 [0x2b48610f9ed0]
[localhost:07567] [ 1] /lib/libc.so.6(gsignal+0x35) [0x2b48613383c5]
[localhost:07567] [ 2] /lib/libc.so.6(abort+0x10e) [0x2b486133973e]
[localhost:07567] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c63b]
[localhost:07567] [ 4] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c559]
[localhost:07567] [ 5] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006c5e8]
[localhost:07567] [ 6] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006cfff]
[localhost:07567] [ 7] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006cf3d]
[localhost:07567] [ 8] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b486006df59]
[localhost:07567] [ 9] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_proc_unpack+0x204) [0x2b486006e3cd]
[localhost:07567] [10] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b4863ce75f7]
[localhost:07567] [11] /usr/local/mpi/ompi-svn/lib/openmpi/mca_dpm_orte.so [0x2b4863ce9c2b]
[localhost:07567] [12] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2b48600720d7]
[localhost:07567] [13] /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Init_thread+0x166) [0x2b48600ae4f2]
[localhost:07567] [14] ./spawn1(main+0x2c) [0x400a94]
[localhost:07567] [15] /lib/libc.so.6(__libc_start_main+0xf4) [0x2b4861325b74]
[localhost:07567] [16] ./spawn1 [0x4009d9]
[localhost:07567] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 7566 on node localhost exited on signal 6 (Aborted).
--------------------------------------------------------------------------
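
For reference, a minimal sketch of the pattern that triggers this
(reconstructed from the output above; the requested thread level and the
loop bound are assumptions, the real code is in the attachment of my
earlier mail):

  #include <mpi.h>
  #include <stdio.h>

  int main (int argc, char **argv)
  {
    MPI_Comm parent, intercomm[2];
    int i, provided;

    /* same program as before, but initializing with MPI_Init_thread */
    MPI_Init_thread (&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_get_parent (&parent);

    if (parent == MPI_COMM_NULL) {      /* parent: spawn the children */
      printf ("spawning ...\n");
      for (i = 0; i < 2; i++)
        MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                        MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
    } else {                            /* child */
      printf ("child!\n");
    }

    MPI_Finalize ();
    return 0;
  }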

Thanks for checking,
Joao.

On Mon, Mar 31, 2008 at 11:49 AM, Joao Vicente Lima
<joao.lima.mail_at_[hidden]> wrote:
> Indeed, MPI_Finalize is crashing, and calling MPI_Comm_{free,disconnect} works!
> I don't know whether the free/disconnect must appear before MPI_Finalize
> in this case (spawned processes) ... any suggestions?
>
> I use loops around spawn:
> - first, for testing :)
> - and second, because certain MPI applications don't know in advance
> the number of children needed to complete their work.
>
> Spawn works great ... I will run other tests.
>
> thanks,
> Joao
>
>
>
> On Mon, Mar 31, 2008 at 3:03 AM, Matt Hughes
> <matt.c.hughes+ompi_at_[hidden]> wrote:
> > On 30/03/2008, Joao Vicente Lima <joao.lima.mail_at_[hidden]> wrote:
> > > Hi,
> > > sorry to bring this up again ... but I hope to use spawn in ompi someday :-D
> >
> > I believe it's crashing in MPI_Finalize because you have not closed
> > all communication paths between the parent and the child processes.
> > For the parent process, try calling MPI_Comm_free or
> > MPI_Comm_disconnect on each intercomm in your intercomm array before
> > calling finalize. On the child, call free or disconnect on the parent
> > intercomm before calling finalize.
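> >
> > In sketch form it would look something like this (the two spawns and
> > the variable names are assumptions based on your loop):
> >
> >   #include <mpi.h>
> >
> >   int main (int argc, char **argv)
> >   {
> >     MPI_Comm parent, intercomm[2];
> >     int i;
> >
> >     MPI_Init (&argc, &argv);
> >     MPI_Comm_get_parent (&parent);
> >
> >     if (parent == MPI_COMM_NULL) {   /* parent */
> >       for (i = 0; i < 2; i++)
> >         MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
> >                         MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
> >       /* close every spawn intercomm before finalize */
> >       for (i = 0; i < 2; i++)
> >         MPI_Comm_disconnect (&intercomm[i]);   /* or MPI_Comm_free */
> >     } else {                         /* child */
> >       /* close the path back to the parent before finalize */
> >       MPI_Comm_disconnect (&parent);
> >     }
> >
> >     MPI_Finalize ();
> >     return 0;
> >   }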
> >
> > Out of curiosity, why a loop of spawns? Why not increase the value of
> > the maxprocs argument? Or, if you need to spawn different executables
> > or use different arguments for each instance, why not
> > MPI_Comm_spawn_multiple?
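> >
> > For example, a single call that starts both children at once (a
> > sketch; the duplicated ./spawn1 entries are placeholders for
> > whatever executables and arguments you actually need):
> >
> >   char *cmds[2] = { "./spawn1", "./spawn1" };
> >   int maxprocs[2] = { 1, 1 };
> >   MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
> >   MPI_Comm intercomm;
> >
> >   MPI_Comm_spawn_multiple (2, cmds, MPI_ARGVS_NULL, maxprocs, infos,
> >                            0, MPI_COMM_SELF, &intercomm,
> >                            MPI_ERRCODES_IGNORE);
> >
> > That also gives you one intercomm covering all the children, instead
> > of an array of intercomms to clean up one by one.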
> >
> > mch
> >
> > >
> > > The execution of spawn in this way works fine:
> > >
> > >   MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
> > >                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
> > >
> > > but if this code goes inside a for loop, I get a problem:
> > >
> > >   for (i = 0; i < 2; i++)
> > >     {
> > >       MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
> > >                       MPI_COMM_SELF, &intercomm[i], MPI_ERRCODES_IGNORE);
> > >     }
> > >
> > > and the error is:
> > > spawning ...
> > > child!
> > > child!
> > > [localhost:03892] *** Process received signal ***
> > > [localhost:03892] Signal: Segmentation fault (11)
> > > [localhost:03892] Signal code: Address not mapped (1)
> > > [localhost:03892] Failing at address: 0xc8
> > > [localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
> > > [localhost:03892] [ 1]
> > > /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3)
> > > [0x2ac71ba7448c]
> > > [localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2ac71b9decdf]
> > > [localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 [0x2ac71ba04765]
> > > [localhost:03892] [ 4]
> > > /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71)
> > > [0x2ac71ba365c9]
> > > [localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
> > > [localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) [0x2ac71ccb7b74]
> > > [localhost:03892] [ 7] ./spawn1 [0x400989]
> > > [localhost:03892] *** End of error message ***
> > > --------------------------------------------------------------------------
> > > mpirun noticed that process rank 0 with PID 3892 on node localhost
> > > exited on signal 11 (Segmentation fault).
> > > --------------------------------------------------------------------------
> > >
> > > the attachments contain the ompi_info, config.log and program.
> > >
> > > thanks for checking,
> > >
> > > Joao.
> > >
> >
> >