Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Simple MPI_Comm_spawn program hangs
From: Prakash Velayutham (prakash.velayutham_at_[hidden])
Date: 2007-12-06 00:18:22


To add more info, here is a backtrace of the spawned (hung) program. It is blocked inside MPI_Init, waiting on an out-of-band TCP receive from the parent while negotiating the new communicator ID.

(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0x402cdaec in sched_yield () from /lib/tls/libc.so.6
#2  0x4016360c in opal_progress () at runtime/opal_progress.c:301
#3  0x403a9b29 in mca_oob_tcp_msg_wait (msg=0x805cc70, rc=0xbfffba40) at oob_tcp_msg.c:108
#4  0x403b09a5 in mca_oob_tcp_recv (peer=0xbfffbba8, iov=0xbfffba88, count=1, tag=0, flags=4) at oob_tcp_recv.c:138
#5  0x40119420 in mca_oob_recv_packed (peer=0xbfffbba8, buf=0x821b200, tag=0) at base/oob_base_recv.c:69
#6  0x4003c28b in ompi_comm_allreduce_intra_oob (inbuf=0xbfffbb48, outbuf=0xbfffbb44, count=1, op=0x400d14a0, comm=0x8049d38, bridgecomm=0x0, lleader=0xbfffbc04, rleader=0xbfffbba8, send_first=1) at communicator/comm_cid.c:674
#7  0x4003adf2 in ompi_comm_nextcid (newcomm=0x807c4f8, comm=0x8049d38, bridgecomm=0x0, local_leader=0xbfffbc04, remote_leader=0xbfffbba8, mode=256, send_first=1) at communicator/comm_cid.c:176
#8  0x4003cc2c in ompi_comm_connect_accept (comm=0x8049d38, root=0, port=0x807a5c0, send_first=1, newcomm=0xbfffbc28, tag=2000) at communicator/comm_dyn.c:208
#9  0x4003ec97 in ompi_comm_dyn_init () at communicator/comm_dyn.c:668
#10 0x4005465a in ompi_mpi_init (argc=1, argv=0xbfffbf64, requested=0, provided=0xbfffbd14) at runtime/ompi_mpi_init.c:704
#11 0x40090367 in PMPI_Init (argc=0xbfffbee0, argv=0xbfffbee4) at pinit.c:71
#12 0x08048983 in main (argc=1, argv=0xbfffbf64) at slave.c:43
(gdb)

Prakash

On Dec 6, 2007, at 12:08 AM, Prakash Velayutham wrote:

> Hi Edgar,
>
> I changed the spawned program from /bin/hostname to a very simple MPI
> program, shown below. But now the slave hangs right at the MPI_Init
> call. What could the issue be?
>
> slave.c
>
> #include <stdio.h>
> #include <string.h>
> #include <stdlib.h>
> #include <unistd.h>     /* sleep(), gethostname() */
> #include "mpi.h"
> #include <sys/types.h>  /* standard system types */
> #include <netinet/in.h> /* Internet address structures */
> #include <sys/socket.h> /* socket interface functions */
> #include <netdb.h>      /* host to IP resolution */
>
> int gdb_var;
>
> int
> main(int argc, char **argv)
> {
>     int my_rank;
>     int num_proc;
>     char hostname[64];
>     MPI_Comm inter_comm;
>
>     /* Spin until a debugger attaches and does "set var gdb_var=1". */
>     gdb_var = 0;
>     while (0 == gdb_var) sleep(5);
>
>     gethostname(hostname, 64);
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
>
>     MPI_Comm_get_parent(&inter_comm);
>
>     MPI_Finalize();
>     return 0;
> }
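>
> As a sketch (assuming the master spawns this binary), the two sides
> could verify the new intercommunicator with one message across it:
>
> /* master side, after MPI_Comm_spawn(..., &inter_comm, ...) returns */
> int token = 42;
> MPI_Send(&token, 1, MPI_INT, 0, 0, inter_comm);
>
> /* slave side, after MPI_Comm_get_parent(&inter_comm) */
> int token;
> MPI_Recv(&token, 1, MPI_INT, 0, 0, inter_comm, MPI_STATUS_IGNORE);
>
> In an intercommunicator, ranks name the remote group, so the master
> addresses the child's rank 0 and the child addresses the parent's.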
>
> Thanks,
> Prakash
>
>
> On Dec 2, 2007, at 8:36 PM, Edgar Gabriel wrote:
>
>> MPI_Comm_spawn is tested nightly by our test suites, so it should
>> definitely work...
>>
>> Thanks
>> Edgar
>>
>> Prakash Velayutham wrote:
>>> Thanks Edgar. I did not know that. Really?
>>>
>>> Anyway, are you sure an MPI job will work as a spawned process
>>> instead of "hostname"?
>>>
>>> Thanks,
>>> Prakash
>>>
>>>
>>> On Dec 1, 2007, at 5:56 PM, Edgar Gabriel wrote:
>>>
>>>> MPI_Comm_spawn has to build an intercommunicator with the child
>>>> process that it spawns. Thus, you cannot spawn a non-MPI job such
>>>> as /bin/hostname, since the parent process waits for some messages
>>>> from the child process(es) in order to set up the
>>>> intercommunicator.
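>>>>
>>>> As a minimal sketch, the spawn call with an MPI-aware child (say a
>>>> hypothetical ./slave binary that itself calls MPI_Init and
>>>> MPI_Comm_get_parent) would look like:
>>>>
>>>> /* the child must be an MPI program so the intercommunicator
>>>>    handshake during its MPI_Init can complete */
>>>> MPI_Comm inter_comm;
>>>> MPI_Comm_spawn("./slave", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
>>>>                0, MPI_COMM_WORLD, &inter_comm,
>>>>                MPI_ERRCODES_IGNORE);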
>>>>
>>>> Thanks
>>>> Edgar
>>>>
>>>> Prakash Velayutham wrote:
>>>>> Hello,
>>>>>
>>>>> Open MPI 1.2.4
>>>>>
>>>>> I am trying to run a simple C program.
>>>>>
>>>>> ######################################################################################
>>>>>
>>>>> #include <string.h>
>>>>> #include <stdlib.h>
>>>>> #include <stdio.h>
>>>>> #include "mpi.h"
>>>>>
>>>>> int
>>>>> main(int argc, char **argv)
>>>>> {
>>>>>     int my_rank;
>>>>>     int num_proc;
>>>>>     MPI_Comm inter_comm;
>>>>>     int arr[1];      /* one error code per spawned process */
>>>>>     int rc1;
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &num_proc);
>>>>>
>>>>>     printf("MASTER : spawning a slave ... \n");
>>>>>     rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1,
>>>>>                          MPI_INFO_NULL, 0, MPI_COMM_WORLD,
>>>>>                          &inter_comm, arr);
>>>>>
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>>
>>>>> ######################################################################################
>>>>>
>>>>>
>>>>> This program hangs, as shown below:
>>>>>
>>>>> prakash_at_bmi-xeon1-01:~/thesis/CS/Samples> ./master1
>>>>> MASTER : spawning a slave ...
>>>>> bmi-xeon1-01
>>>>>
>>>>> Any ideas why?
>>>>>
>>>>> Thanks,
>>>>> Prakash
>>>> --
>>>> Edgar Gabriel
>>>> Assistant Professor
>>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>>> Department of Computer Science University of Houston
>>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335