
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] mpirun hangs
From: Ralph Castain (rhc_at_[hidden])
Date: 2008-05-27 22:34:28


Aha! This is a problem that continues to bite us - it relates to the pty
problem in Mac OS X. There's been a ton of chatter about this, but Apple
doesn't seem inclined to fix it.

Try configuring with --disable-pty-support and see if that helps. FWIW, you
will find a platform file for Mac OS X in the trunk - I always build with it,
and have spent considerable time fine-tuning it. You configure with:

./configure --prefix=whatever \
    --with-platform=contrib/platform/lanl/macosx-dynamic

In that directory, you will also find platform files for static builds under
both Tiger and Leopard (slight differences).

ralph

On 5/27/08 8:01 PM, "Greg Watson" <g.watson_at_[hidden]> wrote:

> Ralph,
>
> I tried rolling back to r18513 but no luck. Steps:
>
> $ ./autogen.sh
> $ ./configure --prefix=/usr/local/openmpi-1.3-devel
> $ make
> $ make install
> $ mpicc -g -o xxx xxx.c
> $ mpirun -np 2 ./xxx
> $ ps x
> 44832 s001 R+ 0:50.00 mpirun -np 2 ./xxx
> 44833 s001 S+ 0:00.03 ./xxx
> $ gdb /usr/local/openmpi-1.3-devel/bin/mpirun
> ...
> (gdb) attach 44832
> Attaching to program: `/usr/local/openmpi-1.3-devel/bin/mpirun', process 44832.
> Reading symbols for shared libraries +++++.......................................... done
> 0x9371b3dd in ioctl ()
> (gdb) where
> #0 0x9371b3dd in ioctl ()
> #1 0x93754812 in grantpt ()
> #2 0x9375470b in openpty ()
> #3 0x001446d9 in opal_openpty ()
> #4 0x000bf3bf in orte_iof_base_setup_prefork ()
> #5 0x003da62f in odls_default_fork_local_proc (context=0x216a60, child=0x216dd0, environ_copy=0x217930) at odls_default_module.c:191
> #6 0x000c3e76 in orte_odls_base_default_launch_local ()
> #7 0x003daace in orte_odls_default_launch_local_procs (data=0x216780) at odls_default_module.c:360
> #8 0x000ad2f6 in process_commands (sender=0x216768, buffer=0x216780, tag=1) at orted/orted_comm.c:441
> #9 0x000acd52 in orte_daemon_cmd_processor (fd=-1, opal_event=1, data=0x216750) at orted/orted_comm.c:346
> #10 0x0012bd21 in event_process_active () at opal_object.h:498
> #11 0x0012c3c5 in opal_event_base_loop () at opal_object.h:498
> #12 0x0012bf8c in opal_event_loop () at opal_object.h:498
> #13 0x0011b334 in opal_progress () at runtime/opal_progress.c:169
> #14 0x000cd9b4 in orte_plm_base_report_launched () at opal_object.h:498
> #15 0x000cc2b7 in orte_plm_base_launch_apps () at opal_object.h:498
> #16 0x0003d626 in orte_plm_rsh_launch (jdata=0x200ae0) at plm_rsh_module.c:1126
> #17 0x00002604 in orterun (argc=4, argv=0xbffff880) at orterun.c:549
> #18 0x00001bd6 in main (argc=4, argv=0xbffff880) at main.c:13
>
> On May 27, 2008, at 9:11 PM, Ralph Castain wrote:
>
>> Yo Greg
>>
>> I'm not seeing any problem on my Mac OS X machine - I'm running Leopard.
>> Can you tell me how you configured it, and the precise command you
>> executed?
>>
>> Thanks
>> Ralph
>>
>>
>>
>> On 5/27/08 5:15 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>
>>> Hmmm...well, it was working about 3 hours ago! I'll try to take a look
>>> tonight, but it may be tomorrow.
>>>
>>> Try rolling it back just a little to r18513 - that's the last rev I
>>> tested on my Mac.
>>>
>>>
>>> On 5/27/08 5:00 PM, "Greg Watson" <g.watson_at_[hidden]> wrote:
>>>
>>>> Something seems to be broken in the trunk for Mac OS X. I can run a
>>>> 1-process job, but any job with more than one process hangs. It was
>>>> working a few days ago.
>>>>
>>>> Greg
>>>> _______________________________________________
>>>> devel mailing list
>>>> devel_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>>
>>
>>
>>
>