Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] mpirun hangs
From: Ralph H Castain (rhc_at_[hidden])
Date: 2008-05-28 09:14:06


It could be - I believe the Mac issue has been around for a while. If you
like, you could use that same platform file and give it a try. I think there
are a few frameworks mentioned in there that aren't in 1.2, but that should
be easy to edit out.

On 5/28/08 7:11 AM, "Greg Watson" <g.watson_at_[hidden]> wrote:

> That fixed it, thanks. I wonder if this is the same problem I'm seeing
> for 1.2.x?
>
> Greg
>
> On May 27, 2008, at 10:34 PM, Ralph Castain wrote:
>
>> Aha! This is a problem that continues to bite us - it relates to the pty
>> problem in Mac OSX. Been a ton of chatter about this, but Mac doesn't seem
>> inclined to fix it.
>>
>> Try configuring --disable-pty-support and see if that helps. FWIW, you will
>> find a platform file for Mac OSX in the trunk - I always build with it, and
>> have spent considerable time fine-tuning it. You configure with:
>>
>> ./configure --prefix=whatever \
>>     --with-platform=contrib/platform/lanl/macosx-dynamic
>>
>> In that directory, you will also find platform files for static builds under
>> both Tiger and Leopard (slight differences).
>>
>> ralph
>>
>>
>> On 5/27/08 8:01 PM, "Greg Watson" <g.watson_at_[hidden]> wrote:
>>
>>> Ralph,
>>>
>>> I tried rolling back to 18513 but no luck. Steps:
>>>
>>> $ ./autogen.sh
>>> $ ./configure --prefix=/usr/local/openmpi-1.3-devel
>>> $ make
>>> $ make install
>>> $ mpicc -g -o xxx xxx.c
>>> $ mpirun -np 2 ./xxx
>>> $ ps x
>>> 44832 s001 R+ 0:50.00 mpirun -np 2 ./xxx
>>> 44833 s001 S+ 0:00.03 ./xxx
>>> $ gdb /usr/local/openmpi-1.3-devel/bin/mpirun
>>> ...
>>> (gdb) attach 44832
>>> Attaching to program: `/usr/local/openmpi-1.3-devel/bin/mpirun', process 44832.
>>> Reading symbols for shared libraries +++++.......................................... done
>>> 0x9371b3dd in ioctl ()
>>> (gdb) where
>>> #0 0x9371b3dd in ioctl ()
>>> #1 0x93754812 in grantpt ()
>>> #2 0x9375470b in openpty ()
>>> #3 0x001446d9 in opal_openpty ()
>>> #4 0x000bf3bf in orte_iof_base_setup_prefork ()
>>> #5 0x003da62f in odls_default_fork_local_proc (context=0x216a60, child=0x216dd0, environ_copy=0x217930) at odls_default_module.c:191
>>> #6 0x000c3e76 in orte_odls_base_default_launch_local ()
>>> #7 0x003daace in orte_odls_default_launch_local_procs (data=0x216780) at odls_default_module.c:360
>>> #8 0x000ad2f6 in process_commands (sender=0x216768, buffer=0x216780, tag=1) at orted/orted_comm.c:441
>>> #9 0x000acd52 in orte_daemon_cmd_processor (fd=-1, opal_event=1, data=0x216750) at orted/orted_comm.c:346
>>> #10 0x0012bd21 in event_process_active () at opal_object.h:498
>>> #11 0x0012c3c5 in opal_event_base_loop () at opal_object.h:498
>>> #12 0x0012bf8c in opal_event_loop () at opal_object.h:498
>>> #13 0x0011b334 in opal_progress () at runtime/opal_progress.c:169
>>> #14 0x000cd9b4 in orte_plm_base_report_launched () at opal_object.h:498
>>> #15 0x000cc2b7 in orte_plm_base_launch_apps () at opal_object.h:498
>>> #16 0x0003d626 in orte_plm_rsh_launch (jdata=0x200ae0) at plm_rsh_module.c:1126
>>> #17 0x00002604 in orterun (argc=4, argv=0xbffff880) at orterun.c:549
>>> #18 0x00001bd6 in main (argc=4, argv=0xbffff880) at main.c:13
>>>
>>> On May 27, 2008, at 9:11 PM, Ralph Castain wrote:
>>>
>>>> Yo Greg
>>>>
>>>> I'm not seeing any problem on my Mac OSX - I'm running Leopard. Can you tell
>>>> me how you configured, and the precise command you executed?
>>>>
>>>> Thanks
>>>> Ralph
>>>>
>>>>
>>>>
>>>> On 5/27/08 5:15 PM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>>>>
>>>>> Hmmm...well, it was working about 3 hours ago! I'll try to take a look
>>>>> tonight, but it may be tomorrow.
>>>>>
>>>>> Try rolling it back just a little to r18513 - that's the last rev I tested
>>>>> on my Mac.
>>>>>
>>>>>
>>>>> On 5/27/08 5:00 PM, "Greg Watson" <g.watson_at_[hidden]> wrote:
>>>>>
>>>>>> Something seems to be broken in the trunk for MacOS X. I can run a 1
>>>>>> process job, but a >1 process job hangs. It was working a few days ago.
>>>>>>
>>>>>> Greg
>>>>>> _______________________________________________
>>>>>> devel mailing list
>>>>>> devel_at_[hidden]
>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel