Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] OMPI 1.3 problems
From: Greg Watson (g.watson_at_[hidden])
Date: 2008-08-04 19:50:25


Configuring with ./configure --prefix=/usr/local/openmpi-1.3-devel
--with-platform=contrib/platform/lanl/macosx-dynamic --disable-io-romio

Recompiling the app, then running with mpirun -np 5 ./shallow

All processes show R+ as their status. If I attach gdb to a worker I
get the following stack trace:

(gdb) where
#0 0x9045e58a in swtch_pri ()
#1 0x904ccbc1 in sched_yield ()
#2 0x000f6480 in opal_progress () at runtime/opal_progress.c:220
#3 0x004bb0bc in opal_condition_wait ()
#4 0x004bca5c in ompi_request_wait_completion ()
#5 0x004bc92a in mca_pml_ob1_send ()
#6 0x003cdcab in MPI_Send ()
#7 0x0000453f in send_updated_ds (res_type=0x5040, jstart=8, jend=11,
ds=0xbfff85b0, indx=57, master_id=0) at worker.c:214
#8 0x0000444d in worker () at worker.c:185
#9 0x00002e0b in main (argc=1, argv=0xbffff0b8) at main.c:90

The master process shows:

(gdb) where
#0 0x9045e58a in swtch_pri ()
#1 0x904ccbc1 in sched_yield ()
#2 0x000f6480 in opal_progress () at runtime/opal_progress.c:220
#3 0x004ba8bb in opal_condition_wait ()
#4 0x004ba6e4 in ompi_request_wait_completion ()
#5 0x004ba589 in mca_pml_ob1_recv ()
#6 0x003c80aa in MPI_Recv ()
#7 0x0000354c in update_global_ds (res_type=0x5040, indx=57,
ds=0xbfffd068) at main.c:257
#8 0x00003334 in main (argc=1, argv=0xbffff0b8) at main.c:195

Seems to be stuck in communication.
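
To compare traces like the two above across several processes, a small script can pull out the innermost MPI_* frame and the application frame that called it. This is only a sketch: it assumes gdb's usual "#N 0xADDR in func (args) at file:line" frame layout, and the names summarize, FRAME_RE and LOC_RE are made up for illustration.

```python
import re

# Start of a gdb frame, e.g. "#3 0x004bb0bc in opal_condition_wait ("
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+ in (\w+) \(")
# Trailing source location, e.g. "at worker.c:214"
LOC_RE = re.compile(r"\bat (\S+):(\d+)")


def summarize(trace):
    """Return (mpi_call, caller, file, line) for the innermost MPI_* frame,
    or None if the trace has no MPI_* frame with a caller below it."""
    # Rejoin frames that the mailer wrapped onto a second line.
    frames = []
    for raw in trace.splitlines():
        text = raw.strip()
        if text.startswith("#"):
            frames.append(text)
        elif frames and text:
            frames[-1] += " " + text

    # Reduce each frame to (function, file, line); file/line may be None.
    parsed = []
    for frame in frames:
        m = FRAME_RE.match(frame)
        if not m:
            continue
        loc = LOC_RE.search(frame)
        parsed.append((m.group(2),
                       loc.group(1) if loc else None,
                       int(loc.group(2)) if loc else None))

    # The frame right after the first MPI_* frame is its caller.
    for i, (func, _, _) in enumerate(parsed):
        if func.startswith("MPI_") and i + 1 < len(parsed):
            caller, path, line = parsed[i + 1]
            return func, caller, path, line
    return None


# Tail of the worker trace shown above.
worker = """
#5 0x004bc92a in mca_pml_ob1_send ()
#6 0x003cdcab in MPI_Send ()
#7 0x0000453f in send_updated_ds (res_type=0x5040, jstart=8, jend=11,
ds=0xbfff85b0, indx=57, master_id=0) at worker.c:214
#8 0x0000444d in worker () at worker.c:185
"""
print(summarize(worker))
```

Run on the worker trace this reports MPI_Send called from send_updated_ds at worker.c:214; the master trace gives MPI_Recv called from update_global_ds at main.c:257, so both sides are blocked in a matching send/receive pair.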

Greg

On Aug 4, 2008, at 6:12 PM, Ralph Castain wrote:

> Can you tell us how you are configuring and your command line? As I
> said, I'm having no problem running your code on my Mac w/10.5,
> both PowerPC and Intel.
>
> Ralph
>
> On Aug 4, 2008, at 3:10 PM, Greg Watson wrote:
>
>> Yes the application does sends/receives. No, it doesn't seem to be
>> getting past MPI_Init.
>>
>> I've reinstalled from a completely new 1.3 branch. Still hangs.
>>
>> Greg
>>
>> On Aug 4, 2008, at 4:45 PM, Terry Dontje wrote:
>>
>>> Are you doing any communications? Have you gotten past MPI_Init?
>>> Could your issue be related to the following ticket?
>>>
>>> https://svn.open-mpi.org/trac/ompi/ticket/1378
>>>
>>>
>>> --td
>>> Greg Watson wrote:
>>>> I'm seeing the same behavior on trunk as 1.3. The program just hangs.
>>>>
>>>> Greg
>>>>
>>>> On Aug 4, 2008, at 2:25 PM, Ralph Castain wrote:
>>>>
>>>>> Well, I unfortunately cannot test this right now Greg - the 1.3
>>>>> branch won't build due to a problem with the man page installation
>>>>> script. The fix is in the trunk, but hasn't migrated across yet.
>>>>>
>>>>> :-//
>>>>>
>>>>> My guess is that you are caught on some stage where the hanging bugs
>>>>> hadn't been fixed, but you cannot update to the current head of the
>>>>> 1.3 branch as it won't compile. All I can suggest is shifting to the
>>>>> trunk (which definitely works) for now as the man page fix should
>>>>> migrate soon.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Aug 4, 2008, at 12:12 PM, Ralph Castain wrote:
>>>>>
>>>>>> Depending upon the r-level, there was a problem for a while with the
>>>>>> system hanging that was caused by a couple of completely unrelated
>>>>>> issues. I believe these have been fixed now - at least, it is fixed
>>>>>> on the trunk for me under that same system. I'll check 1.3 now - it
>>>>>> could be that some commits are missing over there.
>>>>>>
>>>>>>
>>>>>> On Aug 4, 2008, at 12:06 PM, Greg Watson wrote:
>>>>>>
>>>>>>> I have a fairly simple test program that runs fine under 1.2 on
>>>>>>> MacOS X 10.5. When I recompile and run it under 1.3 (head of 1.3
>>>>>>> branch) it just hangs.
>>>>>>>
>>>>>>> They are both built using
>>>>>>> --with-platform=contrib/platform/lanl/macosx-dynamic. For 1.3, I've
>>>>>>> added --disable-io-romio.
>>>>>>>
>>>>>>> Any suggestions?
>>>>>>>
>>>>>>> Greg
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> devel_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel