Open MPI Development Mailing List Archives

This web mail archive is frozen.

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI devel] OMPI 1.3 problems
From: Greg Watson (g.watson_at_[hidden])
Date: 2008-08-04 19:50:25


Configuring with:

    ./configure --prefix=/usr/local/openmpi-1.3-devel \
        --with-platform=contrib/platform/lanl/macosx-dynamic \
        --disable-io-romio

Recompiling the app, then running with:

    mpirun -np 5 ./shallow

All processes show R+ as their status (R = running, + = in the
foreground process group), so they are spinning rather than sleeping.
If I attach gdb to a worker, I get the following stack trace:

(gdb) where
#0 0x9045e58a in swtch_pri ()
#1 0x904ccbc1 in sched_yield ()
#2 0x000f6480 in opal_progress () at runtime/opal_progress.c:220
#3 0x004bb0bc in opal_condition_wait ()
#4 0x004bca5c in ompi_request_wait_completion ()
#5 0x004bc92a in mca_pml_ob1_send ()
#6 0x003cdcab in MPI_Send ()
#7 0x0000453f in send_updated_ds (res_type=0x5040, jstart=8, jend=11, ds=0xbfff85b0, indx=57, master_id=0) at worker.c:214
#8 0x0000444d in worker () at worker.c:185
#9 0x00002e0b in main (argc=1, argv=0xbffff0b8) at main.c:90

The master process shows:

(gdb) where
#0 0x9045e58a in swtch_pri ()
#1 0x904ccbc1 in sched_yield ()
#2 0x000f6480 in opal_progress () at runtime/opal_progress.c:220
#3 0x004ba8bb in opal_condition_wait ()
#4 0x004ba6e4 in ompi_request_wait_completion ()
#5 0x004ba589 in mca_pml_ob1_recv ()
#6 0x003c80aa in MPI_Recv ()
#7 0x0000354c in update_global_ds (res_type=0x5040, indx=57, ds=0xbfffd068) at main.c:257
#8 0x00003334 in main (argc=1, argv=0xbffff0b8) at main.c:195

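Both seem to be stuck in communication: the worker is blocked in
MPI_Send to the master (master_id=0), and the master is blocked in
MPI_Recv.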

Greg

On Aug 4, 2008, at 6:12 PM, Ralph Castain wrote:

> Can you tell us how you are configuring and what command line you
> are using? As I said, I'm having no problem running your code on my
> Mac w/10.5, both PowerPC and Intel.
>
> Ralph
>
> On Aug 4, 2008, at 3:10 PM, Greg Watson wrote:
>
>> Yes, the application does sends/receives. No, it doesn't seem to be
>> getting past MPI_Init.
>>
>> I've reinstalled from a completely fresh copy of the 1.3 branch. It
>> still hangs.
>>
>> Greg
>>
>> On Aug 4, 2008, at 4:45 PM, Terry Dontje wrote:
>>
>>> Are you doing any communications? Have you gotten past MPI_Init?
>>> Could your issue be related to the following ticket?
>>>
>>> https://svn.open-mpi.org/trac/ompi/ticket/1378
>>>
>>>
>>> --td
>>> Greg Watson wrote:
>>>> I'm seeing the same behavior on the trunk as on 1.3. The program
>>>> just hangs.
>>>>
>>>> Greg
>>>>
>>>> On Aug 4, 2008, at 2:25 PM, Ralph Castain wrote:
>>>>
>>>>> Well, unfortunately I cannot test this right now, Greg: the 1.3
>>>>> branch won't build due to a problem with the man page installation
>>>>> script. The fix is in the trunk, but hasn't migrated across yet.
>>>>>
>>>>> :-//
>>>>>
>>>>> My guess is that you are caught at some stage where the hanging
>>>>> bugs hadn't yet been fixed, but you cannot update to the current
>>>>> head of the 1.3 branch as it won't compile. All I can suggest is
>>>>> shifting to the trunk (which definitely works) for now, as the man
>>>>> page fix should migrate soon.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Aug 4, 2008, at 12:12 PM, Ralph Castain wrote:
>>>>>
>>>>>> Depending upon the r-level, there was a problem for a while with
>>>>>> the system hanging, caused by a couple of completely unrelated
>>>>>> issues. I believe these have been fixed now; at least, it is
>>>>>> fixed on the trunk for me under that same system. I'll check 1.3
>>>>>> now, as it could be that some commits are missing over there.
>>>>>>
>>>>>>
>>>>>> On Aug 4, 2008, at 12:06 PM, Greg Watson wrote:
>>>>>>
>>>>>>> I have a fairly simple test program that runs fine under 1.2 on
>>>>>>> Mac OS X 10.5. When I recompile and run it under 1.3 (head of
>>>>>>> the 1.3 branch), it just hangs.
>>>>>>>
>>>>>>> They are both built using
>>>>>>> --with-platform=contrib/platform/lanl/macosx-dynamic. For 1.3,
>>>>>>> I've added --disable-io-romio.
>>>>>>>
>>>>>>> Any suggestions?
>>>>>>>
>>>>>>> Greg
>>>>>>> _______________________________________________
>>>>>>> devel mailing list
>>>>>>> devel_at_[hidden]
>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel