Open MPI User's Mailing List Archives

From: Mike Houston (mhouston_at_[hidden])
Date: 2005-10-31 17:38:32


Sometimes getting crashes:

mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 -hostfile
/u/mhouston/mpihosts mpi_bandwidth 25 131072
mpirun noticed that job rank 0 with PID 10611 on node
"spire-2.stanford.edu" exited on signal 11.
1 process killed (possibly by Open MPI).

The backtrace is bogus, else I'd drop it in.

Setting the number of messages to <= 10 always seems to work.

-Mike

Tim S. Woodall wrote:

>Mike,
>
>There appears to be an issue in our mvapi get protocol. To temporarily
>disable this, run with -mca btl_mvapi_flags 2, e.g.:
>
>/u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw 25 131072
>131072 801.580272 (MillionBytes/sec) 764.446518(MegaBytes/sec)
>
>
>Mike Houston wrote:
>
>
>>What's the ETA, or should I try grabbing from CVS?
>>
>>-Mike
>>
>>Tim S. Woodall wrote:
>>
>>>Mike,
>>>
>>>I believe this was probably corrected today and should be in the
>>>next release candidate.
>>>
>>>Thanks,
>>>Tim
>>>
>>>Mike Houston wrote:
>>>
>>>>Whoops, spoke too soon. The performance I quoted was not actually going
>>>>between nodes. Actually using the network with the pinned option gives:
>>>>
>>>>[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress]
>>>>[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got
>>>>error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18Got error :
>>>>VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720
>>>>
>>>>repeated many times.
>>>>
>>>>-Mike
>>>>
>>>>Mike Houston wrote:
>>>>
>>>>>That seems to work with the pinning option enabled. THANKS!
>>>>>
>>>>>Now I'll go back to testing my real code. I'm getting 700MB/s for
>>>>>messages >=128KB. This is 10-20% lower than MVAPICH, but still pretty
>>>>>darn good. My guess is that I can play with the settings some more to
>>>>>tune performance. Now if I can get the TCP layer working, I'm pretty
>>>>>much good to go.
>>>>>
>>>>>Any word on an SDP layer? I can probably modify the TCP layer quickly
>>>>>to do SDP, but I thought I would ask.
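For what it's worth, the usual ways to get SDP under a TCP-style transport are either to preload OFED's libsdp (no code change at all) or to switch the address family in the socket calls. The sketch below shows the latter; the AF_INET_SDP constant is the value used by OFED's libsdp on Linux and should be checked against the local headers, and open_stream() is just an illustrative helper, not anything from the Open MPI tcp BTL.

/* Minimal sketch of "doing SDP" over TCP-style socket code: the same
 * stream-socket calls, but with the AF_INET_SDP address family instead
 * of AF_INET.  The constant is the value used by OFED's libsdp on
 * Linux; verify it against your installation. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#ifndef AF_INET_SDP
#define AF_INET_SDP 27   /* assumption: value from OFED's libsdp headers */
#endif

int open_stream(const char *host, unsigned short port, int use_sdp)
{
    int family = use_sdp ? AF_INET_SDP : AF_INET;
    int fd = socket(family, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_in addr;            /* SDP reuses the IPv4 sockaddr layout */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = family;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, host, &addr.sin_addr) != 1 ||
        connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        close(fd);
        return -1;
    }
    return fd;
}

The rest of the connection management and data path stays the same, which is why converting a TCP-style transport to SDP tends to be a small change.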
>>>>>
>>>>>-Mike
>>>>>
>>>>>Tim S. Woodall wrote:
>>>>>
>>>>>>Hello Mike,
>>>>>>
>>>>>>Mike Houston wrote:
>>>>>>
>>>>>>>When only sending a few messages, we get reasonably good IB performance,
>>>>>>>~500MB/s (MVAPICH gets 850MB/s). However, if I crank the number of
>>>>>>>messages up, we drop to 3MB/s (!!!). This is with the OSU NBCL
>>>>>>>mpi_bandwidth test. We are running Mellanox IB Gold 1.8 with 3.3.3
>>>>>>>firmware on PCI-X (Cougar) boards. Everything works with MVAPICH, but
>>>>>>>we really need the thread support in Open MPI.
>>>>>>>
>>>>>>>Ideas? I noticed there is a plethora of runtime options configurable
>>>>>>>for mvapi. Do I need to tweak these to get performance up?
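A stripped-down sketch of the windowed send/receive loop that benchmarks like the OSU mpi_bandwidth test use (not the actual OSU source, just an illustration of what cranking up the number of messages exercises) looks roughly like this:

/* Sketch of a windowed bandwidth loop: rank 0 posts a window of
 * non-blocking sends, rank 1 posts matching receives, both wait on the
 * whole window.  Arguments mirror the invocation in this thread,
 * e.g. 25 131072: 25 outstanding messages of 128KB each. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int window = (argc > 1) ? atoi(argv[1]) : 25;
    int size   = (argc > 2) ? atoi(argv[2]) : 131072;
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc((size_t)window * size);
    MPI_Request *req = malloc(window * sizeof(MPI_Request));

    double t0 = MPI_Wtime();
    if (rank == 0) {
        /* Sender: post the entire window of non-blocking sends at once. */
        for (i = 0; i < window; i++)
            MPI_Isend(buf + (size_t)i * size, size, MPI_CHAR, 1, i,
                      MPI_COMM_WORLD, &req[i]);
        MPI_Waitall(window, req, MPI_STATUSES_IGNORE);
    } else if (rank == 1) {
        /* Receiver: matching window of non-blocking receives. */
        for (i = 0; i < window; i++)
            MPI_Irecv(buf + (size_t)i * size, size, MPI_CHAR, 0, i,
                      MPI_COMM_WORLD, &req[i]);
        MPI_Waitall(window, req, MPI_STATUSES_IGNORE);
    }
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("%d messages of %d bytes: %.1f MB/s\n", window, size,
               (double)window * size / (t1 - t0) / 1.0e6);

    free(req);
    free(buf);
    MPI_Finalize();
    return 0;
}

With a window of 25 messages of 128KB each, a large amount of data is kept in flight at once, which is exactly the case that stresses the registration and receive-descriptor settings discussed later in the thread.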
>>>>>>>
>>>>>>You might try running with:
>>>>>>
>>>>>>mpirun -mca mpi_leave_pinned 1
>>>>>>
>>>>>>which will cause the mvapi BTL to maintain an MRU cache of memory
>>>>>>registrations, rather than dynamically pinning/unpinning memory.
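To make the mechanism concrete, here is a toy sketch of such a registration cache. It is not Open MPI's implementation, and register_mem()/deregister_mem() are hypothetical stand-ins for the real VAPI registration calls; the point is only that a hit on a previously pinned buffer skips the expensive pin/unpin cycle, and the least recently used entry is evicted when the cache fills.

/* Toy sketch of a most-recently-used cache of memory registrations,
 * illustrating the idea behind mpi_leave_pinned: reuse an existing pin
 * of (addr, len) instead of registering/deregistering every message.
 * register_mem()/deregister_mem() are hypothetical stand-ins for the
 * real VAPI calls; this is not Open MPI's implementation. */
#include <stddef.h>

#define CACHE_SLOTS 64

typedef struct {
    void  *addr;
    size_t len;
    void  *handle;       /* whatever the registration call returns */
    unsigned long stamp; /* larger = more recently used */
} reg_entry;

static reg_entry cache[CACHE_SLOTS];
static unsigned long clock_tick;

extern void *register_mem(void *addr, size_t len);   /* hypothetical */
extern void  deregister_mem(void *handle);           /* hypothetical */

void *get_registration(void *addr, size_t len)
{
    int i, victim = 0;

    /* Hit: the buffer is already pinned; just refresh its stamp. */
    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].handle && cache[i].addr == addr && cache[i].len >= len) {
            cache[i].stamp = ++clock_tick;
            return cache[i].handle;
        }
    }

    /* Miss: evict the least recently used slot and pin the new buffer. */
    for (i = 1; i < CACHE_SLOTS; i++)
        if (cache[i].stamp < cache[victim].stamp)
            victim = i;
    if (cache[victim].handle)
        deregister_mem(cache[victim].handle);

    cache[victim].addr   = addr;
    cache[victim].len    = len;
    cache[victim].handle = register_mem(addr, len);
    cache[victim].stamp  = ++clock_tick;
    return cache[victim].handle;
}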
>>>>>>
>>>>>>If this does not resolve the bandwidth problem, try increasing the
>>>>>>resources allocated to each connection:
>>>>>>
>>>>>>-mca btl_mvapi_rd_min 128
>>>>>>-mca btl_mvapi_rd_max 256
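Putting the suggestions in this thread together with the hostfile and benchmark arguments used earlier, the full invocation would look something like:

mpirun -np 2 -hostfile /u/mhouston/mpihosts \
    -mca mpi_leave_pinned 1 \
    -mca btl_mvapi_rd_min 128 -mca btl_mvapi_rd_max 256 \
    mpi_bandwidth 25 131072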
>>>>>>
>>>>>>Also can you forward me a copy of the test code or a reference to it?
>>>>>>
>>>>>>Thanks,
>>>>>>Tim