Open MPI User's Mailing List Archives

From: Mike Houston (mhouston_at_[hidden])
Date: 2005-10-31 17:38:32


Sometimes getting crashes:

mpirun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 -hostfile
/u/mhouston/mpihosts mpi_bandwidth 25 131072
mpirun noticed that job rank 0 with PID 10611 on node
"spire-2.stanford.edu" exited on signal 11.
1 process killed (possibly by Open MPI).

The backtrace is bogus, else I'd drop it in.

Setting the number of messages to <= 10 always seems to work.
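
For what it's worth, the failing pattern boils down to posting a whole
window of non-blocking sends before waiting on any of them. Here is a
rough sketch of that inner loop (my own simplification, not the actual
OSU source; WINDOW and MSGSIZE stand in for the 25 and 131072 arguments
above):

#include <mpi.h>
#include <stdlib.h>

#define WINDOW  25       /* back-to-back messages per burst */
#define MSGSIZE 131072   /* message size in bytes */

int main(int argc, char **argv)
{
    int rank, i;
    char *buf = malloc(MSGSIZE);
    MPI_Request req[WINDOW];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* post the whole window of sends before waiting on any of them */
        for (i = 0; i < WINDOW; i++)
            MPI_Isend(buf, MSGSIZE, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[i]);
        MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    } else if (rank == 1) {
        for (i = 0; i < WINDOW; i++)
            MPI_Irecv(buf, MSGSIZE, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[i]);
        MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}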

-Mike

Tim S. Woodall wrote:

>Mike,
>
>There appears to be an issue in our mvapi get protocol. To temporarily
>disable this:
>
>/u/twoodall> orterun -np 2 -mca mpi_leave_pinned 1 -mca btl_mvapi_flags 2 ./bw 25 131072
>131072 801.580272 (MillionBytes/sec) 764.446518(MegaBytes/sec)
>
>
>Mike Houston wrote:
>
>
>>What's the ETA, or should I try grabbing from cvs?
>>
>>-Mike
>>
>>Tim S. Woodall wrote:
>>
>>>Mike,
>>>
>>>I believe this was probably corrected today and should be in the
>>>next release candidate.
>>>
>>>Thanks,
>>>Tim
>>>
>>>Mike Houston wrote:
>>>
>>>>Whoops, spoke too soon. The performance I quoted was not actually going
>>>>between nodes. When actually using the network with the pinned option, I get:
>>>>
>>>>[0,1,0][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress]
>>>>[0,1,1][btl_mvapi_component.c:631:mca_btl_mvapi_component_progress] Got
>>>>error : VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb74a1c18Got error :
>>>>VAPI_WR_FLUSH_ERR, Vendor code : 0 Frag : 0xb73e1720
>>>>
>>>>repeated many times.
>>>>
>>>>-Mike
>>>>
>>>>Mike Houston wrote:
>>>>
>>>>>That seems to work with the pinning option enabled. THANKS!
>>>>>
>>>>>Now I'll go back to testing my real code. I'm getting 700MB/s for
>>>>>messages >= 128KB. That's 10-20% lower than MVAPICH, but still pretty
>>>>>darn good. My guess is that I can play with the settings some more to
>>>>>tune performance further. Now if I can get the tcp layer working, I'm
>>>>>pretty much good to go.
>>>>>
>>>>>Any word on an SDP layer? I can probably modify the tcp layer quickly
>>>>>to do SDP, but I thought I would ask.
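>>>>>
>>>>>The hack I have in mind is really just swapping the address family
>>>>>when the tcp component creates its sockets, something like this
>>>>>untested sketch (the helper name is mine; AF_INET_SDP and its value
>>>>>of 27 are whatever the local SDP stack defines, so check the headers):
>>>>>
>>>>>#include <sys/socket.h>
>>>>>
>>>>>#ifndef AF_INET_SDP
>>>>>#define AF_INET_SDP 27   /* family commonly used by SDP stacks */
>>>>>#endif
>>>>>
>>>>>int create_stream_socket(void)
>>>>>{
>>>>>    /* SDP instead of plain TCP; bind/connect/send/recv and
>>>>>       struct sockaddr_in usage stay exactly the same */
>>>>>    return socket(AF_INET_SDP, SOCK_STREAM, 0);
>>>>>}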
>>>>>
>>>>>-Mike
>>>>>
>>>>>Tim S. Woodall wrote:
>>>>>
>>>>>>Hello Mike,
>>>>>>
>>>>>>Mike Houston wrote:
>>>>>>
>>>>>>>When only sending a few messages, we get reasonably good IB performance,
>>>>>>>~500MB/s (MVAPICH is 850MB/s). However, if I crank the number of
>>>>>>>messages up, we drop to 3MB/s (!!!). This is with the OSU NBCL
>>>>>>>mpi_bandwidth test. We are running Mellanox IB Gold 1.8 with 3.3.3
>>>>>>>firmware on PCI-X (Cougar) boards. Everything works with MVAPICH, but
>>>>>>>we really need the thread support in Open MPI.
>>>>>>>
>>>>>>>Ideas? I noticed there is a plethora of runtime options configurable
>>>>>>>for mvapi. Do I need to tweak these to get performance up?
>>>>>>>
>>>>>>You might try running with:
>>>>>>
>>>>>>mpirun -mca mpi_leave_pinned 1
>>>>>>
>>>>>>which will cause the mvapi BTL to maintain an MRU cache of memory
>>>>>>registrations, rather than dynamically pinning/unpinning memory for
>>>>>>each transfer.
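>>>>>>
>>>>>>Conceptually the cache looks something like the sketch below (just an
>>>>>>illustration of the idea, not the actual mvapi code; the names are
>>>>>>made up):
>>>>>>
>>>>>>#include <stdlib.h>
>>>>>>
>>>>>>typedef struct reg {
>>>>>>    void       *base;    /* start of the pinned region  */
>>>>>>    size_t      len;     /* length of the pinned region */
>>>>>>    struct reg *next;    /* MRU list, most recent first */
>>>>>>} reg_t;
>>>>>>
>>>>>>static reg_t *mru_cache = NULL;
>>>>>>
>>>>>>reg_t *lookup_or_pin(void *buf, size_t len)
>>>>>>{
>>>>>>    reg_t *r;
>>>>>>    for (r = mru_cache; r != NULL; r = r->next)
>>>>>>        if ((char *) buf >= (char *) r->base &&
>>>>>>            (char *) buf + len <= (char *) r->base + r->len)
>>>>>>            return r;                 /* hit: reuse the existing pin */
>>>>>>
>>>>>>    r = malloc(sizeof(*r));           /* miss: pin once and keep it  */
>>>>>>    r->base = buf;
>>>>>>    r->len  = len;
>>>>>>    /* ...the actual memory registration (pinning) happens here... */
>>>>>>    r->next   = mru_cache;
>>>>>>    mru_cache = r;
>>>>>>    return r;
>>>>>>}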
>>>>>>
>>>>>>If this does not resolve the BW problems, try increasing the
>>>>>>resources allocated to each connection:
>>>>>>
>>>>>>-mca btl_mvapi_rd_min 128
>>>>>>-mca btl_mvapi_rd_max 256
>>>>>>
>>>>>>Also, can you forward me a copy of the test code or a reference to it?
>>>>>>
>>>>>>Thanks,
>>>>>>Tim
>
>_______________________________________________
>users mailing list
>users_at_[hidden]
>http://www.open-mpi.org/mailman/listinfo.cgi/users