Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] EXTERNAL: Re: openmpi shared memory feature
From: Hodge, Gary C (gary.c.hodge_at_[hidden])
Date: 2012-11-02 09:57:33

There is 8 GB of memory on each node, with 6 GB available. Swap is off; its entry is commented out in /etc/fstab.

I cannot try the alternate mechanisms right now. Thanks for the info; I will try them when we move up to 1.6.1.

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Jeff Squyres
Sent: Friday, November 02, 2012 9:32 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: openmpi shared memory feature

What's the memory usage on your nodes -- are you invoking swap, perchance?

Can you try one of the other shared memory mechanisms (sysv or posix)? (I just described how in my previous email)

On Nov 1, 2012, at 11:24 AM, Hodge, Gary C wrote:

> George,
> We move 40K and 160K size messages from process to process on the same node. Our app does mlockall(MCL_CURRENT | MCL_FUTURE) before MPI_INIT.
> I measure the page faults using getrusage and record when they increase. I observe increasing ru_minflt values and no ru_majflt increase.
> Increased values reported are 40, 80, or 120; our page size is 4K. The page reclaims/faults are checked after MPI receive processing,
> after our application processing, and after MPI send processing. Our application processing is not the source of increasing reclaims/faults.
> I observe the disk I/O light flashing on nodes when we report increasing reclaims/faults.
> When I turn off the SM BTL, the reclaims stop increasing and the disk I/O light does not blink.
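The measurement described above can be sketched in C. This is a minimal illustration, not the poster's actual code: the names fault_counts, read_faults, report_phase, lock_all_memory and touch_fresh_pages are hypothetical, and touch_fresh_pages is only a stand-in for work that touches new memory. It assumes Linux and uses getrusage(RUSAGE_SELF) to read ru_minflt (page reclaims) and ru_majflt (faults that needed disk I/O). For scale, 40 reclaims at a 4K page size is 160K, the same as the larger message size mentioned, although the thread does not confirm that this is the cause.

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Snapshot of the fault counters reported by getrusage(). */
typedef struct {
    long minflt;  /* page reclaims: faults served without disk I/O */
    long majflt;  /* page faults that required disk I/O */
} fault_counts;

/* Read the current counters for this process; 0 on success, -1 on error. */
int read_faults(fault_counts *out)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->minflt = ru.ru_minflt;
    out->majflt = ru.ru_majflt;
    return 0;
}

/* Print any increase since *prev for the named phase, then update *prev.
 * Returns the minor-fault increase, or -1 if the counters cannot be read. */
long report_phase(const char *phase, fault_counts *prev)
{
    fault_counts now;
    if (read_faults(&now) != 0)
        return -1;
    long dmin = now.minflt - prev->minflt;
    long dmaj = now.majflt - prev->majflt;
    if (dmin > 0 || dmaj > 0)
        printf("%s: +%ld reclaims, +%ld faults\n", phase, dmin, dmaj);
    *prev = now;
    return dmin;
}

/* Lock current and future pages in RAM, as the application does before
 * MPI_Init. This can fail without sufficient privileges or RLIMIT_MEMLOCK. */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}

/* Hypothetical stand-in for a phase that touches new memory: map npages
 * fresh anonymous pages, write one byte to each, and return the increase
 * in ru_minflt observed across the operation (-1 on error). */
long touch_fresh_pages(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    fault_counts before, after;

    if (read_faults(&before) != 0)
        return -1;
    volatile char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    for (size_t i = 0; i < npages; i++)
        buf[i * page] = 1;  /* first write to each page */
    if (read_faults(&after) != 0) {
        munmap((void *)buf, len);
        return -1;
    }
    munmap((void *)buf, len);
    return after.minflt - before.minflt;
}
```

In an MPI program following the description above, one would call lock_all_memory() before MPI_Init, take an initial snapshot with read_faults(), and then call report_phase() after the receive, application, and send steps of each iteration to see which step the increases follow.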
> -----Original Message-----
> From: George Bosilca [mailto:bosilca_at_[hidden]]
> Sent: Thursday, November 01, 2012 12:25 AM
> To: Open MPI Users
> Cc: Hodge, Gary C
> Subject: Re: [OMPI users] EXTERNAL: Re: openmpi shared memory feature
> On Oct 30, 2012, at 09:57 , Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> On Oct 30, 2012, at 9:51 AM, Hodge, Gary C wrote:
>>> FYI, recently, I was tracking down the source of page faults in our application that has real-time requirements. I found that disabling the sm component (--mca btl ^sm) eliminated many page faults I was seeing.
>> Good point. This is likely true; the shared memory component will definitely cause more page faults. Using huge pages may alleviate this (e.g., less TLB usage), but we haven't studied it much.
> This will depend on the communication pattern of the application and the size of the messages. A rise in the number of page faults is not normal behavior and is unexpected in most common execution scenarios. We reuse the memory pages in the SM BTL, minimizing the page faults as well as the TLB misses.
> If the sharp increase in the number of page faults is indeed to be blamed on the SM BTL, this is more than worrisome, as it might indicate a wrong usage of the reserved memory pages (like a FIFO instead of a LIFO). Can you provide us with more precise information regarding this, please?
> Thanks,
> george.
>>> I now have much better deterministic performance in that I no longer see outlier measurements (jobs that usually take 3 ms would sometimes take 15 ms).
>> I'm not sure I grok that; are you benchmarking an entire *job* (i.e., a single "mpirun") that varies between 3 and 15 milliseconds? If so, I'd say that both are pretty darn good, because mpirun incurs a lot of overhead for launching and completing jobs. Furthermore, benchmarking an entire job that lasts significantly less than 1 second is probably not the most stable measurement, with or without page faults; there are lots of other distributed and OS effects that can cause a jump from 3 to 15 milliseconds.
>>> I did not notice a performance penalty using a network stack.
>> Depends on the app. Some MPI apps are latency bound; some are not.
>> Latency-bound applications will definitely benefit from faster point-to-point performance. Shared memory will definitely have the fastest point-to-point latency compared to any network stack (i.e., hundreds of nanos vs. 1+ micro).
>> --
>> Jeff Squyres
>> jsquyres_at_[hidden]
>> For corporate legal information go to:
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
> <ompi-output.tar.bz2>