Standard Linux Tuning
Hello Bloggers,
The majority of applications these days are deployed on Linux (Debian or Red Hat) as the base OS.
I would like to share some generic tuning that can be done before deploying any application on it.
Index | Component | Question / Test / Reason | ||
Network | ||||
These are some checks to validate the network setup. | ||||
Network | Are the switches redundant? Unplug one switch. Fault-tolerance. |
|||
Network | Is the cabling redundant? Pull cables. Fault-tolerance. |
|||
Network | Is the network full-duplex? Double check setup. Performance. |
|||
Network adapter (NIC) Tuning | ||||
It is recommended to consult the network adapter vendor about recommended Linux TCP/IP settings for optimal performance and stability.
There are also quite a few TCP/IP tuning sources on the Internet, such as http://fasterdata.es.net/TCP-tuning/linux.html |
||||
NIC | Are the NICs fault-tolerant (aka. auto-port negotiation)? Pull cables and/or disable a network adapter. Fault-tolerance. |
|||
NIC | Set the transmission queue depth to at least 1000.
txqueuelen <length>
Performance and stability (packet drops). |
|||
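As a rough sketch of the queue-depth check above: the helper below lists interfaces whose transmit queue is shorter than a target and prints the command that would raise it. The function name and the SYSFS_NET override are made up for this illustration; the sysfs file tx_queue_len and the ip link syntax are standard on current Linux systems, but verify them on your distribution.

```shell
#!/bin/sh
# Sketch only: short_tx_queues and SYSFS_NET are illustrative names.
# SYSFS_NET can be overridden to point at a test directory.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print a suggested command for every interface whose queue is below the target.
short_tx_queues() {
    min_len="$1"
    for dev in "$SYSFS_NET"/*; do
        [ -r "$dev/tx_queue_len" ] || continue
        len=$(cat "$dev/tx_queue_len")
        if [ "$len" -lt "$min_len" ]; then
            # Nothing is changed here; the command is only printed for review.
            echo "ip link set dev ${dev##*/} txqueuelen $min_len  # currently $len"
        fi
    done
}

short_tx_queues 1000
```

The script only reads and reports; running the printed command requires root.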
NIC | Enable TCP/IP offloading (aka. Generic Segmentation Offload (GSO)), which was added in kernel 2.6.18.
See: http://www.linuxfoundation.org/en/Net:GSO
Note: I recommend enabling all supported TCP/IP offloading capabilities on an EMS host to free CPU resources. |
|||
NIC | Enable Interrupt Coalescence (aka. Interrupt Moderation or Interrupt Blanking).
See: http://kb.pert.geant.net/PERTKB/InterruptCoalescence
Performance. Note: The configuration is system dependent, but the goal is to reduce the number of interrupts per second at the ‘cost’ of slightly increased latency. |
|||
TCP/IP Buffer Tuning | ||||
For a low-latency or high-throughput messaging system, TCP/IP buffer tuning is important. Rather than blindly tuning the default values, first check whether the current settings (sysctl -a) provide large enough buffers. The values can be changed with the command sysctl -w <name>=<value>. The values and comments below were taken from TIBCO support (FAQ1-6YOAA) and serve as a guideline towards “large enough” buffers, i.e. if your system configuration has lower values, it is suggested to raise them to the values below. |
||||
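The "check before you change" approach described above could be sketched as follows. check_min and PROC_SYS are illustrative names, and the target value in the example call is a placeholder rather than a recommendation from this article; substitute the values from your vendor guidance.

```shell
#!/bin/sh
# Sketch only: check_min and PROC_SYS are illustrative names.
PROC_SYS="${PROC_SYS:-/proc/sys}"

# Compare a single-valued kernel setting against a target and print a
# "sysctl -w" suggestion when the current value is lower.
check_min() {
    name="$1"
    target="$2"
    # sysctl names map to files under /proc/sys with dots replaced by slashes.
    path="$PROC_SYS/$(echo "$name" | tr . /)"
    if [ ! -r "$path" ]; then
        echo "$name: not available"
        return 0
    fi
    current=$(cat "$path")
    if [ "$current" -lt "$target" ]; then
        echo "sysctl -w $name=$target  # currently $current"
    else
        echo "$name: $current (ok)"
    fi
}

# Placeholder target (8 MiB), chosen for illustration only.
check_min net.core.rmem_max 8388608
```

Settings changed with sysctl -w are lost on reboot; they are usually persisted in /etc/sysctl.conf once testing confirms them.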
TCP/IP | Maximum OS receive buffer size for all connection types (net.core.rmem_max).
Default: 131071 |
|||
TCP/IP | Default OS receive buffer size for all connection types (net.core.rmem_default).
Default: 126976 |
|||
TCP/IP | Maximum OS send buffer size for all connection types (net.core.wmem_max).
Default: 131071 |
|||
TCP/IP | Default OS send buffer size for all connection types (net.core.wmem_default).
Default: 126976 |
|||
TCP/IP | Is TCP/IP window scaling enabled (net.ipv4.tcp_window_scaling)?
Default: 1 Performance. |
|||
TCP/IP | TCP auto-tuning setting (tcp_mem):
Default: 196608 262144 393216 The tcp_mem variable defines how the TCP stack should behave when it comes to memory usage:
- The first value tells the kernel the low threshold. Below this point, the TCP stack puts no pressure on the memory usage of individual TCP sockets.
- The second value tells the kernel at which point to start pressuring memory usage down.
- The final value tells the kernel the maximum number of memory pages it may use. If this value is reached, TCP streams and packets are dropped until memory usage falls again. This value covers all TCP sockets currently in use. |
|||
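Because the tcp_mem thresholds are counted in memory pages rather than bytes, it can help to convert them before comparing them with the RAM size. A small sketch, where pages_to_mib is an illustrative helper and the page size is taken from getconf PAGESIZE unless passed explicitly:

```shell
#!/bin/sh
# Sketch only: pages_to_mib is an illustrative helper name.
# Convert a page count to MiB (integer division, rounds down).
pages_to_mib() {
    pages="$1"
    page_size="${2:-$(getconf PAGESIZE)}"
    echo $(( pages * page_size / 1048576 ))
}

# Show the live thresholds when the file is readable (Linux only).
if [ -r /proc/sys/net/ipv4/tcp_mem ]; then
    read -r low pressure high < /proc/sys/net/ipv4/tcp_mem
    echo "low: $(pages_to_mib "$low") MiB"
    echo "pressure: $(pages_to_mib "$pressure") MiB"
    echo "high: $(pages_to_mib "$high") MiB"
fi
```

With 4 KiB pages, for example, a threshold of 262144 pages corresponds to 1024 MiB.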
TCP/IP | TCP auto-tuning (receive) setting (tcp_rmem):
Default: 4096 87380 4194304 The tcp_rmem variable defines how the TCP stack should behave when it comes to receive buffer memory:
- The first value tells the kernel the minimum receive buffer for each TCP connection. This buffer is always allocated to a TCP socket, even under high pressure on the system.
- The second value tells the kernel the default receive buffer allocated for each TCP socket. This value overrides the /proc/sys/net/core/rmem_default value used by other protocols.
- The third and last value specifies the maximum receive buffer that can be allocated for a TCP socket. |
|||
TCP/IP | TCP auto-tuning (send) setting (tcp_wmem):
Default: 4096 87380 4194304 This variable takes three values that define how much TCP send buffer space each TCP socket may use. Each of the three values is used under different conditions:
- The first value is the minimum TCP send buffer space available for a single TCP socket.
- The second value is the default buffer space allowed for a single TCP socket.
- The third value is the maximum TCP send buffer space. |
|||
TCP/IP | This will ensure that immediately subsequent connections use these values.
|
|||
TCP Keep Alive | ||||
To detect ungracefully closed sockets, either TCP keep-alive or the EMS client-server heartbeat comes into play. Which setup or combination of parameters works better depends on the requirements and test scenarios.
As the EMS daemon does not explicitly enable TCP keep-alive on sockets, the TCP keep-alive settings (net.ipv4.tcp_keepalive_intvl, net.ipv4.tcp_keepalive_probes, net.ipv4.tcp_keepalive_time) do not play a role. |
||||
TCP | How many times to retransmit before killing an alive TCP connection (net.ipv4.tcp_retries2). RFC 1122 says that the limit should be longer than 100 seconds, which is too small a number. The default value of 15 corresponds to 13-30 minutes depending on the retransmission timeout (RTO).
Default: 15 Fault-Tolerance (EMS failover) The default (15) is often considered too high, and a value of 3 is often felt to be too ‘edgy’, so customer testing should establish a good value in the range between 4 and 10. |
|||
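To get a feel for what a given retry count means in wall-clock time, the following sketch sums the exponential back-off intervals. It assumes the usual Linux bounds of 200 ms for the minimum RTO and 120 s for the maximum RTO; the real timeout depends on the measured round-trip time, so treat the result as a lower bound. rto_timeout_ms is an illustrative name.

```shell
#!/bin/sh
# Sketch only. Assumptions: minimum RTO 200 ms, maximum RTO 120 s,
# and the RTO doubling after every unanswered retransmission.
rto_timeout_ms() {
    retries="$1"
    rto=200
    max=120000
    total=0
    i=0
    # One initial timeout plus one per retransmission.
    while [ "$i" -le "$retries" ]; do
        total=$(( total + rto ))
        rto=$(( rto * 2 ))
        if [ "$rto" -gt "$max" ]; then
            rto=$max
        fi
        i=$(( i + 1 ))
    done
    echo "$total"
}

for n in 3 5 8 10 15; do
    echo "retries=$n -> at least $(( $(rto_timeout_ms "$n") / 1000 )) s"
done
```

Under these assumptions a value of 15 gives roughly 924 seconds, while values between 4 and 10 give timeouts between a few seconds and a few minutes.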
Linux System Settings | ||||
System limits (ulimit) are used to establish boundaries for resource utilization by individual processes and thus protect the system and other processes. A value that is too high or unlimited provides zero protection, but a value that is too low could hinder growth or cause premature errors. | ||||
Linux | Is the number of file descriptors at least 4096?
Scalability Note: It is expected that the number of connected clients, and thus the number of connections, is going to increase over time. This setting allows for greater growth and also provides a greater safety margin should some application have a connection leak. Also note that a large number of open connections can decrease system performance due to the way the OS handles the select() API. Thus, if the number of connected clients increases over time, care should be taken that all SLAs are still met. |
|||
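The file descriptor check above can be scripted along these lines. limit_at_least is an illustrative helper; ulimit -n reports the soft limit of the current shell, so run it as the user that will own the application process.

```shell
#!/bin/sh
# Sketch only: limit_at_least is an illustrative helper name.
# Succeeds when a limit ("unlimited" or a number) meets the required minimum.
limit_at_least() {
    value="$1"
    min="$2"
    if [ "$value" = "unlimited" ]; then
        return 0
    fi
    [ "$value" -ge "$min" ]
}

soft=$(ulimit -n)   # soft limit for open file descriptors
if limit_at_least "$soft" 4096; then
    echo "open files: $soft (ok)"
else
    echo "open files: $soft (below 4096)"
fi
```

Persistent changes are typically made in /etc/security/limits.conf or in the service definition rather than in an interactive shell.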
Linux | Limit maximum file size for EMS to 2/5 of the disk space if the disk space is shared between EMS servers.
Robustness: Contain the damage of a very large backlog. |
|||
Linux | Consider limiting the maximum data segment size for EMS daemons in order to avoid one EMS monopolizing all available memory.
Robustness: Contain the damage of a very large backlog. Note: It should be tested if such a limit operates well with (triggers) the EMS reserved memory mode. |
|||
Linux | Limit the number of child processes to X to contain a rogue application (shell bomb).
Robustness: Contain the damage a rogue application can do. This is just an example of a Linux system setting that is unrelated to TIBCO products. It is recommended to consult with Linux experts for recommended settings. |
|||
Linux Virtual Memory Management | ||||
There are a couple of virtual memory related settings that determine how likely Linux is to swap out memory pages and how Linux reacts to out-of-memory conditions. Neither aspect is important under “normal” operating conditions, but both are very important under memory pressure and thus for the system’s stability under stress.
A server running EAI software, and even more so a server running a messaging server like EMS, should rarely have to resort to swap space, for obvious performance reasons. However, considerations due to malloc/sbrk high-water-mark behavior, the behavior of the different over-commit strategies, and the price of storage lead to the recommendations in this section: even with the tuning of the EMS server towards larger malloc regions[1], the reality is that the EMS daemon is still subject to the sbrk() high-water-mark and is potentially allocating a lot of memory pages that could be swapped out without impacting performance. Of course the EMS server instance must eventually be bounced, but the recommendations in this section aim to give operations a larger window in which to schedule the maintenance.
As these values operate as a bundle, they must be changed together, or any variation must be well understood. |
||||
Linux | Swap-Space: 1.5 to 2x the physical RAM (24-32 GB )
Logical-Partition: One of the first ones but after the EMS disk storage and application checkpoint files. Physical-Partition: Use a different physical partition than the one used for storage files, logging or application checkpoints to avoid competing disk IO.
|
|||
Linux | Committing virtual memory:
$ cat /proc/sys/vm/overcommit_memory Default: 0 Robustness
Note: The recommended setting uses a new heuristic that only commits as much memory as available, where available is defined as swap-space plus a portion of RAM. The portion of RAM is defined in the overcommit_ratio. See also: http://www.mjmwired.net/kernel/Documentation/vm/overcommit-accounting and http://www.centos.org/docs/5/html/5.2/Deployment_Guide/s3-proc-sys-vm.html |
|||
Linux | Committing virtual memory II:
$ cat /proc/sys/vm/overcommit_ratio Default: 50 Robustness Note: This value specifies what percentage of the RAM Linux adds to the swap space in order to calculate the “available” memory. The more the swap space exceeds the physical RAM, the lower the value that might be chosen. See also: http://www.linuxinsight.com/proc_sys_vm_overcommit_ratio.html |
|||
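The “available” memory described above (swap space plus a percentage of RAM) is simple arithmetic, sketched below. commit_limit_kb is an illustrative name, and huge pages, which the kernel subtracts from RAM, are ignored here. The kernel reports its own figure as CommitLimit in /proc/meminfo.

```shell
#!/bin/sh
# Sketch only: commit_limit_kb is an illustrative helper name.
# Commit limit in kB = swap + RAM * overcommit_ratio / 100 (huge pages ignored).
commit_limit_kb() {
    swap_kb="$1"
    ram_kb="$2"
    ratio="$3"
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# Example: 32 GiB swap, 16 GiB RAM, overcommit_ratio 50.
echo "example: $(commit_limit_kb 33554432 16777216 50) kB"

# Compare with the live system where the files are readable (Linux only).
if [ -r /proc/meminfo ] && [ -r /proc/sys/vm/overcommit_ratio ]; then
    ram=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    swap=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
    ratio=$(cat /proc/sys/vm/overcommit_ratio)
    echo "this host: $(commit_limit_kb "${swap:-0}" "${ram:-0}" "$ratio") kB"
    grep '^CommitLimit:' /proc/meminfo
fi
```

In the example, the limit works out to 40 GiB: all of the swap plus half of the RAM.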
Linux | Swappiness
$ cat /proc/sys/vm/swappiness Robustness
Note: The swappiness defines how likely memory pages will be swapped in order to make room for the file buffer cache.
Generally speaking an enterprise server should not need to swap out pages in order to make room for the file buffer cache or other processes which would favor a setting of 0.
On the other hand it is likely that applications have at least some memory pages that almost never get referenced again and swapping them out is a good thing. |
|||
Linux | Exclude essential processes (Application) from being killed by the out-of-memory (OOM) killer.
echo -17 > /proc/<pid>/oom_adj Default: NA Robustness See: http://linux-mm.org/OOM_Killer and http://lwn.net/Articles/317814/
Note: With any configuration other than overcommit_memory=2 and overcommit_ratio=0, the Linux virtual memory management can commit more memory than is available. If that memory must then actually be provided, Linux engages the out-of-memory killer to kill processes based on “badness”. To exclude essential processes from being killed, set their oom_adj to -17. |
|||
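On kernels from 2.6.36 onwards, oom_adj is deprecated in favor of oom_score_adj, where -1000 has the same "never kill" meaning as -17 in oom_adj. A hedged sketch that uses whichever file is present; protect_from_oom and PROC_ROOT are illustrative names, and the function must run as root to take effect on a real process.

```shell
#!/bin/sh
# Sketch only: protect_from_oom and PROC_ROOT are illustrative names.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Exempt one process from the OOM killer, preferring the newer interface.
protect_from_oom() {
    pid="$1"
    if [ -w "$PROC_ROOT/$pid/oom_score_adj" ]; then
        printf '%s\n' -1000 > "$PROC_ROOT/$pid/oom_score_adj"
    elif [ -w "$PROC_ROOT/$pid/oom_adj" ]; then
        printf '%s\n' -17 > "$PROC_ROOT/$pid/oom_adj"
    else
        echo "cannot adjust OOM setting for pid $pid" >&2
        return 1
    fi
}
```

The setting lives with the process, so it has to be reapplied after every restart, for example from the start-up script.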
Linux 32bit | Low memory area – 32bit Linux only
(NOT APPLICABLE)
See: http://linux.derkeiler.com/Mailing-Lists/RedHat/2007-08/msg00062.html |
|||
Linux CPU Tuning (Processor Binding & Priorities) | ||||
This level of tuning is seldom required for any application. The tuning options are mentioned in case there is a need to go the extra mile. | ||||
Linux | IRQ-Binding
Recommendation: Leave default. The default on Linux is IRQ balancing across multiple CPUs. Linux offers two solutions in that realm (kernel and daemon), of which at most one should be enabled. |
|||
Linux | Process Base Priority Recommendation: Leave default
Note: The process base priority is determined by the user running the process instance, so running processes as root (chown to root and set the setuid bit) increases the process’s base priority. A root user can further raise the priority of the application to real-time scheduling, which can further improve performance, particularly in terms of jitter. However, in 2008 we observed that doing so actually decreased the performance of EMS in terms of messages per second. That issue was researched with Novell at the time, but I am not sure of its outcome. |
|||
Linux | Foreground and Background Processes
Recommendation: TBD
Note: Linux assigns foreground processes a better base priority than background processes, but whether that really matters, and if so how to change the start-up scripts, is still to be determined. |
|||
Linux | Processor Set
Recommendation: Don’t bother
Note: Linux allows defining a processor set and limiting a process to only use cores from that processor set. This can be used to increase cache hits and cap the CPU resource for a particular process instance. |
[1] If larger memory regions are allocated, malloc() in the Linux glibc library uses mmap() instead of sbrk() to provide the memory pages to the process.
Memory-mapped regions (mmap()) are better at releasing memory back to the OS, and thus the high-water-mark effect is avoided for these regions.