A DBA guide to SQL Server performance troubleshooting – Part 1 – Problems and performance metrics

 

Monitoring SQL Server performance is a complex task, as performance depends on many parameters, both hardware and software. It’s recommended to monitor these parameters proactively in order to prevent any potential performance degradation. However, this is not always the case. If the performance degradation happens, the first step towards the resolution is to determine the problem, find out the cause, and resolve the issue. Severe performance issues require quick actions and resolutions. Less severe problems can have a delayed fix. In either case, DBAs should take actions to prevent these situations in future

 

It doesn’t mean that every SQL Server slowdown is a performance problem. Some specific actions require many resources and put much stress on the server. If these actions are occasional, they should not be considered as a serious threat and can be ignored. Running your SQL Server under stress all the time should be thoroughly analyzed. It is good to know how the system behaves when there are no performance problems. This can be obtained by collecting baseline information about resource using

To find out what’s happening with your SQL Server, start with monitoring databases, watch the performance metrics over time to create baselines and trend lines for normal operation, isolate the processes that use a lot of resources. Then you will be able to debug and repair the issues

Diagnose problems

To start troubleshooting, you have to define the symptoms first. The most common SQL Server performance symptoms are CPU, memory, network, and I/O bottlenecks, and slow running queries

CPU bottlenecks are caused by insufficient hardware resources. Troubleshooting starts with identifying the biggest CPU resource users. Occasional peaks in processor usage can be ignored, but if the processor is constantly under pressure, investigation is needed. Adding additional processors or using a more powerful one might not fix the problem, as badly designed processes can always use all CPU time. Query tuning, improving execution plans, and system reconfiguration can help. To avoid bottlenecks, it’s recommended to have a dedicated server that will run only SQL Server, and to remove all other software to another machine

Memory bottlenecks can result in slow application responsiveness, overall system slowdown, or even application crashing. It’s recommended to identify when the system runs with insufficient memory, what applications use most of memory resources, whether there are bottlenecks for other system resources. Reviewing and tuning queries, memory reconfiguration, and adding more physical memory can help

Network bottlenecks might not be instantly recognized, as they can at a first glance be considered as SQL Server performance issues caused by other resources. For example, a delay of data sent over a network can look like SQL Server slow response

I/O bottlenecks are caused by excessive reading and writing of database pages from and onto disk. A bottleneck is manifested through long response times, application slowdowns and tasks time-outs. If other applications use disk resources excessively, SQL Server might not get enough disk resources for its normal operation and would have to wait to be able to read and write to disk

Slow running queries can be a result of missing indexes, poor execution plans, bad application and schema design, etc.

Common causes of performance issues

The causes for performance problems can be various, but the most common are a poorly designed database, incorrectly configured system, insufficient disk space or other system resources, excessive query compilation and recompilation, bad execution plans due to missing or outdated statistics, and queries or stored procedures that have long execution times due to improper design

Memory bottlenecks are caused by limitations in available memory and memory pressure caused by SQL Server, system, or other application activity. Poor indexing requires table scans which in case of large tables means that a large number of rows is read from disk and handled in memory

Network bottlenecks are caused by overload on a server or network, so the data cannot flow as expected

I/O issues can be caused by slow hardware used, bad storage solution design, and configuration. Besides hardware components, such as disk types, disk array type, and RAID configuration that affect I/O performance, unnecessary requests made by a database also affect I/O traffic. Frequent index scans, inefficient queries, and out of date statistics can also cause I/O workload and bottlenecks

What metrics to monitor

To start troubleshooting the most common CPU performance issues, monitor % Processor Time. This counter is available in Performance Monitor. If its value is constantly higher than 80%, the processor is under pressure

The counters that indicate most common causes for processor pressure are Batch Requests/sec, SQL Compilations/sec, and SQL Recompilations/sec. These counters are available in Performance Monitor and in the sys.dm_os_performance_counters view

 

 

Dialog showing values for the most commonly used processor pressure counters

Just note that the counter type for all three counters is 272696576 and that the values shown are cumulative since the last SQL Server start, so they have to be calculated. One of the methods is to take two samples with a 10-second delay

 

 

The Batch Requests/sec value depends on hardware used, but it should be under 1000. The recommended value for SQL Compilations/sec is less than 10% of Batch Requests/sec and for SQL Re-Compilations/sec is less than 10% of SQL Compilations/sec

For memory problems, monitor the Memory Available KB performance counter. The normal values should be over 200 MB. If the value of Memory Available KB counter is lower than 100 MB for long time, it’s a clear indication of insufficient memory on the server. This counter is available in Performance Monitor, and two more useful counters – Total Server Memory (KB) and Target Server Memory (KB) are available via the sys.dm_os_performance_counters view

Another counter to monitor is Pages/sec, it is available in Performance Monitor. It shows the rate at which the pages are written from disk to RAM and read from RAM to disk. The values higher than 50 show intensive memory activity and possible overhead and memory pressure that can lead to SQL Server performance degradation

Checkpoint pages/sec and Lazy writes/sec indicate whether dirty pages are flushed to disk too often. Dirty pages are automatically flushed to disk at a checkpoint. If the available free space in the buffer cache between two checkpoints is low, a lazy write will occur to flush the pages from buffer to disk and free up memory. The Lazy Writes/sec value should be below 20. Both counters are available in Performance Monitor and the sys.dm_os_performance_counters view, but as the counter type is 272696576, the values returned by the view should be calculated

If the Lazy Writes/sec value is constantly above the threshold, check the Page Life Expectancy value. Values below 300 seconds indicate memory pressure. The counter is available in Performance Monitor and the sys.dm_os_performance_counters view, no additional calculation is needed

Buffer Cache Hit Ratio shows the ratio of the data pages found and read from the SQL Server buffer cache and all data page requests. If a page doesn’t exist in the buffer cache, it has to be read disk, which downgrades performance. The recommended value is over 90. The counter is available in Performance Monitor and the sys.dm_os_performance_counters view

As the counter type is 537003264, the value returned by the view has to be calculated to get the current value. To do that, it’s necessary to use the Buffer Cache Hit Ratio Base value also

Buffer Cache Hit Ratio % = 100 * Buffer Cache Hit Ratio / Buffer Cache Hit Ratio Base
                                        = 100 * 1,797 / 1,975
                                        = 90.98%

Troubleshooting network problems should start with finding queries, functions, and stored procedures that have slow response time. If they are executed quickly, but with a large delay between two calls, it can be an indication of a network issue. SQL Server Profiler can be used to determine which queries, functions and stored procedures were executed

For I/O problems, monitor disk-related counters: Average Disk Queue Length, Average Disk Sec/Read, Average Disk Sec/Write, %Disk Time, Average Disk Reads/Sec, and Average Disk Writes/Sec. All counters are available in Performance Monitor

Average Disk Queue Length shows the average number of I/O operations that are waiting to be written to or read from disk and the number of currently processed reads and writes. The recommended value is below 2 per individual disk, and higher values indicate I/O bottlenecks

Average Disk Sec/Read shows the average time in seconds needed to read data from disk. The recommended values are given for categories, where under 8ms is excellent performance, and higher than 20ms is a serious I/O issue

Average Disk Sec/Write shows the average time in seconds needed to write data to disk. The performance is excellent if the value is below 1ms and bad if the counter value is higher than 4ms

Average Disk Reads/Sec and Average Disk Writes/Sec show the rate of read and write operations on disk, respectively. Low values indicate slow disk I/O processing, and checking processor usage and disk-expensive queries is recommended. The normal values depend on disk specification and server configuration. These counters don’t have a specific threshold, so it’s recommended to monitor these metrics for a while and to determine trends and set a baseline

SQL Server performance can be affected by many factors. When troubleshooting problems, it’s necessary to know where to start, to know the normal values for the performance counters and to select a tool that will provide sufficient information for analyzing and troubleshooting the issues

Resources
Troubleshooting Performance Problems in SQL Server 2008 whitepaper
Five Things That Fix Bad SQL Server Performance
How to troubleshoot SQL Server performance issues
SQL Server Performance Troubleshooting for Dummies – part 1
SQL Server Performance Tuning

Fonte: https://www.sqlshack.com
Author: Milena Petrovic

 



To Top