Cache Hit Ratio alert – SQL is running slow.
Impact : Severe
Slow SQL responses will degrade the user experience, resulting in poor efficiency of your organization’s operations.
Expected behavior :
There is no standard metric for buffer cache hit ratio. Our recommended value for it is over 90. When better performance is needed, the minimal acceptable value is 95. A lower value indicates a memory problem. This alert indicates that your threshold levels have been crossed, and needs to be investigated.
Possible causes of persistent low ratio
Continually high usage rate may indicate the need for a CPU upgrade or the addition of multiple processors. Alternatively, a high CPU usage rate may indicate a poorly tuned application or SQL code. Optimizing the application can lower CPU utilization.
Use Performance Monitor to check the load on your CPU. Look for the Processor:% Processor Time counter: if it regularly exceeds 80% of the processor time per CPU then most probably you’re facing a CPU related bottleneck.
The more memory your server has, the more cached disk reads and writes it can perform. Lack of system memory can cause high non-cached disk reads and writes. Adding memory to your server can help lower physical disk access.
Compare ratio of physical RAM to database size – aim for RAM to be equal or more than 20% of the larges database.
In SQL Server, data is stored in 8 KB pages. When data is required for input/output the page is first read from the disk and stored into memory. The area allocated is called the Buffer Pool. Once a page is loaded into the pool, all subsequent access is very fast and efficient, whereas disk I/O is relatively slow.
Allocate an optimal proportion of physical RAM to BP, but leave sufficient RAM for other Operating System operations. Set a value in the Maximum server memory option of the ‘Server memory options’ page, that leaves enough memory for the OS itself. We recommend around 4 GB less than the amount of RAM installed.
Possible causes of transient low ratio
Missing indexes means SQL Server is suggesting that your query could run faster with an index. In cases of corruption 99% of the time, corruptions are disk or disk driver/firmware related.
In case of missing index, refer back to you system design team. In case of corrupted indexes, refer to your storage management team, or our support experts, for help. One possible remedy in cases where the corruption is repeating is to clone the database onto a new hardware configuration.
These causes of low PLE should be transient, not persisting over long periods. May be due to :
- getting large memory grants
- displacing large numbers of pages in memory with new ones
- modifying many pages, forcing them to be flushed to disk
- many concurrent (or some very large) queries
In cases where they are re-occurring or showing tendency to increase in magnitude (lower PLE), investigate the other possible causes below.
CHECKDB is very resource intensive – probably one of the most resource intensive things SQL Server performs. Introducing this process as additional IO load on top of normal SQL Server load means that disk heads will moving constantly.
We suggest running any essential CHECKDB processes to a time of minimal SQL Server load, or alternatively on backup copies of the databases offline.
An index rebuild operation will always build a new index, which means extra space is required to build the new index before dropping the old one; a lot of CPU and I/O resources are required, which can overload the system.
If it occurs frequently, it can be corrected by changes in the indexing procedure, for example by switching to a strategy in which fragmentation is analyzed every night and only the fragmented indexes are processed. Our SQL Support Center experts are available to analyze and assist.
This metric gives the ratio of the data pages found and read from the SQL Server buffer cache and all data page requests.
Pages that are not found in the buffer cache must be read from disk, which is significantly slower and affects performance
Ideally, SQL Server would read all pages from the buffer cache and there will be no need to read any from disk. In this case, the ratio value would be 100.There is a close relationship between this and page life expectancy – see here. Low PLE will always correlate with lower cache hit ratios.