SQL: Buffer Cache Hit Ratio

Symptoms : No longer caching data pages in memory, reads from disk are higher, possible memory bottleneck – SQL is running slow.

Impact : High

Slow SQL responses will degrade the user experience, resulting in poor efficiency of your organization’s operations.

Expected behavior :

There is no standard metric for buffer cache hit ratio. Our recommended value for it is over 90. When better performance is needed, the minimal acceptable value is 95. A lower value indicates a memory problem. This alert indicates that your threshold levels have been crossed, and needs to be investigated.

Possible causes of persistent low ratio

Insufficient CPU power   Priority : High
Continually high usage rate may indicate the need for a CPU upgrade or the addition of multiple processors. Alternatively, a high CPU usage rate may indicate a poorly tuned application or SQL code. Optimizing the application can lower CPU utilization.
Recommended action :
Use Performance Monitor to check the load on your CPU. Look for the Processor:% Processor Time counter: if it regularly exceeds 80% of the processor time per CPU then most probably you’re facing a CPU related bottleneck.

Insufficient RAM  Priority : High
The more memory your server has, the more cached disk reads and writes it can perform. Lack of system memory can cause high non-cached disk reads and writes. Adding memory to your server can help lower physical disk access.
Recommended action :
Compare ratio of physical RAM to database size – aim for RAM to be equal or more than 20% of the largest database.

Incorrect Buffer Pool allocation.  Priority : High
In SQL Server, data is stored in 8 KB pages. When data is required for input/output the page is first read from the disk and stored into memory. The area allocated is called the Buffer Pool. Once a page is loaded into the pool, all subsequent access is very fast and efficient, whereas disk I/O is relatively slow.
Recommended action :
Allocate an optimal proportion of physical RAM to Buffer Pool, but leave sufficient RAM for other Operating System operations. Set a value in the ‘maximum server memory’ option of the ‘server memory options’ page, that leaves enough memory for the OS itself. We recommend around 4 GB less than the amount of RAM installed.

Possible causes of transient low ratio

Missing or corrupt indexes  Priority : High
Missing indexes means SQL Server is suggesting that your query could run faster with an index. In cases of corruption 99% of the time, corruptions are disk or disk driver/firmware related.
Recommended action :
In case of missing index, refer back to you system design team. In case of corrupted indexes, refer to your storage management team, or our support experts, for help. One possible remedy in cases where the corruption is repeating is to clone the database onto a new hardware configuration.

Queries saturating the Buffer Pool  Priority : Low
These causes of low PLE should be transient, not persisting over long periods. May be due to :

  • getting large memory grants
  • displacing large numbers of pages in memory with new ones
  • modifying many pages, forcing them to be flushed to disk
  • many concurrent (or some very large) queries

Recommended action :
In cases where they are re-occurring or showing tendency to increase in magnitude (lower PLE), investigate the other possible causes below.

DBCC CHECKDB running  Priority : Low
CHECKDB is very resource intensive – probably one of the most resource intensive things SQL Server performs. Introducing this process as additional I/O load on top of normal SQL Server load means that disk heads will be moving constantly.
Recommended action :
We suggest running any essential CHECKDB processes to a time of minimal SQL Server load, or alternatively on backup copies of the databases offline.

Index Rebuilds running  Priority : Low
An index rebuild operation will always build a new index, which means extra space is required to build the new index before dropping the old one; a lot of CPU and I/O resources are required, which can overload the system.
Recommended action :
If it occurs frequently, it can be corrected by changes in the indexing procedure, for example by switching to a strategy in which fragmentation is analyzed every night and only the fragmented indexes are processed. Our SQL Support Center experts are available to analyze and assist.

Background

This metric gives the ratio of the data pages found and read from the SQL Server buffer cache and all data page requests. Pages that are not found in the buffer cache must be read from disk, which is significantly slower and affects performance. Ideally, SQL Server would read all pages from the buffer cache and there will be no need to read any from disk. In this case, the ratio value would be 100.There is a close relationship between this and page life expectancy – see here. Low PLE will always correlate with lower cache hit ratios.