Change Tracking causes high CPU usage

AimBetter Diagnostics in Action

April, 2018

Introduction

A customer recently encountered a critical issue when AimBetter Alerts monitor reported a sudden and inexplicable spike in CPU load.  This alert notification both on the AimBetter dashboard and via email to the customer’s database administrators triggered an investigation that quickly lead to identification of the cause, enabling steps to be taken in real-time that avoided the potential impact that overloading CPU can bring.

Background information

SQL Server is the engine that drives an enterprise’s data management, organization and usage.  As such, it is one of the most important components of their control, the equivalent of a train’s locomotive or a passenger plane’s jet engines.  Raw power is important, but without proper control, no meaningful progress is possible.  However, there are some significant differences between database supervision and the other examples we cited.  For a jet engine, there is a clearly defined set of load expectations, ranging from maximum thrust on takeoff, through steady power while cruising, which facilitates flying on auto-pilot for virtually all the journey.  Similarly, a locomotive is running along a track where the gradients and turns have been carefully computed to comply with the known capacity of any locomotive.  In both cases, capacity, stress and load are constant or predictable. In the case of database management, the only given constant is capacity.  Load, and the rate of change, are completely unpredictable and best practice is to have strong monitoring capabilities so that any pattern of behavior that appears to be a sign of approaching unexpected stress or load can be identified early enough for proper remedial action to be taken.

Summary of this event

  1. AimBetter Monitor Dashboard raised an alert that the customer’s CPU was undergoing rapid stress and had passed all threshold alert levels with a few minutes.
  2. With use of the Operational metrics analyzer, the starting point of the events that triggered the CPU stress was identified, allowing immediate investigation of events from that time.
  3. Moving over to Query display, since the point when the problem first manifested is now known, the specific query was identified and could be immediately examined and analyzed.
  4. QAnalyzer showed a large number of instances of the same query running simultaneously. (We could conclude that this was probably the result of resumption of activity, immediately following conclusion of a pause in work for the national Holocaust Remembrance Day, a few minutes before.)
  5. By opening one instance of these multiple queries, the nature of the call was evident, showing that it was using a specific SQL Server feature – Change Tracking.
  6. We could recognize that the way Change Tracking works in an instance such as this, with multiple changes happening simultaneously to the same table, was the root cause of the problem.
  7. AimBetter’s Expert Support team was able to advise of an “on-the-fly” solution that could restore normal level of CPU load without restarting the whole server, which was the likely course that the customer would have taken without the deep insight that we are able to provide.
  8. By running two procedures, to temporarily suspend Change Tracking and then re-instate it, normal load levels were restored and the whole environment was stabilized.
  9. The whole process, from start to completion, took less than 15 minutes!!!

Important to note is that all of this happens in real time on a single console, with AimBetter support and customer’s administration able to see and respond to exactly the same console information, without need for over-the-shoulder or VPN access into the customer’s site.

The symptoms

AimBetter’s dashboard monitor alerted us that CPU load had rapidly exceeded the customer’s preset threshold levels.  This was matched by an email drawing attention of the customer’s database administrators to this situation.

The investigation

In order to have a clearer picture of the operational metrics relating to the alert, a one-click switch can bring up all the relevant environmental readings covering the whole operating system in real-time.

There is clearly some sudden change that has been triggered shortly before 8:00 PM on the day in question. One additional factor to be taken into account in the investigation was that the spike coincided exactly with the termination of the national Holocaust Memorial Day, which finished at 7:30PM on April 12, indicating some possible switch between different levels of activity.

From our knowledge of performance patterns in SQL, the most likely cause of such a rapid and substantial increase in load is a query, and so we went directly on to the next screen, which shows an analysis of currently running queries. (The readings reported on screen can be sorted into ascending/descending order, which immediately brought the most active queries ranked by CPU load.)

It is immediately obvious from this picture that multiple instances of one query (CHANGETABLE] have been running with high CPU demand throughout this small sample period (1 minute).  Further analysis revealed that there were 40 simultaneous calls of exactly the same query.  A picture of this query is seen in the next snapshot.

The syntax of the call is as follows:

SELECT tbl.* FROM CHANGETABLE(CHANGES ApprovedSites, 8219) AS ct

INNER JOIN amtApprovedSites AS tbl ON tbl.CID = ct.CID AND tbl.SID = ct.SID

The significance of having multiple simultaneous queries performing exactly the same functions, all fired within a short time window, is further exacerbated by the way that this customer has implemented the SQL Server technique of using the Change Tracking feature (see further explanation of Change Tracking in Technical Note 1 below).

It is important when dealing with Change Tracking mechanism to understand the sequence of any following calls because of the specific nature of the Change Tracking technique.  To provide this information, we displayed one further screen which opens up the actual call.

In such a case, because they are happening simultaneously each call is performing a full lookup on the table before performing any following Update/Insert/Delete instructions. Every SELECT call with Change Tracking enabled returns a reading of the last updated record in the table.  However, the nature of Change Tracking limits the information being returned to simply providing the result of the last change, rather than the details of the change itself (in other words, there is no information about the state of the data before and after the change). In this case, since the same table is being updated with an INSERT action in 40 separate instances of the query, the “last changed” table entry in the Change Tracking table is always out of phase, forcing the queries to read substantially higher numbers of records on each call.

The cure

In view of the critical nature of this situation, an “on-the-fly” solution was required in order to bring the CPU load down to manageable levels before trying to fully understand the deeper causes.  In order to achieve this without completely stopping and restarting the whole server, we devised an executable fix that allowed the running queries to by-pass what in effect is a bottleneck cause by Change Table functionality itself.

Step 1: Run a script to temporarily disable Change Tracking.

    • USE [Server]
    • GO
    • BEGIN TRAN
    • SELECT TOP 1 * FROM [ApprovedSites] WITH (XLOCK, TABLOCK)ALTER TABLE [dbo]. [ApprovedSites] DISABLE CHANGE_TRACKING
  • COMMIT

 

As soon as the CPU levels reset to normal load.

 

Step 2: Run a script to re-instate Change Tracking, with the following form:

    • USE [Server]
    • GO
    • BEGIN TRAN
    • SELECT TOP 1 * FROM [ApprovedSites] WITH (XLOCK, TABLOCK)ALTER TABLE [dbo]. [ApprovedSites] ENABLE CHANGE_TRACKING
    COMMIT

This simple solution brought immediate to relief to what was potentially a serious crisis for the customer.  The whole process was made simple by the speed and flexibility of AimBetter’s monitor to, firstly, raise the alert as soon as the problem was manifesting, then to enable both the customer and our Expert Support team members to track directly into the elements that were the root cause of the behavior, and lastly for the customer to be able to apply a simple remedy whose effects were immediately visible.

Why AimBetter does it better.

AimBetter brings a host of special features and benefits that are especially designed to make the management of SQL databases more efficient, and to simplify the work of database administrators.  The particular features that worked in favor of our customers in this case can be summarized as follows:

  • Enables wide-scope correlation of operational data across the whole environment
  • Monitors core SQL database performance
  • Reports on SQL Exceptions
  • Alerts of any unexpected behavior
  • Enables comparison of current with historical metrics
  • Displays in a single comprehensive dashboard
  • Displays all operational metrics on one screen
  • Identifies all current queries elements
  • Allows query selection on specific criteria
  • Displays all live Queries statistics
  • Historical display of queries
  • Retrieves full query syntax
  • Retrieves query plan

Putting all these features into a single package and then having the results at the fingertips of both our experts and the customers’ own database managers was the key element that allowed for this happy ending.

Technical note 1 – About Change Tracking

Change Tracking is a light-weight feature which provides a synchronization mechanism between two applications. In other words, it tracks changes (INSERT, UPDATE and DELETE) that occur on a table, so that an application (like a caching application) can refresh itself with just the changed dataset.

Change Tracking is a feature introduced in SQL Server 2008, which allows an application to refresh itself with the latest/changed data from other sources. In other words, an application can retrieve only data which was changed since the last pull. When you enable change tracking for a table, an internal table is created by SQL Server to store the change information. This table is built using a unique primary key, which means that only each subsequent change using the same key will over-write the prior change, meaning that there is no history of changes for that key.

Summary

SQL Server Change Tracking is a very simple tracking tool which has limited use and there is no direct way of querying the internal tracking table that is created when Change Tracking is enabled. You can define the retention period of Change Tracking data at the database level. Only columns that are part of the primary key in a table are tracked for changes after enabling Change Tracking.

The function CHANGE_TRACKING_CURRENT_VERSION () only returns the current version of the row. As well, there is no way of changing the behavior of this feature. Its behavior is ‘take it or leave it’.

NOTE: Change Tracking will maintain only the net change information, which means if a record is changed more than once the tracking table will have information about the result of the last change only.