Make the best possible decisions even when the unexpected happens
by Henry Kaye | July 9, 2018
Everyone is familiar with the saying “prevention is better than cure”. Fortunately, the world of data management now has the tools to observe and monitor even this highly complex environment, anticipate negative outcomes and take remedial steps before disaster strikes. One customer faced a situation that could have resulted in catastrophic failure of the whole data infrastructure. Because our solution immediately raised an alarm and alerted both us and the customer to the growing threat, rapid steps were taken and normal operations were restored within a short time.
It is an inescapable feature of the data backup process that all ongoing updates to the database from concurrent transactions or changes have to be stored in a separate transaction log while the source data is being copied to the backup media. Once the backup has completed, the transactions are read sequentially and the database is updated accordingly. The longer the backup takes, the more the transaction log grows, and there is a danger that exceeding the capacity of the transaction log storage devices could result in irrecoverable loss of data.
For this reason AimBetter monitors growth of the transaction log file and immediately raises an alert if the file size is increasing beyond the threshold limits that have been set.
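To make the idea of a threshold alert concrete, here is a minimal sketch in Python. It is not AimBetter's implementation: the `LogSample` type, the `first_breach` function, the 20 GB threshold and the sample readings are all hypothetical, chosen only to illustrate how a monitor might flag the first reading that crosses a configured limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogSample:
    minute: int      # minutes since sampling started (hypothetical unit)
    size_gb: float   # transaction log size at that moment


def first_breach(samples: List[LogSample], threshold_gb: float) -> Optional[LogSample]:
    """Return the first sample whose size exceeds the threshold, or None."""
    for sample in samples:
        if sample.size_gb > threshold_gb:
            return sample
    return None


# Illustrative readings only: a log that keeps growing while a backup runs.
samples = [
    LogSample(0, 4.0),
    LogSample(15, 16.5),
    LogSample(30, 29.0),
    LogSample(45, 41.5),
    LogSample(60, 54.0),
]

breach = first_breach(samples, threshold_gb=20.0)
if breach is not None:
    print(f"ALERT: log reached {breach.size_gb} GB at minute {breach.minute}")
```

In a real deployment the samples would come from the monitored server rather than a hard-coded list, and the threshold would be whatever limit has been set for that database.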
The AimBetter way
AimBetter monitors hundreds of processes both inside the SQL engine and in the operating environment in real-time and will detect any condition where a metric returns a value that crosses a threshold. In the case we are detailing here, within seconds of the unforeseen growth in the transaction log’s size, the dashboard showed an alert for our Expert Support team’s attention, and an email alert was issued to the database administrators. The dashboard screen looks like this:
With a one-click move onto the Performance page, you can see in the following image that over a period of one hour the log size has increased significantly (more than 50 gigabytes). A further concern is that the log is being cached inside memory – adding to the possibility of catastrophic data loss in the event of a system halt.
After checking, it was found that the backup of the database took nearly 90 minutes, as pictured below, even though the actual size of the database being backed up is no larger than normal. In itself this points to a problem inside the environment that requires action by technical support.
During a backup, all changes, inserts and deletes of data are recorded in the log file, pending completion of the backup. Therefore, for as long as the backup was running, the file size kept increasing significantly. Note that other backups of the same database had taken relatively little time compared to this event, which took nearly 90 minutes. Note also that the log file is larger than the data file, indicating that there are many changes in the database.
One of the most significant advantages of the AimBetter approach is that historical data for up to 30 days is available, so that current behavior can be compared with past behavior, making detection of exceptions much easier and more reliable. If you look back a day, you can see that log usage was 2 gigabytes, compared to 6 gigabytes during this event, which may be part of the problem. By reviewing past behavior, we could determine that there had been a similar problem with updating in the past, when this customer was loading more data than usual.
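The comparison against history can be pictured with a small Python sketch. This is an assumption about how such a check could work, not a description of AimBetter's analysis: the function name `deviation_from_baseline` and the choice of the median as the baseline are hypothetical, and apart from the 2 GB and 6 GB figures mentioned above the historical values are made up for illustration.

```python
from statistics import median
from typing import Sequence


def deviation_from_baseline(current: float, history: Sequence[float]) -> float:
    """Return current / median(history).

    A result well above 1.0 suggests the current reading is an exception.
    """
    baseline = median(history)
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return current / baseline


# Log usage in GB on earlier days (illustrative), with 2 GB as the typical value.
history_gb = [1.8, 2.0, 2.2]
current_gb = 6.0

ratio = deviation_from_baseline(current_gb, history_gb)
print(f"log usage is {ratio:.1f}x the historical baseline")
```

Using the median rather than the mean keeps a single earlier spike, such as the past loading problem, from distorting the baseline.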
In the final step, examining the actual backup call showed that the backup type in this case was snapshot, which by design is meant to run extremely quickly. In all other instances it had completed in two minutes or less, compared to almost 90 minutes in this case. Our advice was for the customer to involve their own Microsoft support team to determine why the snapshot took such a long time to complete, and in the interim to terminate the backup and restart it using a different device. After the restart, the backup ran as normal, and the log file quickly returned to normal size.
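The reasoning in this step, that a backup which normally finishes in about two minutes should not take 90, can be expressed as a simple outlier check. The sketch below is illustrative only: `is_slow_backup`, the factor of 5 and the list of earlier durations are hypothetical and do not come from AimBetter or from the customer's actual data.

```python
from statistics import median
from typing import Sequence


def is_slow_backup(duration_min: float,
                   previous_min: Sequence[float],
                   factor: float = 5.0) -> bool:
    """Flag a backup that ran more than `factor` times the typical duration."""
    if not previous_min:
        return False  # no history yet, so nothing to compare against
    return duration_min > factor * median(previous_min)


# Earlier snapshot backups completed in two minutes or less (illustrative values).
previous = [2.0, 1.5, 2.0, 1.8]

print(is_slow_backup(90.0, previous))  # the backup described in this case
print(is_slow_backup(2.0, previous))   # a normal run
```

A check like this only says that something is abnormal. Finding out why the snapshot was slow still needs the kind of investigation described above.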
In just a few steps, AimBetter was able to alert this customer to the occurrence of a show-stopping problem and helped produce a positive outcome. AimBetter works with the most comprehensive range of database metrics and does extensive analysis, but its simple visual output shows the complete picture and enables logical steps to be followed for detection and solution of any problems. Even without extensive training in database administration, IT personnel can get a handle on all performance issues from a single monitor screen, and concentrate on issues needing attention.
How AimBetter delivers
AimBetter's central dashboard gives you control over every server you manage from a single screen. In our central agent we collect data from over 400 primary performance metrics and analyze the results to detect any abnormal patterns. There is only one place that needs to be observed, and only exceptional behavior, in the form of alerts, needs attention. No longer do managers have to spend valuable time looking through masses of raw information in order to detect things that need action.
AimBetter allows you to concentrate on the important issues without making you search through multiple screens and reports – alerting you to real issues in real-time.
The particular features that worked here can be summarized as follows:
- Immediately raised alerts to the growth in the transaction log file
- Pointed to the fact that backup was running overtime
- Identified the exact problem for action by third-party specialists
In general, AimBetter works for you by:
- Monitoring core SQL database performance and CPU, memory, storage and network behavior
- Reporting immediately on exceptional readings
- Enabling comparison of current with historical metrics
- Displaying it all in a single comprehensive dashboard
AimBetter is the best tool in the market for helping you handle malfunctioning code, and it doesn’t require you to use complex procedures in order to troubleshoot. It also monitors all of your database operations for you automatically, which saves you a lot of time and effort and makes it the best solution for your database management.