Improve performance in Microsoft Dynamics AX

AimBetter Diagnostics in Action

May 2018

Introduction

One of our customers had grown used to persistently poor performance whenever users worked with one particular operation inside their Microsoft Dynamics AX application. In these instances, around 40 users need to draw reports from the database simultaneously; the pattern has been for each one to experience wait times of up to 40 minutes, and the trend has been for these times to increase.

Background information

Microsoft Dynamics AX is an ERP package used mainly by small to medium-sized businesses. Similar behavior has been reported on the product’s own user forums, along with how those cases were resolved (or, in some cases, could not be resolved). The common thread appears to be the widespread use of the cursor technique (see Technical note 1 at the end of this study), which is generally meant to improve performance but which in turn makes debugging performance issues such as this one much harder.

Summary of this event

  1. Many users at our customer complained of extended wait times in one specific action
  2. Repeated attempts using standard Microsoft analysis tools had yielded no meaningful results, because of the limitations imposed by the cursor technique
  3. Running the Query Analyzer (QAnalyzer) feature in the AimBetter monitoring application identified the specific query that was being run by the cursor
  4. QAnalyzer drew attention to the likely cause of the performance issue: a missing index
  5. The specific plan being executed at the relevant time was examined, showing which index needed to be built
  6. Adding this index brought execution down to less than 1% of previous times
  7. The whole process, from start to completion, took less than an hour!

It is important to note that all of this happened in real time on a single console, with AimBetter support and the customer’s administrators able to see and respond to exactly the same console information, without any need for over-the-shoulder or VPN access to the customer’s site.

The Problem

Poor response times were reported by more than 40 users who were performing the same action. This could indicate a general system problem (network, storage, or CPU are the main contributors in such situations), but two things made this unlikely. First, this was not a one-off occurrence: the problem had been persisting and growing, which argued against an overall environmental cause. Second, when the DBA ran standard performance monitoring tools (Windows PerfMon and others), there were no abnormal readings.

In this study, we present two different stories. The first looks at how a database administrator would have to try to solve this problem using these Microsoft diagnostic tools and methods. In fact, this had been the customer’s approach, and no proper solution had been found, which is why they had grown used to regularly poor performance. The second part tells how, taking the AimBetter approach, it was possible to find and fix this problem in a short time, by combining our advanced tool’s capabilities with our Expert Team’s knowledge of how Dynamics AX works.

The ‘old’ way

One indication of the level of difficulty that administrators and programmers face in managing Dynamics is that Microsoft itself recommends the following specialized tools for monitoring the performance of Microsoft Dynamics AX:

  • Microsoft Dynamics AX Trace Parser
  • Performance Analyzer for Microsoft Dynamics AX (DynamicsPerf)
  • System Center Operations Manager Monitoring Pack for Microsoft Dynamics AX
  • Intelligent Data Management Framework for Microsoft Dynamics AX

In addition, Performance Monitor and the Microsoft Visual Studio Profiling Tools are necessary for proper oversight. Clearly, working with Dynamics AX requires a high level of DBA expertise, either in-house or supplied by external resources.

The next available step in trying to understand what SQL Server was doing was to examine currently running queries via the SQL Server Profiler screen depicted here:

From this, it was understood that the SQL Server cursor feature was being used, running the same query over and over. The main problem in understanding this situation is that Profiler gives the DBA only a very limited amount of information. All of the essential details are hidden, because the execution of the cursor is reported only through its parameters, not the actual query and relevant data. Therefore the user id, the computer from which the query is running, the query itself, and the data it works on are all concealed inside the cursor.

The only real method for analyzing cursor functions is to initiate an instance from a known source (computer, user, query) and then draw comparisons, because in that case the parameters become known. The problem with this approach is that it creates a special case: external factors, such as the load on SQL Server from other users and functions, are now different from the real case, so meaningful conclusions are hard to draw. This explains why, in so many cases, resolution of cursor-induced problems is put into the “too-hard” basket and companies simply get used to the poor performance of the specific function.

The AimBetter way

Starting from the same point, where it can be assumed that the problem comes from a query and not from other performance issues, the AimBetter monitor can immediately take a picture of running queries in real time and analyze the whole SQL Server’s performance over specific time slices.

Immediately, it can be seen that one query is consuming a large portion of CPU cycles and has executed more than two thousand times within the 5-minute timeframe that QAnalyzer was examining. Further, the AimBetter monitor indicates that the query probably has a missing (or damaged) index, which immediately suggests that the execution plan of this query should be analyzed.

Clearly, the plan identifies that an index (in this specific case, a non-clustered index) is missing, resulting in the query consuming the great bulk of the SQL process. The component that needs the missing index represents nearly 100% of the total load, and correcting it could improve efficiency by several orders of magnitude.
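The effect described above can be sketched in a portable way. The following is an illustrative example only, using SQLite in place of SQL Server, and the table, column, and index names are made up (they are not the actual Dynamics AX schema); it shows how inspecting an execution plan reveals a full-table scan that an index turns into a seek.

```python
import sqlite3

# Build a small hypothetical table with no index on the filtered column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SalesLine (RecId INTEGER PRIMARY KEY, ItemId TEXT, Qty REAL)"
)
conn.executemany(
    "INSERT INTO SalesLine (ItemId, Qty) VALUES (?, ?)",
    [("ITEM%04d" % (i % 1000), 1.0) for i in range(10_000)],
)

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail);
    # the detail column describes each plan step.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT Qty FROM SalesLine WHERE ItemId = 'ITEM0042'"
before = plan(query)   # plan shows a full table SCAN: no usable index

# Add the "missing index" on the filtered column, as in the case study.
conn.execute("CREATE INDEX idx_salesline_itemid ON SalesLine (ItemId)")
after = plan(query)    # plan now shows a SEARCH USING INDEX (a seek)

print(before)
print(after)
```

The same reasoning applies when reading a SQL Server plan: the point is that the plan, not the query text alone, shows whether each predicate is served by an index.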

The AimBetter Solution

Good database administration does not assume that changing an index is always the proper solution, especially if the problem has manifested before. In this case, the actual cause of the problem had to be investigated. Rebuilding indexes does not guarantee that you have the correct ones: depending on the workload, you may need to add some, or you may have over-indexed some tables so that writes take too long. From our experience working with other Dynamics customers, we had learned that AX queries can use a lot of cursors, and sometimes a significant improvement can be achieved by better optimizing the query code.
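The over-indexing risk mentioned above can be demonstrated with a small, hedged sketch (again using SQLite for portability; the schema and index names are purely illustrative): every index speeds up some reads, but each one must also be maintained on every write, so redundant indexes make inserts measurably slower.

```python
import sqlite3
import time

def insert_seconds(extra_indexes, rows=20_000):
    # Time how long it takes to insert `rows` rows when the table
    # carries `extra_indexes` redundant composite indexes.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
    for i in range(extra_indexes):
        # Each redundant index must be updated on every INSERT below.
        conn.execute(f"CREATE INDEX idx_{i} ON t (a, b, c, d)")
    data = [(i, i % 7, i % 11, i % 13) for i in range(rows)]
    start = time.perf_counter()
    with conn:
        conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", data)
    return time.perf_counter() - start

lean = insert_seconds(0)
heavy = insert_seconds(8)
print(f"0 extra indexes: {lean:.3f}s, 8 extra indexes: {heavy:.3f}s")
```

This is why adding an index needs the kind of cause analysis described here: the read-side gain has to outweigh the permanent write-side cost.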

With the insight gained from the ability in AimBetter to view the actual plan that was running in real time, instead of having to try to reconstruct the problem in a different configuration, we were able to determine that a change in the index was the best solution, and this was immediately applied.

As can be seen from the following snapshot, with the correct index in place, execution times were reduced by almost 100% and CPU load came down from 18% to 8%, even though transactional throughput had increased by more than 50%. Average execution duration for the same operation dropped from 40 minutes to just 30 seconds:

Why AimBetter does it better

AimBetter brings a host of special features and benefits that are especially designed to make the management of SQL databases more efficient, and to simplify the work of database administrators.  The particular features that worked in favor of our customers in this case can be summarized as follows:

  • Enables wide-scope correlation of operational data across the whole environment
  • Monitors core SQL database performance
  • Reports on SQL Exceptions
  • Alerts of any unexpected behavior
  • Enables comparison of current with historical metrics
  • Displays in a single comprehensive dashboard
  • Displays all operational metrics on one screen
  • Identifies all current query elements
  • Allows query selection on specific criteria
  • Displays all live query statistics
  • Displays historical queries
  • Retrieves full query syntax
  • Retrieves query plan

Putting all these features into a single package and then having the results at the fingertips of both our experts and the customers’ own database managers was the key element that allowed for this happy ending.

Technical note 1 – About SQL Cursor

T-SQL is optimized to work in an ‘all-or-nothing’, set-based manner. Cursors break away from the transactional nature of T-SQL and allow a program to process each row of a SELECT result individually by looping through them. There are, however, situations in which cursors can be the proper solution. In OnLine Transaction Processing (OLTP) systems, many users may be actively querying a specific table at the same time. Small transactions do not require locks on large chunks of the underlying tables, so the result is generally fast. But in order to update an entire table, SQL Server often has to take a large lock that blocks all other activity on the same table. This is done to protect data consistency; otherwise a concurrent user could come along and SELECT rows from the table that are only partially updated. SQL Server cursors (or WHILE loops) can break full-table updates into multiple smaller batches. This comes at a cost, namely a possible degradation in overall performance, but in some cases it is perfectly acceptable to sacrifice performance to avoid blocking the system.
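The batching idea behind cursors and WHILE loops can be sketched as follows. This is a portable illustration only (SQLite via Python, with made-up table names), not SQL Server cursor syntax: one large UPDATE is broken into small batches, each committed in its own short transaction so that no single statement holds a table-wide lock for long.

```python
import sqlite3

# A hypothetical table of 10,000 prices, all starting at 100.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO prices (price) VALUES (?)", [(100.0,)] * 10_000)
conn.commit()

BATCH = 1000
updated = 0
while True:
    with conn:  # each batch is its own short transaction (auto-commit on exit)
        cur = conn.execute(
            "UPDATE prices SET price = price * 1.1 "
            "WHERE id IN (SELECT id FROM prices WHERE price = 100.0 LIMIT ?)",
            (BATCH,),
        )
    if cur.rowcount == 0:
        break  # no un-updated rows remain
    updated += cur.rowcount

print(updated)  # 10000
```

In SQL Server the same pattern appears as a cursor or a WHILE loop over `UPDATE TOP (n)` batches; the trade-off described above is visible here too, since ten small transactions do more total work than one big one, in exchange for shorter locks.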

The main problem in using a cursor is not its performance, but the way it executes multiple operations as a single batch, which prevents study of what is happening “under the hood”.

The typical limited level of information that can be seen from the standard Microsoft analysis tools is shown in this snapshot: