Imagine suffering from recurring downtime. It is a nightmare for any business, especially one with hundreds of employees and thousands of clients. In such a case, a minute of downtime costs more than $5,000 on average, and the real cost can be much higher once collateral effects such as reputation loss and ground gained by competitors are considered.
A prominent motor vehicle importer, with about 800 employees and 70 stores and service centers serving thousands of customers daily, faced recurring intermittent disruptions to its CRM system, each causing hours of outage for all of the company’s users.
The company’s CRM system runs on its central production server, a robust machine with more than three terabytes of storage, 70 databases, and thousands of tables. The company maintained a synchronized secondary server using SQL Server’s Always-On feature to minimize the risk of data loss, corruption, and downtime. Every few weeks, however, a critical event occurred: the CRM malfunctioned, triggering the resolution process in the Always-On mechanism and rendering the secondary server unavailable. As a result, both the principal and secondary servers were down, and the CRM was completely inaccessible.
As a core system for the company, the CRM needed swift restoration during these events. Unfortunately, IT management had to restart the server without a clear understanding of what caused the CRM failure and the Always-On resolution. The restart was itself a problem: it halted all company systems until the server was fully operational again, producing the dreaded worst-case downtime and substantial financial losses.
The gravity of the situation prompted IT management to consider adding a dedicated server for the CRM, which would have meant considerable expense: thousands of dollars in SQL Server and other licenses, plus setup and maintenance costs.
The AimBetter Solution
Amid this dilemma, the company was introduced to AimBetter, which gave it newfound confidence that the root cause of these recurrent disruptions could be identified and a lasting solution found. A few days later, the critical event occurred once more, and for the last time!
As the CRM stopped working and the Always-On mechanism initiated its resolution process, AimBetter promptly displayed alerts on its dashboard. Notably, alongside the Database Not Healthy alert indicating synchronization issues between the principal and secondary databases, a Table Corrupted alert emerged.
- Database Not Healthy Alert: AimBetter flagged a lack of synchronization between the principal and secondary databases. This indicated that the Always-On mechanism was struggling due to an underlying issue.
- Table Corrupted Alert: Simultaneously, AimBetter detected a corrupted table, shedding light on the primary source of the problem. A specific table on the principal server had data corruption, leading to the paralysis of all company activities during these events.
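For readers who want to see what correlating these two signals could look like by hand, here is a minimal Python sketch. It is not AimBetter's implementation, which the article does not describe. It assumes the rows come from two standard SQL Server sources, the `sys.dm_hadr_database_replica_states` view and the `msdb.dbo.suspect_pages` table; the function names and the sample database names are hypothetical, and the queries are shown only as strings rather than executed.

```python
"""Sketch: cross-check replica health against logged page corruption."""

# T-SQL one could run on the principal server. Shown for reference only;
# this sketch does not connect to a database.
REPLICA_HEALTH_SQL = """
SELECT DB_NAME(database_id) AS database_name,
       synchronization_health_desc
FROM sys.dm_hadr_database_replica_states
WHERE is_local = 1;
"""

SUSPECT_PAGES_SQL = """
SELECT DB_NAME(database_id) AS database_name,
       file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages;
"""

# In suspect_pages, event types 1-3 record unresolved errors (823/824 errors,
# bad checksum, torn page); 4, 5 and 7 mark pages restored, repaired or
# deallocated.
UNRESOLVED_EVENT_TYPES = {1, 2, 3}


def unhealthy_databases(health_rows):
    """Names of databases whose synchronization health is not HEALTHY."""
    return {
        row["database_name"]
        for row in health_rows
        if row["synchronization_health_desc"] != "HEALTHY"
    }


def databases_with_suspect_pages(page_rows):
    """Names of databases that still have unresolved suspect pages."""
    return {
        row["database_name"]
        for row in page_rows
        if row["event_type"] in UNRESOLVED_EVENT_TYPES
    }


def corruption_suspects(health_rows, page_rows):
    """Databases that are both out of sync and carrying suspect pages."""
    return sorted(
        unhealthy_databases(health_rows) & databases_with_suspect_pages(page_rows)
    )


if __name__ == "__main__":
    # Hypothetical rows standing in for real query results.
    health = [
        {"database_name": "CRM", "synchronization_health_desc": "NOT_HEALTHY"},
        {"database_name": "ERP", "synchronization_health_desc": "HEALTHY"},
    ]
    pages = [
        {"database_name": "CRM", "event_type": 2},
        {"database_name": "ERP", "event_type": 5},
    ]
    print(corruption_suspects(health, pages))
```

Note that `suspect_pages` identifies damaged pages, not tables. Naming the affected table by hand would still require a consistency check such as `DBCC CHECKDB` or `DBCC CHECKTABLE`, which is slow to run across dozens of large databases.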
The Table Corrupted event could be checked easily in the AimBetter Observer module, which displays all the events occurring in the selected period in a user-friendly dashboard.
A click on the Table Corrupted event immediately displayed a log of its occurrences.
A further click on one of these log entries opened the details needed to see precisely which table was corrupted.
This revelation uncovered the root cause of the severe and recurring issue: a specific table on the principal server had data corruption. This corruption was responsible for the prolonged paralysis of all company activities during these events.
AimBetter’s ability to pinpoint this specific problem, which is considered challenging even for the most skilled database administrators in environments with numerous databases and tables, provided a breakthrough. Instead of pursuing the costly route of adding a new server exclusively for the CRM, the company could now address the source of the problem directly, avoiding unnecessary expenses and ensuring the uninterrupted operation of their critical systems.
Previously, handling a critical event such as the CRM outage took the IT team hours, from the moment users started to complain until the server restart and the complete downtime that followed, and all that effort went to waste because it never addressed the root cause. With AimBetter, the solution was found in a matter of minutes!