A hospital operating in a SAP ECC (formerly SAP S/4) environment experienced performance issues after migrating their BI system to a new AWS VM:
- Slow query performance.
- Long-running queries performing table scans and excessive disk page readings.
- Queries completed without generating output.
Their setup was based on Oracle Database, with two servers configured as Oracle Real Application Clusters (RAC). The replica for BI purposes was based on SSIS and MSSQL on-premise.
The issue persisted despite adding indexes as a potential solution for the table scans. A comprehensive analysis was initiated to isolate the source of the problem.
AimBetter Investigation
Through AimBetter, it was observed that queries in the BI replica were performing table scans and high disk page readings. Based on this, the team hypothesized that indexing could resolve the issue. However, implementing indexes yielded no improvement.
2- SSIS and Network Communication
The on-premise SQL Server’s SQL Server Integration Services (SSIS), which consumed data from the Oracle replica, was functioning without errors. Additionally, there were two communication channels between the Oracle VM on AWS and the on-premise SQL Server. Both channels were tested and confirmed to be working correctly.
3- Oracle Logs and Listener Analysis
The turning point came when the team added a listener to monitor the Oracle V$ views. Analysis of the logs revealed that Oracle queries were being executed but were not returning output. This led the team to investigate external factors that might interfere with query execution.
Root Cause
A detailed inspection of the network configuration identified that a Palo Alto firewall was the culprit. A specific firewall configuration was silently stopping the Oracle queries from completing properly. This behavior had not been evident in earlier troubleshooting efforts.
Solution
The firewall configuration was adjusted to allow the Oracle queries to execute and return output without interference. Following this change:
Query performance normalized.
Data began flowing correctly between the Oracle replica and the BI tools.
Long-running queries completed as expected with correct output.
This case highlights the importance of considering network and external system configurations when addressing database performance issues. Despite initial assumptions about database indexing and query optimization, the root cause was an external firewall setting. The successful resolution underscores the value of comprehensive analysis, including log monitoring and network inspections, in troubleshooting complex systems.
Lessons Learned
Holistic Troubleshooting: Performance issues may stem from external systems, not just the database or application itself.
Log Monitoring: Adding listeners and analyzing database logs provided critical insights.
Collaboration Across Teams: Coordinating with network administrators helped identify and resolve the firewall configuration issue.
Proactive Network Configuration Reviews: Regular reviews of firewall and network policies can preempt similar issues in the future.
By systematically isolating and addressing the issue, the company restored optimal performance to their BI system, ensuring continuity in decision-making processes.
If you want to learn more about how Aimbetter can enhance your IT management by preventing disruptions and speeding up recovery from unexpected events, please get in touch with us today!