As data volumes and user bases grow, manual analysis of testing results becomes increasingly labor-intensive and inefficient. Today, artificial intelligence (AI) steps in to assist engineers. This article explores an AI-based solution designed by PFLB to automatically detect performance anomalies and generate reports based on load testing results.
Key Takeaways
The PFLB team has developed an AI module for analyzing load testing outcomes. The solution goes beyond merely generating results: it offers valuable insights by efficiently analyzing performance metrics.
The PFLB AI feature enables:
- Performance Anomaly Detection: Identifies performance issues in IT systems during load tests.
- Real-Time Alerts: Sends notifications for every transaction in real time.
- Technical Reports: Generates technical reports based on a comprehensive system performance analysis throughout the test.
Want to try this solution for your IT system? Contact us for a free demo.
Why AI and ML are Essential in Load Testing
Before the 2010s, the scope of load testing tasks was entirely manageable by hand. However, the rapid growth in the number of IT systems, and in the data they generate, during the 2020s made automation essential. Now every QA team aims to delegate as many tasks as possible from humans to machines.
Simply generating an AI testing report is no longer enough. The report must also add value by collecting and interpreting statistics, classifying outcomes, and providing recommendations.
In the foreseeable future, AI will not only analyze load testing results but also assist in test planning and execution.
Load Testing Tasks AI Can Address
- Scripts and Data. AI assists in managing data pools and automating interactions with scripts.
- Testing. Automating pipelines and conducting tests.
- Documentation. Automated results analysis, report generation, and visualization.
Popular AI-Based Solutions
Several AI solutions currently exist for load testing:
- Blaze Monitor Test DataPro. A platform that generates AI-driven data for load testing scripts.
- LoadNinja and LoadRunner. Platforms that use browser-level scripts for performance testing. These tools simulate user interactions and employ AI-based image recognition when forming scripts.
❌ Limitations: these tools are suitable for small-scale systems, but as traffic increases, they demand significant computing power.
- APM Tools: Dynatrace, New Relic, and AppDynamics use AI to detect performance issues and predict trends in IT systems.
❌ Limitations: AI-generated results can be opaque due to the architecture of neural networks. Also, large data volumes are required to train these systems effectively.
Market research showed that existing solutions do not yet cover all load testing needs, and each has its flaws, which means more AI software is still needed.
PFLB AI Feature Goals
- Accelerate decision-making: help a business owner decide whether the product is ready for launch or needs refinement.
- Prevent wasted time: minimize downtime in long-running tests by providing timely notifications.
- Minimize risks by enabling real-time monitoring and standardization of testing processes.
- Save resources by using predictive analytics to automate large-scale tests and identify system bottlenecks early.
Risks
- AI conclusions may be non-transparent. The challenge lies not only in getting results from AI but also in understanding how they were reached.
- False Positives/Negatives. AI can occasionally produce hallucinations or incorrect results, leading to misinterpretations.
PFLB Solution: Development Process and Capabilities
The feature developed by the PFLB team automatically generates technical reports based on the results of various types of performance testing.
The following are used as core metrics (a minimal data sketch follows the list):
- Response Time;
- Threads/Virtual Users;
- Requests per Second;
- Errors.
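For reference, here is a minimal sketch of how one sample of these metrics might be structured. The class and field names are our illustrative assumptions, not PFLB's actual schema:

```python
from dataclasses import dataclass

@dataclass
class MetricSample:
    """One monitoring sample of the core metrics.

    Names and types are illustrative assumptions, not PFLB's schema.
    """
    timestamp: float      # seconds since the start of the test
    response_time: float  # seconds
    threads: int          # active virtual users
    rps: float            # requests per second
    errors: int           # errors observed in this interval
```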
Our goal was to create a system whose conclusions a load testing engineer can understand and verify. This is what sets our AI feature apart from others currently on the market.
Stage 1. Statistical Model for Extreme Deviations in Response Time
The first stage involved creating a statistical model designed to detect extreme deviations in the Response Time metric.
In a normal situation, the increase in response time is described by a mathematical model — in this case, a Gaussian distribution. The model contains several parameters that allow its sensitivity to be adjusted depending on the specific test.
Testing the model helped identify a basic set of parameters around which adjustments can be made, either to reduce or increase the number of detected performance issues. This allows the model to be made more or less sensitive depending on the data it processes.
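To make the idea concrete, here is a minimal sketch of such a detector. It assumes a Gaussian baseline within an observation window; the function name and the single sensitivity parameter `k` are our simplifications, not the production model's actual parameter set:

```python
import statistics

def extreme_deviations(response_times: list[float], k: float = 3.0) -> list[int]:
    """Flag indices whose Response Time lies more than k standard
    deviations from the window mean, assuming a Gaussian baseline.

    k stands in for the tunable sensitivity described above: lowering
    it flags more performance issues, raising it flags fewer.
    """
    mean = statistics.fmean(response_times)
    stdev = statistics.stdev(response_times)
    if stdev == 0:
        return []
    return [i for i, rt in enumerate(response_times)
            if abs(rt - mean) > k * stdev]
```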
Stage 2: Correlation Between Response Time and Threads/Virtual Users
The second model works with the Response Time and Threads/Virtual Users metrics in order to find extreme correlations.
The challenge lies in the fact that Response Time is measured in seconds, while Threads is measured in relative units, i.e., virtual users. In essence, comparing the two is like comparing the number of crocodiles with their color (red or green). The team solved this problem using methods from statistical physics.
As with the first model, acceptable ranges were established for the key parameters. This allows the model to identify both normal and abnormal behavior during analysis.
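One conventional way around the unit mismatch (a sketch of the general idea, not necessarily the statistical physics method the team actually used) is to standardize both series into dimensionless z-scores, after which they can be correlated directly:

```python
import statistics

def zscore(series: list[float]) -> list[float]:
    """Standardize a series to zero mean and unit variance, stripping its units."""
    mean = statistics.fmean(series)
    stdev = statistics.stdev(series)
    return [(x - mean) / stdev for x in series]

def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equally long series via standardization."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(za) - 1)

# Hypothetical usage: a window where the correlation between Response Time
# and Threads departs sharply from its usual level would be flagged.
```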
PFLB AI Feature Use Cases
1. Extreme Response Time Deviations
During testing, the system automatically identifies periods of sharp increases in Response Time, which may indicate system failures under increasing load.
_Top: general view of transactions. Right: results for a 15-minute interval._
Note the numerous warnings related to discrepancies between Response Time and the number of users. These points are marked with diamonds on the top graph. In this case, the load increases gradually without significant steps, leading to local fluctuations in Response Time values. It is important to highlight that these are not performance anomalies yet, but rather warnings. Users might find such alerts helpful, but the warnings can also be disabled if preferred.
Blue squares on the right graph indicate extreme deviations in response times, which are often of the most significant interest.
The input consists of three time series: Response Time, Threads, and RPS (Requests per Second). By analyzing their combined behavior, the AI concludes that since the overall behavior of Threads does not qualitatively differ from that of Response Time and RPS, the system has not yet reached its scalability limits, so the load can be increased further. This indicates normal, expected system behavior, with adequate performance anticipated.
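A hedged sketch of the kind of rule being described, using a simple least-squares slope as a crude proxy for each metric's "overall behavior" (our simplification, not PFLB's classifier):

```python
def trend(series: list[float]) -> float:
    """Least-squares slope of a metric against its sample index: a crude
    proxy for the overall behavior of the metric over the test."""
    n = len(series)
    mean_x = (n - 1) / 2
    mean_y = sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def has_headroom(threads: list[float], rps: list[float]) -> bool:
    """Illustrative rule only: if RPS still grows while Threads grows,
    the scalability limit has not yet been reached."""
    return trend(threads) > 0 and trend(rps) > 0
```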
2. Mismatch Between Request Volume and Load
In this case, the model captures all dynamic changes. When we reduce the sensitivity, we get the results shown in the graph on the right: the model now highlights only critical performance issues, while those that can safely be ignored are no longer flagged.
By analyzing this example, the AI classifies the system as non-scalable. Why? There are two main reasons:
- The RPS dynamics do not align with the behavior of Threads (varying differently in 56 out of 480 instances). The number of requests per second fluctuates throughout the test. Although there is an overall growth trend, in about 10–15% of cases we observe a decline in the curve.
- Response Time fluctuates throughout the test, even with low Thread counts. This is particularly obvious on the right chart, where a period in the middle of the test shows oscillations in response time despite increasing load.
The mismatch between RPS and Threads, as well as the inconsistency between Response Time and Threads, leads the AI to conclude that the system is non-scalable.
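For intuition, a disagreement figure like the 56-out-of-480 instances above can be obtained by comparing the step-by-step direction of the two series. This is a sketch under our own assumptions; the production model's definition may differ:

```python
def disagreement_count(threads: list[float], rps: list[float]) -> int:
    """Count steps where RPS changes in the opposite direction to Threads.

    An illustrative measure of the RPS/Threads mismatch described above,
    not necessarily how the actual model computes it.
    """
    count = 0
    for i in range(1, len(threads)):
        d_threads = threads[i] - threads[i - 1]
        d_rps = rps[i] - rps[i - 1]
        if d_threads * d_rps < 0:  # metrics moved in opposite directions
            count += 1
    return count
```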
3. Stepwise Load Testing
In previous examples, the load increases gradually. However, in this case, the model identifies that starting from a certain load level (specifically, when the number of users exceeds 90), Threads (green curve) continue to increase, but RPS (pink curve) does not.
Additionally, when examining the dynamics of Response Time (blue curve), the model detects that Response Time begins to increase more rapidly than Threads. This might not be easily noticeable, but the model highlights it. Based on this, the AI concludes that the system has reached its scalability limit at a load level of 90 users.
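A rough sketch of how such a limit might be located on stepped load data (assumed logic and threshold, not the production model): scan successive load steps and report the first level after which Threads keeps rising while RPS plateaus.

```python
def scalability_limit(threads: list[float], rps: list[float],
                      tolerance: float = 0.02) -> float | None:
    """Return the Threads level at which RPS stops growing, or None.

    tolerance is the relative RPS growth below which a step counts as a
    plateau; both the rule and the threshold are illustrative guesses.
    """
    for i in range(1, len(threads)):
        threads_grew = threads[i] > threads[i - 1]
        rps_growth = (rps[i] - rps[i - 1]) / max(rps[i - 1], 1e-9)
        if threads_grew and rps_growth < tolerance:
            return threads[i - 1]  # e.g. around 90 users in this example
    return None
```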
Conclusion
With PFLB’s AI feature, you can detect significant deviations in core system performance metrics, receive real-time notifications for each transaction, and generate comprehensive technical reports enriched with AI-driven insights on system behavior.
The solution has proven effective at detecting performance bottlenecks and speeding up the reporting process for load tests. Future versions will extend the functionality to analyze not only performance bottlenecks but also initial test statistics, enabling automatic creation of load profiles and testing scenarios.
The team is also exploring AI’s potential to identify root causes of cascading performance issues, whether due to hardware issues or architectural flaws, enabling more precise diagnostics and system performance improvements.
Interested in a demo? Contact us for a trial version of the PFLB product.