Performance Testing

In today's lesson we will talk about how to approach performance testing.

Four principles of getting results from performance testing:

1. Test real applications

You should test the actual application that you are going to deploy to production, not a prototype or a mock-up.

Microbenchmarks

Microbenchmarks are small, focused tests designed to measure the performance of specific code snippets or algorithms.

For example, you might want to measure

the time to execute one algorithm versus another
the overhead of creating a thread versus using a thread pool

Java features such as just-in-time (JIT) compilation and garbage collection make it difficult to write microbenchmarks correctly.

Why is it difficult to write microbenchmarks correctly on the JVM?

First few executions: the JVM interprets the bytecode, executing it one instruction at a time without first compiling it to native machine code, which is comparatively slow.
Repeated executions: the JIT compiler detects frequently executed code segments and compiles them into highly optimized machine code at runtime. The compiled code then replaces the interpreted code for subsequent executions, so code becomes faster the longer it runs.
For this reason, all benchmarks (not just microbenchmarks) typically include a warm-up period during which the JVM is allowed to compile the code into its optimal state.
Be cautious when designing multithreaded microbenchmarks: when several threads run a small piece of code, the test can end up measuring JVM contention (for example, on a shared lock) rather than the performance it was meant to measure.
A smart compiler can detect that a result is never used and discard the operations that produce it, which leads to incorrect results. Make sure every result is read, for example by storing it in an instance variable declared volatile. It is also better to use precomputed values as input, so that generating the input is not part of the measured time.
Microbenchmarks must measure realistic input to avoid misleading results, taking into account inputs that trigger exceptions and the way the code is used in the real world.
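
The points above can be sketched in a small hand-rolled microbenchmark. This is only an illustration under stated assumptions: the class name FibonacciMicrobenchmark, the fibonacci workload, and the iteration counts are hypothetical and not taken from the book. It shows precomputed input, a warm-up phase, and a volatile field that keeps the compiler from discarding the result.

```java
import java.util.Random;

public class FibonacciMicrobenchmark {
    // Writing every result to a volatile field means the result is always used,
    // so the compiler cannot discard the computation.
    private static volatile long sink;

    // Hypothetical code under test: iterative Fibonacci.
    static long fibonacci(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    // Precompute the inputs so that random number generation is not measured.
    static int[] precomputeInputs(int count, long seed) {
        Random random = new Random(seed);
        int[] inputs = new int[count];
        for (int i = 0; i < count; i++) {
            inputs[i] = random.nextInt(90); // values 0..89 keep the result within a long
        }
        return inputs;
    }

    // Runs the workload once over all inputs and returns the elapsed nanoseconds.
    static long timeOnce(int[] inputs) {
        long start = System.nanoTime();
        for (int n : inputs) {
            sink = fibonacci(n);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] inputs = precomputeInputs(100_000, 42L);
        // Warm-up: results are thrown away while the JIT compiles the hot code.
        for (int i = 0; i < 20; i++) {
            timeOnce(inputs);
        }
        // Measurement: report each timed iteration separately.
        for (int i = 0; i < 5; i++) {
            System.out.printf("iteration %d: %d us%n", i, timeOnce(inputs) / 1_000);
        }
    }
}
```

In practice a benchmark harness such as JMH takes care of warm-up and dead-code elimination; the manual loop here is only meant to make those concerns visible.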

Macrobenchmarks

Macrobenchmarking means testing the application in its complete configuration, including external resources, to get more accurate performance results.
Performance-affecting factors such as resource allocation, network saturation, code optimization, and CPU performance are more likely to be detected in macrobenchmarks.
Complete benchmark testing helps to identify bottlenecks and areas for performance improvement, as optimizing only one part may not yield immediate benefits.

By default, a single JVM will drive CPU usage up to 100% during a GC cycle. When the JVM runs alongside other applications, it cannot get 100% of the machine's CPU during GC. Performance testing should be done on a dedicated machine; otherwise the results will differ from production.

Mesobenchmarks

Mesobenchmarks are tests that occupy a middle ground between a microbenchmark and a full application.
For example, a mesobenchmark could be a test that measures how quickly a server can respond to a simple REST call without authentication or authorization.
Isolating performance at a modular or operational level, via a mesobenchmark, offers a reasonable approach but is no substitute for testing the full application.
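
As a rough sketch of such a mesobenchmark, the following times simple GET requests against a throwaway local HTTP server. The class name RestMesobenchmark, the /ping endpoint, and the request counts are assumptions made for illustration; in a real test the URI would point at the server under test. It uses the JDK's built-in HttpServer and HttpClient (Java 11 or later).

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class RestMesobenchmark {
    // Sends `count` GET requests one after another; returns each response time in nanoseconds.
    static long[] timeRequests(HttpClient client, URI uri, int count)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        long[] nanos = new long[count];
        for (int i = 0; i < count; i++) {
            long start = System.nanoTime();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            nanos[i] = System.nanoTime() - start;
            if (response.statusCode() != 200) {
                throw new IOException("unexpected status " + response.statusCode());
            }
        }
        return nanos;
    }

    // Starts a local stand-in server, warms up, measures, and stops the server again.
    static long[] runLocalDemo(int warmup, int measured) {
        HttpServer server = null;
        try {
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/ping", exchange -> {
                byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/ping");
            HttpClient client = HttpClient.newHttpClient();
            timeRequests(client, uri, warmup); // warm-up timings are discarded
            return timeRequests(client, uri, measured);
        } catch (IOException | InterruptedException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            throw new IllegalStateException("local demo failed", e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    // Average of the recorded response times, in milliseconds.
    static double averageMillis(long[] nanos) {
        long total = 0;
        for (long n : nanos) {
            total += n;
        }
        return total / (double) nanos.length / 1_000_000.0;
    }

    public static void main(String[] args) {
        long[] nanos = runLocalDemo(200, 1_000);
        System.out.printf("average response time: %.3f ms%n", averageMillis(nanos));
    }
}
```

Because client and server share one JVM and one machine here, the numbers say little about a deployed system; the sketch only shows the shape of the test.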

2. Understand throughput, batching, and response time

Elapsed Time (Batch Time)

How long it takes to complete a specific task
Batch-oriented tests (or any test without a warm-up period) have been infrequently used in Java performance testing but can yield valuable results.

Throughput (RPS/TPS/OPS)

The number of requests/transactions/operations that the application can handle per second
Clients have no think time between requests

A server that can sustain 500 OPS with a 0.5-second response time is performing better than a server that reports a 0.3-second response time but only 400 OPS.

Response time

The amount of time that elapses between the sending of a request from a client and the receipt of the response
Clients have think time between requests, which mimics user behavior; throughput remains constant
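
The two measurements can be illustrated with a small sketch. The class name LoadMetrics, the single-client loop, and the stand-in operation are assumptions made for this example. With a think time of zero the loop behaves like a throughput test; with a think time greater than zero it behaves like a response-time test.

```java
import java.util.concurrent.locks.LockSupport;

public class LoadMetrics {
    // Throughput: completed operations divided by elapsed wall-clock seconds.
    static double throughputPerSecond(long completedOps, long elapsedNanos) {
        return completedOps / (elapsedNanos / 1_000_000_000.0);
    }

    // Average response time in milliseconds over all recorded requests.
    static double averageResponseMillis(long[] responseNanos) {
        long total = 0;
        for (long nanos : responseNanos) {
            total += nanos;
        }
        return total / (double) responseNanos.length / 1_000_000.0;
    }

    // Issues `requests` calls one after another and records each response time.
    // thinkNanos == 0 models a throughput test; thinkNanos > 0 adds think time.
    static long[] run(Runnable operation, int requests, long thinkNanos) {
        long[] responseNanos = new long[requests];
        for (int i = 0; i < requests; i++) {
            long start = System.nanoTime();
            operation.run();
            responseNanos[i] = System.nanoTime() - start;
            if (thinkNanos > 0) {
                LockSupport.parkNanos(thinkNanos); // approximate pause; may return early
            }
        }
        return responseNanos;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for sending a request to a server.
        Runnable operation = () -> Math.sqrt(System.nanoTime());
        long start = System.nanoTime();
        long[] responseNanos = run(operation, 10_000, 0);
        long elapsed = System.nanoTime() - start;
        System.out.printf("throughput: %.0f OPS, average response time: %.4f ms%n",
                throughputPerSecond(responseNanos.length, elapsed),
                averageResponseMillis(responseNanos));
    }
}
```

A real load test would use many concurrent clients; a single client keeps the arithmetic easy to follow.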

3. Understand variability

The goal is to understand how test results vary over time. Programs that process exactly the same set of data will produce a different result (for example, a different elapsed time) each time they are run.
Testing code for changes is called regression testing. In a regression test, the original code is known as the baseline, and the new code is called the specimen.
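
Because results vary from run to run, comparing a single baseline run with a single specimen run can mislead. A minimal sketch, assuming several timed runs of each version are available, is to compare the mean and the spread of both sets. The class name RegressionStats and the timings in main are hypothetical values chosen for illustration.

```java
public class RegressionStats {
    // Arithmetic mean of the samples.
    static double mean(double[] samples) {
        double sum = 0;
        for (double s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    // Sample standard deviation (divides by n - 1); needs at least two samples.
    static double sampleStdDev(double[] samples) {
        double m = mean(samples);
        double squares = 0;
        for (double s : samples) {
            squares += (s - m) * (s - m);
        }
        return Math.sqrt(squares / (samples.length - 1));
    }

    public static void main(String[] args) {
        // Hypothetical elapsed times in seconds for three runs of each version.
        double[] baseline = {1.0, 0.8, 1.2};
        double[] specimen = {0.5, 1.25, 0.5};
        System.out.printf("baseline: mean %.2f s, stddev %.2f s%n",
                mean(baseline), sampleStdDev(baseline));
        System.out.printf("specimen: mean %.2f s, stddev %.2f s%n",
                mean(specimen), sampleStdDev(specimen));
    }
}
```

If the difference between the means is small compared with the standard deviations, more runs are needed before concluding that the specimen is faster or slower.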

4. Test early and often

You should test the application as early and as often as possible.
Performance testing should be part of the continuous integration process.
Automate everything: All performance testing should be scripted (or programmed, though scripting is usually easier)
Measure everything: CPU usage, disk usage, network usage, memory usage, and so on

If the CPU usage has increased, it’s time to consult the profile information to see what is taking more time. If the time spent in GC has increased, it’s time to consult the heap profiles to see what is consuming more memory. If CPU time and GC time have decreased, contention somewhere has likely slowed performance: stack data can point to particular synchronization bottlenecks, JFR recordings can be used to find application latencies, or database logs can point to something that has increased database contention.
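
One way to collect GC figures alongside timings is sketched below, using the standard java.lang.management beans. The class name RunMetrics and the allocation workload in main are assumptions for illustration; a real test run would usually also record GC logs and a JFR recording.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class RunMetrics {
    // Keeps allocations in main reachable for a moment so they are real work.
    private static volatile Object sink;

    // Total milliseconds spent in GC so far, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Total number of collections so far, summed over all collectors.
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Runs the task; returns {elapsed nanoseconds, GC milliseconds, GC count} for that run.
    static long[] measure(Runnable task) {
        long gcMillisBefore = totalGcMillis();
        long gcCountBefore = totalGcCount();
        long start = System.nanoTime();
        task.run();
        long elapsedNanos = System.nanoTime() - start;
        return new long[] {
            elapsedNanos,
            totalGcMillis() - gcMillisBefore,
            totalGcCount() - gcCountBefore
        };
    }

    public static void main(String[] args) {
        // Hypothetical workload that produces many short-lived objects.
        long[] metrics = measure(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                sink = new byte[1024];
            }
        });
        System.out.printf("elapsed: %d ms, GC time: %d ms, GC count: %d%n",
                metrics[0] / 1_000_000, metrics[1], metrics[2]);
    }
}
```

Storing these numbers with every test run makes it possible to tell later whether a slowdown came with more GC time or not.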

Run on the target system: testing should be done under the expected load, on the hardware expected in production.

A test that is run on a single-core laptop will behave differently than a test run on a machine with 72 cores. That should be clear in terms of threading effects: the larger machine is going to run more threads at the same time, reducing contention among application threads for access to the CPU. At the same time, the large system will show synchronization bottlenecks that would be unnoticed on the small laptop.
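
The effect of core count on a synchronization bottleneck can be probed with a sketch like the following. The class name ContentionProbe, the lock-protected counter, and the iteration counts are hypothetical; the idea is only to run the same synchronized workload with a growing number of threads, up to the number of cores the JVM reports.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ContentionProbe {
    private static final Object LOCK = new Object();
    private static long shared;

    // Runs `threads` workers that each increment a lock-protected counter
    // `perThread` times; returns the elapsed nanoseconds (thread start-up included).
    static long timeWithThreads(int threads, int perThread) {
        synchronized (LOCK) {
            shared = 0;
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < perThread; i++) {
                        synchronized (LOCK) {
                            shared++;
                        }
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // wait for every worker to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
        return System.nanoTime() - start;
    }

    // Current value of the shared counter.
    static long sharedValue() {
        synchronized (LOCK) {
            return shared;
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cores);
        for (int threads = 1; threads <= cores; threads *= 2) {
            long nanos = timeWithThreads(threads, 1_000_000);
            System.out.printf("%d thread(s): %d ms%n", threads, nanos / 1_000_000);
        }
    }
}
```

On a single-core machine the loop runs only once, which is exactly why the bottleneck would go unnoticed there. Recording the reported processor count with each result also makes it clear which hardware a measurement came from.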

References

Java Performance Tuning (2nd Edition)