
Sunday, 25 May 2014

Analytical thinking in performance analysis

Many performance job descriptions list analytical skills as a requirement. For people who love analysis, this is an interesting job.

A lot of this analysis is done using mental or mathematical models of the scenario. Without analysis, test results are just numbers.

A major limitation of performance testing is that you cannot test all possible scenarios and use cases. Testing and modeling complement each other. One cannot expect a major re-engineering decision to be taken based only on performance flaws pointed out by analytical reasoning; they have to be backed by actual measurements.

Good modeling requires a solid understanding of how things work. When you keep looking at data from various perspectives and keep asking questions, interesting insights emerge.

Below are some of the scenarios where modeling and analysis can complement testing and measurements.

Modeling and pre-analysis help in designing the right tests

For example, if your understanding of the architecture leads you to expect synchronization issues, you will design the test so that it clearly establishes saturation while CPU capacity is still available.

Modeling and analysis can even provide bounds on expected results

For example, when all concurrent transactions serialize on a shared data structure, we know that the time spent in the synchronized block is serial. So if 50 microseconds are spent inside the synchronized block, there can be at most 20,000 transactions per second (1 second / 50 microseconds), beyond which the system will saturate. This limit is independent of the number of threads executing the transactions in the server and of the number of available cores. If synchronization is the primary bottleneck, halving the time spent in the synchronized block doubles the throughput.
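The bound above is simple arithmetic, and a minimal sketch makes it easy to repeat for other values. The function name is hypothetical, and the sketch assumes every transaction passes through one shared synchronized block exactly once.

```python
def max_throughput_per_second(serial_time_us: float) -> float:
    """Upper bound on transactions per second when each transaction spends
    serial_time_us microseconds inside a single shared synchronized block.

    Assumption: the block is entered once per transaction and only one
    thread can be inside it at a time.
    """
    if serial_time_us <= 0:
        raise ValueError("serial time must be positive")
    return 1_000_000 / serial_time_us


# 50 microseconds of serial work per transaction.
print(max_throughput_per_second(50))
# Halving the time inside the block doubles the bound.
print(max_throughput_per_second(25))
```

Thread count and core count do not appear in the function at all, which is the point of the example: they cannot raise this bound.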

Modeling and analysis help in identifying the bottleneck

The simple model above gives an upper limit on throughput, which can be validated by testing. If the solution saturates before that limit, we know the primary bottleneck is somewhere else. Quite often people justify the observed results on hunches like “Oh! The solution saturated using 50% of CPU, it must be a synchronization issue” and accept the results. Without the right quantitative approach this can hide the actual bottleneck. In the example above, if saturation occurs well before 20,000 transactions per second, asking the poor developer to optimize the synchronization will not solve the problem, because the bottleneck is somewhere else.

Modeling and analysis can help in answering what-if scenarios

For example, you have tested on a 10 G network and someone asks for your projections if the customer runs on a 1 G network. What are the capacity projections for the solution in this scenario?
Does the deployment need higher per-core capacity or a larger number of cores? If we double the available RAM, what kind of improvement can we expect?
If we double the number of available cores, what is the expected capacity of the system?
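A what-if question like the network one can be approached with a rough bound per resource, taking the smallest bound as the projected capacity. The sketch below is only illustrative: the 2,000 bytes per transaction and the 100,000 transactions per second CPU-side capacity are hypothetical numbers, not measurements from any real system, and protocol overhead is ignored.

```python
def network_bound_tps(link_gbps: float, bytes_per_txn: int) -> float:
    """Transactions per second the link could carry if it did nothing else.

    Assumption: each transaction moves bytes_per_txn bytes over the link
    and protocol overhead is ignored.
    """
    bits_per_second = link_gbps * 1e9
    return bits_per_second / (bytes_per_txn * 8)


BYTES_PER_TXN = 2_000        # hypothetical message size
CPU_BOUND_TPS = 100_000      # hypothetical capacity measured on the 10 G setup

for link_gbps in (10, 1):
    bounds = {
        "cpu": CPU_BOUND_TPS,
        "network": network_bound_tps(link_gbps, BYTES_PER_TXN),
    }
    limiting = min(bounds, key=bounds.get)
    print(f"{link_gbps} G link: limited by {limiting} at {bounds[limiting]:.0f} tps")
```

With these assumed numbers the limit moves from the CPU on the 10 G link to the network on the 1 G link. A projection like this still needs to be backed by a measurement before anyone acts on it.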

Modeling and analysis help in evaluating hypotheses

For example, if someone thinks a particular performance problem is due to OS scheduling, this can be evaluated analytically. We had a scenario in which 600 threads were scheduled on an 8-core server. When performance issues appeared, the developer felt the problem must come from threads contending to get scheduled. When we modeled how the CPU was being used, it was obvious that scheduling was not the issue. When a false hypothesis is accepted, the real cause of the issues remains hidden.

Modeling and analysis enhance our understanding of the system

When models don't match the measurements, we learn something new as we revisit our assumptions.

Modeling and analysis help in extrapolating results

We might want to extrapolate results for various reasons. There can be practical limitations; for example, you may not have enough hardware to simulate the required number of users.

We will discuss how we can model various performance scenarios in future posts.

Related post:

Software bottlenecks:- Understanding performance of software systems with examples from road traffic

Thursday, 8 May 2014

Time scale of system latencies

In my previous post we discussed why providing low latency is more difficult than providing scalability. Below is a scale showing the latency cost of various operations.

Operation                               Latency      Scaled (1 CPU cycle = 1 s)
1 CPU cycle                             0.3 ns       1 s
Level 1 cache access                    0.9 ns       3 s
Level 2 cache access                    2.8 ns       9 s
Level 3 cache access                    12.9 ns      43 s
Main memory access (DRAM, from CPU)     120 ns       6 min
Solid-state disk I/O (flash memory)     50-150 µs    2-6 days
Rotational disk I/O                     1-10 ms      1-12 months
Internet: San Francisco to New York     40 ms        4 years
Internet: San Francisco to UK           81 ms        8 years
Internet: San Francisco to Australia    183 ms       19 years
TCP packet retransmit                   1-3 s        105-317 years

[Table: units of time, each given as a fraction of 1 second]

Data is from the book Systems Performance: Enterprise and the Cloud by Brendan Gregg.
The first table gives the cost of operations and the second table explains the units of time. The third column of the first table converts each cost to a time scale we understand intuitively: seconds, minutes, days, months and years. The costs are scaled by asking: if one CPU cycle hypothetically took 1 second, how long would the other operations take?
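The scaling is a single division, as this small sketch shows. The constant and function names are hypothetical; the 0.3 ns cycle time is the value from the first row of the table.

```python
CPU_CYCLE_NS = 0.3  # one CPU cycle, taken from the first row of the table


def scaled_seconds(latency_ns: float) -> float:
    """Latency on the scale where one CPU cycle lasts one second."""
    return latency_ns / CPU_CYCLE_NS


# Level 1 cache access: 0.9 ns
print(round(scaled_seconds(0.9)), "s")
# Main memory access: 120 ns, shown in minutes
print(round(scaled_seconds(120) / 60, 1), "min")
# San Francisco to New York: 40 ms, shown in years
print(round(scaled_seconds(40e6) / (365 * 24 * 3600), 1), "years")
```

The results land close to the table's third column but not always exactly on it, because the table shows rounded figures. Main memory access, for example, works out to 400 seconds, or about 6.7 minutes, where the table shows 6 min.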

How can we use knowledge of the above latencies?
  • It can help in understanding the lower bound of latency or response time for a certain scenario
    • For example, if the user is in Australia and the server is in San Francisco, then the time between request and response will be greater than two network hops, that is, greater than 366 ms (183 * 2).
  • It can show when performance objectives can't be met using a certain technology or operation
    • For example, if you want a response in microseconds, then the critical path of code execution cannot contain a disk read or write.
    • If your algorithm needs more than 10 random memory operations to do some work, it will take more than 1.2 microseconds (120 ns * 10).
  • Overall latencies can be estimated by knowing the cost of basic operations
  • It helps in finding reasons for performance jitter
    • For example, suppose response time jumped from microseconds to milliseconds. What could be the possible reasons? We know disk reads can take milliseconds, so a page fault is a possible reason. GC in Java could be a reason, as GC pauses range from a few milliseconds to hundreds of milliseconds. But we should not suspect networking on a 1 Gbps LAN, as it will not cause delays of milliseconds, especially if utilization is low. Knowing the system latencies helps you narrow your search for the culprit.
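The lower-bound reasoning in the list above can be sketched as a small lookup and a sum. The dictionary keys and function names are hypothetical. The costs come from the table, using the low end of each range so that the result stays a lower bound.

```python
# Per-operation cost in nanoseconds, from the latency table above.
# For ranges, the low end is used so the sum remains a lower bound.
LATENCY_NS = {
    "dram_access": 120,
    "ssd_io": 50_000,
    "rotational_disk_io": 1_000_000,
    "sf_to_australia": 183_000_000,
}


def lower_bound_ns(op_counts: dict) -> int:
    """Sum of the known costs of the operations on the critical path."""
    return sum(LATENCY_NS[op] * count for op, count in op_counts.items())


def fits_budget(op_counts: dict, budget_ns: int) -> bool:
    """True if the lower bound alone does not already exceed the budget."""
    return lower_bound_ns(op_counts) <= budget_ns


# Request from Australia to San Francisco and back: two network hops.
print(lower_bound_ns({"sf_to_australia": 2}) / 1e6, "ms")
# Ten random memory accesses.
print(lower_bound_ns({"dram_access": 10}) / 1e3, "microseconds")
# A 100 microsecond budget cannot include a rotational disk read.
print(fits_budget({"rotational_disk_io": 1}, budget_ns=100_000))
```

A `True` from `fits_budget` only means the objective is not ruled out by these costs. It does not show that the objective will be met.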


Sunday, 13 April 2014

Why providing low latency is more challenging than providing scalability

Response time and capacity are the two fundamental indicators of software performance. In industries like finance, low latency has a lot of value for traders. High-frequency trading, which drives a large percentage of trades in many markets, requires very fast trading software that can react to market depth in microseconds or nanoseconds. Read about low latency challenges in the finance industry here.

Performance challenges in web technologies are more on the scalability side. While trading software deals in milliseconds and microseconds, in the web world even 250 milliseconds is a good enough response time for end users. Even in the web world people want low service time on the server side, since high service time indirectly means low capacity, but they won't care much if 50-100 milliseconds are added by the Internet.

Why is providing low latency more challenging?
The problem of scalability was solved with load balancing and horizontal scaling. Although there are technical challenges, a load of millions of users can be distributed across thousands of servers with proper software design, load balancing and capacity planning.

Latency is the sum of the time spent in all activities required to do some work. The time may be spent on CPU or in waits. Basic service time is the latency when the request does not have to wait in a queue before being serviced. At very high capacity utilization, queuing can push the response time to 10-100 times the basic service time. So the first important step is good capacity planning to manage the response time. If you get this wrong, then even a very low basic service time will be practically wasted.

Let's say you have achieved that and still want lower latency. What do you do? You now have the right load balancing and appropriate capacity utilization, so response time stays around the basic service time except during bursts. The focus now shifts to reducing the basic service time.
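To see how queuing can multiply response time, here is a minimal sketch using the textbook single-server queue formula R = S / (1 - utilization). This M/M/1 model, with random arrivals and service times on one server, is an assumption chosen for illustration and is not taken from any measured system. The function name is hypothetical.

```python
def response_time(service_time: float, utilization: float) -> float:
    """Mean response time of a single-server queue (M/M/1 assumption).

    service_time is the basic service time with no queuing;
    utilization is the fraction of capacity in use, 0 <= utilization < 1.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


# Response time as a multiple of the basic service time.
for utilization in (0.5, 0.9, 0.99):
    multiple = response_time(1.0, utilization)
    print(f"utilization {utilization:.0%}: about {multiple:.0f}x basic service time")
```

Under this model the response time is about 10 times the basic service time at 90% utilization and about 100 times at 99%.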

Two basic things can reduce the basic service time:
  • Better algorithms
  • Parallelism
Here again, not many problems offer high parallelism. If you want even lower latency, what do you do? To understand this we need to discuss the sources of latency and how the industry is addressing them. I will discuss them in future posts. At a very basic level, you need to reduce the number of steps required to complete the task or find a way to reduce the cost of each step. There are fundamental limits to both. Even in hardware, for example RAM, size and bandwidth have increased significantly over time, but latency has improved much less by comparison.

Comparing performance optimization and road travel
Compare performance optimization to road travel. You can achieve higher capacity by adding more lanes to the road. Here the basic service time is the time to travel when the road is completely empty. Your latency, or time to travel, increases if there is a lot of traffic on the road, which is like high capacity utilization. Now if you want to go below the basic service time, the time to travel on an empty road, what can you do? You can either find a shorter path or drive faster. There are limits to both. You cannot reduce the distance between two points below the shortest distance, and you cannot keep increasing the speed beyond a point. Similarly, reducing latency beyond a point becomes more and more difficult. The IT industry is hacking into the OS, libraries, network, switches and hardware to reduce the sources of latency, but reducing it further is becoming more and more difficult and expensive. There are market gateways that can deliver a market update from wire to application memory in 1 microsecond. A single memory fetch from RAM can take 40 to 100 nanoseconds, so even 10 memory fetches can use up most or all of that 1 microsecond budget.

Wednesday, 13 November 2013

Software bottlenecks:- Understanding performance of software systems with examples from road traffic

Through simple examples from road traffic I want to share some of the concepts of software system performance.

Modern financial systems are like a maze of interconnected distributed software systems with huge amounts of data flowing between the nodes. Different messages take different routes, using the services of different applications along their path. What is common between the performance of these systems and traveling on roads?


Quite often while driving slowly through traffic jams, the first question that comes to my mind is: where is the bottleneck? While it's much easier to detect the bottleneck in road traffic, it is not so obvious in a complex software system.

What is a bottleneck?
Wikipedia defines it as "A bottleneck is a phenomenon where the performance or capacity of an entire system is limited by a single or limited number of components or resources."

While driving from Noida to Delhi, the slowest part of my journey is a 3-4 km stretch of highway. After crossing one particular point on the highway, the traffic becomes very fast. At this point people and vehicles cross the road and there is no traffic light. Slow-moving traffic queues form well before this point. On days when the traffic police prohibit crossing at this point, traffic is smooth on the entire highway. This intersection is the bottleneck for the entire traffic flow on this highway.


Software bottlenecks are like this too. There will be messages or pending requests in the queues of the component or components before the bottleneck. After the bottleneck component you will observe very few queued requests and smooth operation. The same is true of traffic bottlenecks: just after you cross the bottleneck, the traffic is quite smooth and fast.

Software performance bottlenecks would be as evident as traffic bottlenecks if we could easily see where queues are forming in software systems. For this we need to instrument the application and collect good statistics to monitor its state. For a set of interconnected components in a workflow, it is quite easy to detect which component is the bottleneck if you know the pending requests at each. Detecting bottlenecks in large monolithic, multithreaded, multilayered systems is more challenging. The basic principles still apply, but the queues here are more subtle as they are internal to the application.
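For a simple linear workflow, the queue-based reasoning can be sketched as follows. This is a heuristic under stated assumptions: components are listed in workflow order, each count is the number of requests waiting for that component, and the threshold of 100 pending requests is an arbitrary illustrative value. The component names are hypothetical.

```python
def find_bottleneck(pending_by_stage, threshold=100):
    """Return the most downstream component that still has a significant
    backlog, or None if no component does.

    pending_by_stage: list of (component_name, pending_requests) pairs
    in workflow order. Queues build up at and before the bottleneck and
    stay nearly empty after it, so the last backed-up component is the
    suspect.
    """
    bottleneck = None
    for name, pending in pending_by_stage:
        if pending >= threshold:
            bottleneck = name
    return bottleneck


# Hypothetical snapshot of pending requests per component.
snapshot = [
    ("gateway", 800),
    ("validator", 1500),
    ("matcher", 4000),
    ("publisher", 3),
]
print(find_bottleneck(snapshot))
```

In this snapshot the suspect is the matcher: requests pile up in front of it and in the components feeding it, while the publisher after it has almost nothing waiting.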

Bottlenecks start becoming evident only after a certain load on the system. On this highway, if I travel very early in the morning, the traffic is quite smooth. It's only in the peak traffic hour that it becomes really painful. If your load is smaller than the capacity of the bottleneck, then you will get smooth flow. High-load performance testing is done to identify these bottlenecks.

When one bottleneck is resolved, the bottleneck shifts to the next slowest part, but the overall capacity of the system is larger. In my road traffic example, when the bottleneck at the intersection is removed we can see queues at the next traffic light, but overall traffic flow is smoother than before. As the overall capacity of the system is larger, you will hit the next bottleneck at a higher load.

The next post will be about latency and throughput: why providing low latency is more difficult than providing higher throughput, and what the various components of latency are. We will discuss these issues with simple examples from road travel.
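The shifting bottleneck can be illustrated by treating the system capacity as the minimum of the stage capacities. The stage names and the vehicles-per-hour figures below are hypothetical, chosen only to mirror the road example.

```python
def system_capacity(stage_capacity: dict):
    """Return (limiting_stage, capacity): the stage with the smallest
    capacity limits the whole flow."""
    limiting = min(stage_capacity, key=stage_capacity.get)
    return limiting, stage_capacity[limiting]


# Hypothetical capacities in vehicles per hour.
road = {"intersection": 1200, "traffic_light": 2000, "flyover": 3000}
print(system_capacity(road))

# Crossing is prohibited, so the intersection no longer limits the flow.
road["intersection"] = 5000
print(system_capacity(road))
```

After the change the limit moves to the traffic light, and the overall capacity rises from the old bottleneck's value to the next smallest one.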