Follow on twitter

Wednesday, 30 December 2015

Oracle performance tuning : Startup guide for beginners

Oracle database has very rich diagnostic features that empower an educated end user to diagnose issues and tune the system for best performance. In this post I am giving an broad overview of tools and techniques so that people can focus on what to read from the suggested material in previous post. I will also cover some of the techniques in detail in my posts however the objective of this post is broad overview

Lets look at the problem space

Product uses database, can we maximize the performance by tuning DB?
  • Can we increase the transaction throughput of OLTP type application
  • Can we improve the response time for queries on large data set in data warehousing type environment
  • Mixed batch type distributed solution inserts+ selects: can we maximize the throughput
  • etc

 Other problems
  • Trouble shooting database issue, where performance is fine in the replicated Dev environment but issue in customer environment. How do we compare the configuration and workload on two instances and pinpoint the root cause of issue?
  • Multiple products deployed on the same instance, in case of DB issue which one is causing it?
  • What are the SQLs that are taking most DB time?
  • Are we leveraging the available disks to maximize the available I/O? which tablespaces have most reads and writes and how are they distributed across disks?
  • Which DB sub-system is becoming the bottleneck, is it undersized buffers, the I/O etc
  • etc

 Some questions
    Is the application making optimal use of DB ?
  • Bind variables
  • Proper indexes
  • Tuned SQLs
  • Partitions (spatial locality)
  • Views 
  • Are we leveraging parallelism in DB
  • etc
    Is the database properly sized
  • Cursors
  • SGA
  • PGA
  • Temp space
  • Redo log size
  • etc
    Where is the DB time spent
  • SQLs
  • CPU
  • Waits : This is where there is lot of scope for improvement in performance
  •  etc
Hope the spectrum of performance problems that you can solve using Oracle performance tuning knowledge got you excited or at least interested!

Now lets discuss how you can start building these skills.

  • Get equipped with the knowledge of Oracle architecture.
    • Background Processes, memory structures, logical and physical structures
  • Learn SQL tuning techniques
    • Optimizer 
    • Explain plans
    • Parallel processing concepts
    • SQL tuning advisor
  • Understand AWR report
  • Understand Oracle wait events
    • There are too many of them, just focus on the few that you encounter and learn as you go
The reading material I had suggested in previous post covers most of it except how to read AWR reports. I will share my own experience in reading AWR reports in future posts.

AWR report contains very useful information to give focus to the optimization exercise.

  • You can know how busy the DB was in the period of interest.
  • How much concurrent activity is happening?
  • What is the breakdown of time spent by DB?
  • What your sessions were waiting for, and hence what is the bottleneck?
  • What were the top SQLs?
  • Is too much time being spent in parsing?
  • Which queries are causing most I/O activity?
  • How many log switches are happening?
  • What was the configuration of the DB?
  • Are your buffers optimally sized ?
  • etc etc etc etc!
So many "etc" because the report is so rich with information that even after reading many reports there is always a new learning like "Oh I have not used this section/ information in my analysis before".

AWR report helps in SQL tuning by identifying SQLs to tune. It helps is instance tuning by identifying we are waiting for which subsystem

ADDM can analyse the AWR report to provide recommendations for optimization.

Other stuff which you can venture into later is
  • SQL Trace
  • ASH

Saturday, 5 December 2015

Oracle Database : Good reading material from performance perspective

These are some of the resources that I found very helpful in my quest to tune Oracle performance. I will keep updating the list. My focus is on software performance hence on database performance so I wont be focusing on blogs and resources dedicated to Oracle database administration.


 Oracle performance tuning resources

Oracle blogs


  • Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions by Thomas Kyte
  • Oracle Core: Essential Internals for DBAs and Developers by Jonathan Lewis
  • Oracle Database 11g Release 2 Performance Tuning Tips & Techniques (Oracle Press) by Richard Niemiec
  • Forecasting Oracle Performance by Craig Shallahamer
  • Troubleshooting Oracle Performance Christian Antognini

My other posts on Oracle

Friday, 17 July 2015

The 2nd Annual Conference of CMG India

The 2nd Annual Conference of CMG India will be held in M.S.Ramaiah Institute of Technology in Bangalore, on Nov 27th and 28th 2015. CMG India is a community that networks performance engineers across IT companies.

Call for Papers & Tutorials
CMG India 2015 calls for original submissions of real life experiences, (research) work in progress, tutorials in the following areas:
  • Performance Engineering of IT Systems: Design, Development, and Production System Management
  • Performance & Capacity Modelling
  • Big Data Performance Engineering
  • Cloud Performance Engineering
  • Mobile Systems Performance
  • In Memory Computing: Design and Optimization
  • High Performance Computing
  • Large System Architecture & Design for Performance

For details visit their website

Saturday, 1 November 2014

What other product can learn from Oracle database performance management

Whenever I diagnose and tune oracle performance issues I am filled with a deep sense of respect for the engineers who designed and implemented the diagnostic capabilities of the Oracle database.

Levels of performance management can be
  • Descriptive
    •  What happened? ie Monitoring
  • Diagnostic
    • Why it happened?
  • Prescriptive
    • What to do?
Oracle does a excellent job at all the three levels

Oracle is a complex product with a more complex memory and process architecture that a general server side application. This makes the performance management more challenging. On top of it characterization of workload of a general server application is simple while in Oracle  DB even if you look at selects they can have so different resource and time foot print and can be of so many types. All this implies that the performance management of Oracle DB is very challenging. This makes the DB diagnostic solutions provided by Oracle even more admirable.

Oracle provides tuning parameters for each subsystem, be it checkpoint process, LWR process etc or the size of shared pool, Buffer cache, SGA, PGA etc. Oracle diagnostic data(AWR Reports) provides rich information to detect inefficiencies in any of the process or memory subsystems or hardware capacity. AWR report contains the top foreground waits which help in identifying which subsystem is becoming bottleneck and needs to be tuned or needs more capacity. The report identifies the high load SQLs that are candidates of tuning and optimization.

 Oracle goes beyond just diagnostics to prescriptions of solutions also. ADDM can be used analyse the AWR data and it will provide actionable recommendations to optimize the system. SQL Tuning Advisor can be used get the tuning recommendations for SQL query.

Oracle has achieved all the above three levels(Monitoring, Diagnostics, and Prescription ) where a capable customer can do it all by himself without needing support from Oracle, greatly reducing the mean time to response for performance issues as well as providing rich information for proactive performance management

Hats off to the architects and engineers of Oracle Diagnostics !

In future posts we will discuss how to do root cause analysis of Oracle issues using AWR reports

Other Posts

Thursday, 30 October 2014

CMG India 2014: 1st Annual Conference

Computer Measurement Group India is having its first annual conference CMG India 2014 in Performance Engineering and Capacity Management in Pune this December 2014

Dates : Fri Dec 12th 2014 (9am to 5:30pm + Dinner) & Sat Dec 13th 2014 (9am to 4:30pm)

Venue: The conference will be co-located at Persistent Systems and Infosys, Phase I, Rajeev Gandhi Infotech Park, Hinjewadi, Pune 411057. (The offices are opposite each other.)

The conference has three tracks

Other Posts

Sunday, 26 October 2014

Capacity Planning: Working within operational capacity

Final objective of all performance engineering, validation etc is to deliver performance that meets service requirements in production providing a smooth experience to the customer. One aspect is engineering of software  that has high capacity, low response time and optimally utilizes the resources(CPU, Memory, Storage and network), another aspect is managing the operations for optimal working of the software.  Even a well engineered software if used beyond its capacity will provide very poor user experience.  

Providing performance in the production environment is more important than just demonstrating it in performance labs.

Software utilizes hardware infrastructure to function and provide service. saturation in the infrastructure(easy to detect) implies saturation of software
In the world of distributed software, software modules/ components takes services of each other to provide service. Any one of the module that gets saturated will saturate the entire system or a major subsystem

Smooth operations implies all the subsystems work within their operational capacity.
Problem detection implies that monitoring is in place for the user experience so that problems are detected well before the service levels become completely unacceptable
Diagnostics implies that we can identify the subsystem(infrastructure + software service) that has saturated and take corrective actions
Predictive analytics implies that we can foresee the capacity issues before they arise.

General respons time vs  workload graph is below. The major characteristics for this curve is same for the hardware or software service.
If capacity and utilization of each subsystem is clear we have the decision support information to ensure smooth operations.

Operation of each subsystem should be within the operational capacity otherwise the performance will suffer.

Subsystem with the highest utilization is the one that will saturate first at higher workload and become the bottleneck.

Head room shown in the graph is the additional workload that the system can endure before it saturates. Depending on the risk tolerance additional capacity  can be added while operational head room is still remaining.

Sounds simple! what are the challenges?

In a distributed deployed solution there are too many subsystems.Lets say thousands of hardware equipment and similar number of software components
  • Are you monitoring all these subsystems?
  • what is the workload in these subsystems?
  • Do you know what is the capacity of these subsystems?
  • Do you know what is the utilization of these subsystems?
  • What is the head room in these subsystems? To endure more load

Another challenge is the workload itself. This depends on type of business. Inherently in business like eCommerce, Internet services there can be sudden surges in the end user activity. Some of these events can be anticipated like in festive season you can expect more online shopping or after a big marketing initiative with discounts there can be a big stress/workload on the eCommerce software services. So the systems that are working within operational capacity get saturated and provide poor end user experience at the critical business period.Typically govt websites for tax or online for submissions become unresponsive around the last dates. In trading systems lot of activity is driven by market volumes and critical events like corporate results, interest rate movements etc can trigger higher activity.

  • Do you have historical workload for your services and the operational baseline of workload
  • Work load trends (regular + seasonal)
  • Can you forecast the workload for important business event?
  • Is your capacity elastic ie if you know you need twice the capacity can you add it in time. More hardware + more software service + load distribution
Next challenge is do you understand the relationship between service utilization and the workload?
  • Do you have required  models?
  • Are you capturing the data required for these models?In all sub systems?
    • OS provides monitoring info about the hardware
    • What about the middleware?
    • What about the application software?
    • What about the DB?
  • Do you know the highest  load to which you can drive the system at which the response time meets the service requirements not in labs but in production.
We will discuss various models that solve these problems and data that you need to solve these issues in future posts.

Other Posts