By Juan Carlos Olamendy Turruellas
In this series of articles, I’ll talk about important wait statistics for watching in order to avoid system degradation, that is, getting on slow response time.
Let’s start by saying that the response time (total time to finish the work) of a request sent to a server process depends on two factors:
We can mathematically express the former concept using the formula: response_time = service_time + wait_time.
We’ve talked in previous articles about techniques/designs/configurations in order to improve and monitor the service time; so in this article, we’ll focus on the wait time and related events.
It’s remarkable to say the wait time and related events are only the symptoms of the problems, so they show what’s slowing down the performance, but they don’t tell why. Thus, it’s responsibility of the DBA to investigate the real cause by doing a causal analysis.
There are dynamic views that show all wait events related to an instance. These wait event views show similar information but from different point of view of the instance and they’re very important for resolving performance issues. There are more than 950 wait events since Oracle database 11g, although the most important ones are related to resource contention such as latch, lock, buffer and I/O contention. Wait events are grouped together by wait classes as shown below:
The first metric to analyze is the instance performance, that’s, to calculate the ratio of total time dedicated to service time (working) vs wait time. This information can be extracted from the V$SYSMETRIC view as shown below in the listing 01.
SELECT METRIC_NAME, VALUE
WHERE METRIC_NAME IN ('Database CPU Time Ratio', 'Database Wait Time Ratio') AND INTSIZE_CSEC =(SELECT max(INTSIZE_CSEC) FROM V$SYSMETRIC);
From the output of the former SQL query, we can validate whether or not the wait time is higher than the working time (CPU time). From this point, we go in further details to discover the real causes related to the wait events. Wait classes provide a view to discover where the database instance is performing poorly as shown below in the listing 02 using the V$SYSTEM_WAIT_CLASS view.
SELECT WAIT_CLASS, TOTAL_WAITS, round(100*(TOTAL_WAITS/SUM_WAITS),2) PCT_TOTAL_WAITS,
ROUND((TIME_WAITED/100),2) TIME_WAITED, round(100*(TIME_WAITED/SUM_TIME_WAITED),2) PCT_TIME_WAITED
FROM (SELECT WAIT_CLASS, TOTAL_WAITS, TIME_WAITED FROM V$SYSTEM_WAIT_CLASS WHERE WAIT_CLASS != 'Idle'),
(SELECT sum(TOTAL_WAITS) SUM_WAITS, sum(TIME_WAITED) SUM_TIME_WAITED FROM V$SYSTEM_WAIT_CLASS WHERE WAIT_CLASS != 'Idle')
ORDER BY PCT_TIME_WAITED DESC
In our case, that Other, Configuration, User I/O and System I/O are responsible for most of the wait time by watching the PCT_TIME_WAITED column in the report. We can see that although User I/O and System I/O are responsible for the most wait events by watching at the PCT_TOTAL_WAITS column, they don’t represent the most wait time by watching the PCT_TIME_WAITED column. We can conclude that I/O operations are not the bottleneck in the system in this moment as shown in the listing 03.
The V$SYSTEM_EVENT view shows the total time waited for all the events in the instance since it started up. We can use this view to discover what are the top wait events. We can calculate the top n wait events by dividing the event’s wait time by the total wait time for all events. For getting the 10 wait events, we need to execute the SQL query as shown in the listing 04.
SELECT EVENT, TIME_WAITED, round(100*(TIME_WAITED/SUM_TIME_WAITED),2) PCT_TIME_WAITED,
TOTAL_WAITS, round(100*(TOTAL_WAITS/SUM_TOTAL_WAITS),2) PCT_TOTAL_WAITS
FROM (SELECT EVENT, TIME_WAITED, TOTAL_WAITS FROM V$SYSTEM_EVENT),
(SELECT sum(TIME_WAITED) SUM_TIME_WAITED, sum(TOTAL_WAITS) SUM_TOTAL_WAITS FROM V$SYSTEM_EVENT)
We can see that there is no significant I/O wait events. In this case, the most significant wait event is “rdbms ipc message” which means that background processes are waiting for IPC messages (indicating a wait for more work) and according to the vendor documentation it’s not necessary to worry because it’s like a sort of idle event. The report is shown in the listing 05.
We can filter the events like “db file%” as shown below in the listing 06 in order to validate if there are (or not) heavy I/O operations over the database files.
FROM (SELECT EVENT, TIME_WAITED, TOTAL_WAITS FROM V$SYSTEM_EVENT WHERE EVENT like 'db file%'),
We can see that although there are some waits associated to I/O operations over database files, they don’t impact on the percent of waited time that is almost close to 0 as shown in the report in the listing 07.
Let me tell that in Oracle we see two types of data block access:
We need to monitor these metrics under a heavy I/O workload because they may constitute the bottleneck of your system. If we find out that there is a lot of wait time related to these metrics (it’s not the case for our example, fortunately), then we need to follow the strategies:
We can see in the listing 07 that we’re doing a lot of “db file sequential read” (a 12.22% of waits), although not impacting fortunately on the waited time for now.
We can check the current wait events for a given session using the SQL query as shown below in the listing 08.
SELECT event, state, wait_time, seconds_in_wait
We can also use the V$SESSION_WAIT_HISTORY view to display information about the last ten wait events filtering by a given session as shown below in the listing 09.
SELECT seq#, event, wait_time
ORDER BY seq#
In this first article, I’ve started talking about monitoring wait statistics in Oracle databases. Now you can apply these knowledge and techniques to your own Oracle database instances in order to monitor the performance.
Excellent article, thanks.