In Part 1 of this series, I provided a brief overview of what statistics are and what they do. In this post, I'll discuss gathering statistics.
Gathering statistics often seems hard. Well, to be more precise, it's not so much the gathering but the choice of the parameters/options to use to do so. So, sticking with my "keeping them simple" mantra for this series, I'll cover the default collection parameter settings and why it's a good idea (in my opinion) to use them whenever possible. I'm not saying that you should "always" use defaults. But the defaults should be your starting point, and deviating from them requires a verified reason and an explanation for why - stored where those that follow can find it.
DBMS_STATS Package

DBMS_STATS is the Oracle-supplied package used for managing statistics collection. You may have been around since the days when the ANALYZE command was used, or may even still have that old command in some of your legacy code. But if you want to gather statistics specifically geared to work best with the Oracle Optimizer, DBMS_STATS is the right choice.
Automatic Collections

Oracle recommends using its automated stats gathering mechanism to gather and maintain stats on all database objects. These auto-collections are part of the automatic maintenance infrastructure called AutoTask. Maintenance windows are pre-defined for each day of the week, usually in the early morning hours. Within these windows, Oracle collects statistics on objects that have stale or missing statistics. The automatic job calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC procedure, which prioritizes the collections so that the objects most in need of updated statistics are processed first. This prioritization is important because the time window for automatic collections is limited (by default), so Oracle wants to finish the most "stale" objects before the window ends. If the end time arrives and there are still objects left to be collected, they will not be done until the next window opens the following day.
The automatic collection job is enabled as follows:

begin
  dbms_auto_task_admin.enable(
    client_name => 'auto optimizer stats collection',
    operation   => null,
    window_name => null);
end;
/
The automatic collection job is disabled as follows:

begin
  dbms_auto_task_admin.disable(
    client_name => 'auto optimizer stats collection',
    operation   => null,
    window_name => null);
end;
/
If the job is enabled, Oracle will collect statistics on objects that have no stats or whose number of changes has reached 10% of the number of rows they contain. The STATISTICS_LEVEL parameter must also be set to TYPICAL or ALL in order for Oracle to be able to determine whether objects are stale. Automatic collections use the default parameter values, which I'll cover in just a bit.
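If you'd like to confirm what threshold is actually in effect, you can query the preference directly with DBMS_STATS.GET_PREFS. A quick sketch (the SCOTT.EMP table name is just a placeholder for illustration):

```sql
-- Statistics level must be TYPICAL or ALL for staleness tracking to work.
show parameter statistics_level

-- The database-wide staleness threshold, as a percentage (10 by default).
select dbms_stats.get_prefs('STALE_PERCENT') as stale_pct from dual;

-- The same preference can be checked for one specific table,
-- in case an object-level override has been set.
select dbms_stats.get_prefs('STALE_PERCENT', 'SCOTT', 'EMP') as stale_pct
  from dual;
```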
Oracle recommends using the automated collections and enables the job in the default maintenance window when the database is installed. I think the concept is great, but I find the variability of having statistics change on their own, without my specific knowledge or intervention, a bit frightening. Why? Well, since statistics are the key data the optimizer uses to determine execution plans, when stats change, plans can change. Now, it's true that we want statistics to be representative of our data, so having them refreshed as they grow old should be good. But, due to the sensitive nature of production environments (and the desire to keep things stable), many people prefer to set aside their own window of time for collecting stats and do it routinely themselves. For example, perhaps the AutoTask collection is disabled and replaced by a manually created job that runs once a week, on Saturday evening only.
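As a sketch of that approach (the job name, schedule, and APP_OWNER schema below are examples I've made up, so adjust them to your environment), you could disable the AutoTask client and schedule your own weekly collection with DBMS_SCHEDULER:

```sql
-- Turn off the automatic optimizer stats collection task.
begin
  dbms_auto_task_admin.disable(
    client_name => 'auto optimizer stats collection',
    operation   => null,
    window_name => null);
end;
/

-- Create a job that gathers schema stats with all-default
-- parameters every Saturday at 10 PM.
begin
  dbms_scheduler.create_job(
    job_name        => 'WEEKLY_STATS_JOB',
    job_type        => 'PLSQL_BLOCK',
    job_action      => q'[begin dbms_stats.gather_schema_stats('APP_OWNER'); end;]',
    repeat_interval => 'FREQ=WEEKLY; BYDAY=SAT; BYHOUR=22; BYMINUTE=0; BYSECOND=0',
    enabled         => TRUE);
end;
/
```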
The more your tables change by inserting/deleting/truncating lots of rows during a short period of time or the more bulk loading of data you do, the more likely it is that you may want to use manual gathering steps.
What level to collect

You gather stats at whichever level best suits your needs: database, schema, or object (table or index). There are also subprograms that gather stats on dictionary and fixed objects, but I'll cover those later in the series. I rarely collect at the database level, as I prefer to be a bit more specific and collect only on the schemas or objects used by my production applications. There may be other non-prod schemas out there for testing, monitoring, repositories, etc., and I collect stats as and when needed on those, but I prefer not to spend time and resources in my prod environment routinely collecting stats on "extraneous" objects.
The procedures used are GATHER_DATABASE_STATS, GATHER_SCHEMA_STATS, GATHER_INDEX_STATS, and GATHER_TABLE_STATS.
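For example, collections at each level with all-default parameters look like this (the APP_OWNER schema and the table/index names are placeholders):

```sql
-- Entire database (rarely my choice, but available).
exec dbms_stats.gather_database_stats;

-- A single schema.
exec dbms_stats.gather_schema_stats(ownname => 'APP_OWNER');

-- A single table (by default, Oracle decides whether to
-- cascade the collection to the table's indexes).
exec dbms_stats.gather_table_stats(ownname => 'APP_OWNER', tabname => 'ORDERS');

-- A single index.
exec dbms_stats.gather_index_stats(ownname => 'APP_OWNER', indname => 'ORDERS_PK');
```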
When to collect

As I mentioned above, by default Oracle's automated stats gathering job will spend time each day collecting stats for you if the AutoTask job is enabled. Even if you decide to use the AutoTask job, you may also want to establish a routine manual collection where you gather stats on everything at the same time. With the AutoTask job, if objects don't change by much (10%), the statistics won't be considered stale enough to be automatically collected. But you may find that you get better plans, and therefore better performance, by keeping all objects updated regularly, whether they have reached the default staleness percentage or not.
You can create your own staleness report to learn how your objects change over time. Using ALL_TAB_MODIFICATIONS, you can determine (just like Oracle does for you) when objects are reaching certain change-percentage thresholds. The following query can be used to check the staleness of your objects:
select mods.table_name,
       mods.num_rows,
       mods.blocks,
       mods.last_analyzed,
       mods.tot_rows,
       mods.chngs,
       mods.pct_c
  from (select dbta.table_name,
               dbta.num_rows,
               dbta.blocks,
               dbta.last_analyzed,
               nvl(dbta.num_rows, 0)
                 + nvl(dtm.inserts, 0)
                 - nvl(dtm.deletes, 0) as tot_rows,
               nvl(dtm.inserts, 0)
                 + nvl(dtm.deletes, 0)
                 + nvl(dtm.updates, 0) as chngs,
               (nvl(dtm.inserts, 0)
                 + nvl(dtm.deletes, 0)
                 + nvl(dtm.updates, 0))
                 / greatest(nvl(dbta.num_rows, 0), 1) as pct_c,
               dtm.truncated trn
          from all_tables dbta
               left outer join all_tab_modifications dtm
                 on dbta.owner = dtm.table_owner
                and dbta.table_name = dtm.table_name
         where dbta.owner = upper('<enter the schema owner to check here>')
           --and dbta.table_name in (<use filters here to limit what tables are included>)
       ) mods
 where mods.pct_c > <staleness percent here>  -- change the percentage for "staleness" limit
    or mods.trn = 'YES'
    -- or (mods.num_rows > 0 and mods.num_rows <= 100000)  -- set the number of rows considered to be "small"
 order by mods.num_rows;
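One caveat worth noting with this report: the DML monitoring counters behind ALL_TAB_MODIFICATIONS are flushed from memory to the dictionary only periodically, so if you want up-to-the-minute change counts before running the query above, you can force a flush first:

```sql
-- Push the in-memory DML monitoring counters to the data
-- dictionary so ALL_TAB_MODIFICATIONS reflects recent activity.
exec dbms_stats.flush_database_monitoring_info;
```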
You can then use the output over time to determine how quickly objects become stale and then find the "sweet spot" for when to schedule your routine manual collections.
How much to collect

This is the question I hear most frequently: what estimate_percent value should be used for statistics collection? There are several really great references out there to back me up, but I'll simply say the first and best answer is to use the default. The default value is AUTO_SAMPLE_SIZE. This constant allows Oracle to automatically determine the best percentage of the data to sample in order to produce statistically correct cardinality and selectivity estimates for your data. Using AUTO_SAMPLE_SIZE also allows Oracle to utilize its approximate NDV (number of distinct values) feature. Rather than go into all the details here, I'll point you to a great entry at the Oracle Optimizer team's blog that explains why using AUTO_SAMPLE_SIZE is such a good idea.
Generally speaking, use the default unless you can quantitatively verify the need for a specific value (like 100%). If some of your tables aren't too large, you can afford the time and database resources for a 100% collection, and you have determined those objects need more aggressive stats collection, then set up preferences for just those objects and set their estimate_percent parameter to the value you need. Otherwise, your best bet is to stick with the default.
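A sketch of setting such a per-table preference (the APP_OWNER schema and SKEWED_FACTS table are hypothetical names for illustration):

```sql
-- Force a 100% sample for the one table that genuinely needs it;
-- all other tables keep the AUTO_SAMPLE_SIZE default.
begin
  dbms_stats.set_table_prefs(
    ownname => 'APP_OWNER',
    tabname => 'SKEWED_FACTS',
    pname   => 'ESTIMATE_PERCENT',
    pvalue  => '100');
end;
/

-- Future collections on this table pick up the preference
-- automatically; no parameter override is needed in the call.
exec dbms_stats.gather_table_stats('APP_OWNER', 'SKEWED_FACTS');
```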
There is so much more to cover, but I'm going to end this post here and will pick back up in the next part of the series and discuss the collection parameters in more detail. As we go along during this series, my hope is to build a foundation step by step and then dive into some of the less common statistics topic areas later. See you next time!
I have question to raise concerning this sentence - specifically the word ALL:
Oracle recommends using their automated stats gathering mechanism to gather and maintain stats on all database objects.
I don't think that includes the data dictionary or fixed tables. In fact, the Oracle docs suggest that the DBA is responsible for collecting those stats - i.e., that the auto job does not cover them - see this old blog (may be out of date):
The Oracle docs referenced there say the auto job does not cover these. So, question #1: has this changed? Then question #2: how would you recommend automating stats collection for the data dictionary and fixed tables?
You may wonder why I am asking this. We have found that for users on large databases (where large means lots of database objects, not data size), tools like Toad and SQL Developer - which make lots of queries against the data dictionary in order to populate their user-friendly GUIs - can run slowly without such stats gathered, and the auto job does not seem to handle this automagically for them. Yet most DBAs would say that the auto job does cover it, and hence they argue vehemently rather than just running two quick commands to fix their slow performance :(
So I'm keenly interested in your opinions here - thanks in advance ...
Your question(s) are a great segue to a future post in the series. I buried my comment about addressing dictionary and fixed object stats a bit further down in the article, where I said: "There are also subprograms that gather stats on dictionary and fixed objects, but I'll cover those later in the series."
So, keep reading as the series progresses as I should hit that topic in the next post or two!