Everyone with a Teradata Database collects statistics. No way around it. Good query plans rely on them, and we’ve all gotten used to collecting them.
But starting in Teradata Database 14.10 there are some big changes. Numerous enhancements to the optimizer, some important database changes, a new set of APIs, an added data dictionary table, and a brand new Viewpoint portlet combine to produce the Automated Statistics (AutoStats) feature. Used fully, AutoStats can identify statistics the optimizer looked for and found missing, flag unused statistics that you no longer need to collect, prioritize and execute the stats collection statements, and organize and run your statistics collection jobs in the most efficient way possible.
There’s a lot going on and a lot of choices involved. So what’s the easiest way to start getting familiar with AutoStats? And what’s required with the new DBQL logging options? If you like to build up confidence in a new feature at your own pace, this blog posting offers a few simple steps designed to introduce you to the key features and benefits of AutoStats.
Sources of Information
AutoStats leverages a variety of new statistics enhancements available in Teradata Database 14.10. Many of these new capabilities can be used directly within customized statistics management routines that already exist at many customer sites.
For more information, see the orange book Teradata Database 14.10 Statistics Enhancements by Rama Krishna Korlapati. Major features discussed in this orange book include automatic downgrading to sampled statistics, automatic skipping of unnecessary recollections based on thresholds, tracking of update, delete, and insert counts, and collecting statistics on single-table expressions.
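To give a flavor of these enhancements, the single-table expression support means a statistic can be collected directly on an expression rather than only on a column. A minimal sketch (the table, column, and statistic name below are illustrative, not taken from the orange book):
COLLECT STATISTICS COLUMN (EXTRACT(MONTH FROM OrderDate)) AS OrderMonthStat ON SandBoxDB.Orders;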
However, to gain the full benefit of an automated closed-loop architecture that incorporates these enhancements, take advantage of the statistics collection management and tuning analysis opportunities of AutoStats. For a solid understanding of this feature, read the orange book titled “Automated Statistics Management” by Louis Burger and Eric Scheie. It discusses what AutoStats does, how to use the Viewpoint portlet screens, and provides guidelines on using AutoStats successfully.
Here in Developer Exchange, take a look at the article from Shrity dated May 2013 titled “Teradata Viewpoint 14.10 Release”. It describes and illustrates the Viewpoint portlet that supports AutoStats, which you will be using when you get to 14.10. This Viewpoint portlet is named “Stats Manager”.
This blog posting is intended to help you ease into using the new AutoStats feature. Subsequent related postings will explore individual features of 14.10 statistics management, such as the Threshold option, and suggest ways you can use them outside of the AutoStats architecture.
Version 6 of the Statistics Histogram
One underpinning of the Teradata Database 14.10 enhancements to statistics collection is a new version of the statistics histogram. Version 6 is necessary to support some of these new enhancements, including a system-derived sampling option for the statistics collection statement, and for capturing insert, delete and update counts.
In order to use Version 6, the system must be committed to not backing down to any Teradata Database release earlier than 14.10. The DBS Control general field #65, NoDot0Backdown, must be set to true to indicate the no-back-down decision. This change does not require a restart.
If you have turned on DBQL USECOUNT logging for a database (discussed in the next section), and no back down is committed, the Version 6 histogram will automatically be used the next time you recollect statistics for the tables within that database.
The database supports backward compatibility. Statistics whose histograms use the previous versions (Version 1 through Version 5) can be imported into a system that supports Version 6. However, Version 6 statistics cannot be imported to systems that support only Version 5 or earlier. After migrating to Teradata 14.10, it’s a good idea to accelerate the recollection of current statistics. The recollection will move your histograms to the new version, which in turn supports wider use of the new enhancements.
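For example, once NoDot0Backdown is set and USECOUNT logging is on for the database, a simple table-level recollection refreshes every statistic already defined on that table, moving its histograms to Version 6 (the table name is illustrative):
COLLECT STATISTICS ON SandBoxDB.Orders;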
New DBQL options - What They Do and When to Use Them
AutoStats functionality relies on your making more information available to the database about how statistics are used. Some of this information it digs up itself, such as which stats you have already collected, and which options you have specified on your current statistics collection statements.
But to provide the most interesting (and helpful) new functionality, such as determining which statistics you have missed collecting on, or no longer need, additional DBQL logging will be required. While this extra DBQL logging is not absolutely necessary if you choose to use AutoStats only for collection and management purposes, things like the new re-collection threshold functionality will not work without DBQL USECOUNT logging taking place. AutoStats uses a “submit often” architecture, where statistics collections are submitted frequently, making the 14.10 threshold logic important for determining which statistics actually need to be refreshed at run time.
DBQL USECOUNT Logging
If you enable DBQL logging with the USECOUNT option, and then run an AutoStats Analyze job from the Stats Manager portlet, you can identify unused statistics as well as detect which statistics are being used. Although you turn on USECOUNT by a BEGIN QUERY LOGGING statement, USECOUNT is not a normal query-based DBQL option, and it is not something you enable at the user level. You turn USECOUNT on at the database level in a separate logging statement from the DBQL logging targeted to users.
BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;
Once you enable logging, USECOUNT tracks usage of all objects for that database. USECOUNT is the new mechanism for turning on Object Use Count (OUC).
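If you want to confirm that object use counts are being captured for the database, the table-level counters surface in the standard dictionary views. A quick sanity check might look like the following sketch, assuming the usual DBC.TablesV view with its AccessCount and LastAccessTimeStamp columns (adjust names to your site):
SELECT TableName, AccessCount, LastAccessTimeStamp
FROM DBC.TablesV
WHERE DatabaseName = 'SandBoxDB'
ORDER BY AccessCount DESC;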
One type of object tracking that USECOUNT performs involves statistics. In 14.10 each individual statistic is considered an “object” and has a row in a new dictionary table, DBC.ObjectUsage. With USECOUNT on, the database counts how often statistics objects are used by the optimizer: whenever the optimizer uses a particular statistic, the access count in that statistic’s row in DBC.ObjectUsage is incremented. If a particular statistic has no usage over a period of time, routines will flag it as unused, and subsequent Analyze jobs scoped to that database will recommend deactivating it.
USECOUNT also tracks inserts, deletes and updates made to tables within the database you have selected. This provides the extrapolation process with more accurate information on changes in table size, and also allows enforcement of the Threshold Change Percent limits that you can apply to individual stats collection statements starting in 14.10. The AutoStats feature incorporates threshold-based information in determining the ranking, and therefore the submission priority, of the individual statistic collections within the scripts it builds.
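For reference, the per-statement form of this change-based limit in 14.10 looks roughly like the sketch below (the table, column, and 10 percent figure are illustrative): the recollection is skipped unless at least about 10 percent of the table’s rows have changed since the last collection.
COLLECT STATISTICS USING THRESHOLD 10 PERCENT COLUMN (CustomerID) ON SandBoxDB.Orders;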
Here are a few tips when logging USECOUNT:
- Before you turn on USECOUNT, set the DBS Control field NoDot0Backdown to true so that the optimizer can make use of the use counts.
- Keep USECOUNT logging on for databases whose tables are being analyzed on an ongoing basis.
- In general, it is good to keep USECOUNT logging on for all important databases. In particular, USECOUNT logging is important for those databases where the SYSTEM change-based THRESHOLD is in effect. New optimizer statistics fields in DBS Control allow the user to enable and set system-level THRESHOLD defaults. A future blog posting will discuss the new THRESHOLD functionality in more detail.
DBQL STATSUSAGE and XMLPLAN Logging
These additional DBQL logging options create XML documents that identify which statistics were used, as well as ones that the optimizer looked for but did not find. Unlike USECOUNT, these two logging types should be turned on temporarily, only as needed in preparation for analysis. Both provide input to Analyze jobs whose evaluation method has been specified as “DBQL”. Both logging options store the XML that they produce in DBC.DBQLXMLTbl.
They basically do the same thing. However, XMLPLAN is more granular, provides more detail, and comes with more overhead than STATSUSAGE logging.
The difference is that STATSUSAGE logs the usage of existing statistics and recommendations for new statistics (ones it found to be missing) into the DBQLXMLTbl without any relationship to the query steps. The relationship to the query steps can only be made if XMLPLAN logging is also enabled. XMLPLAN logs detailed step data in XML format. If you enable XMLPLAN by itself without STATSUSAGE, no stats-related data is logged into the DBQLXMLTbl. STATSUSAGE provides all the statistics recommendations, and if both are being logged, those recommendations can be attached to specific steps. So if you plan to log XMLPLAN, log STATSUSAGE at the same time.
If both types of logging are on, both options log into a single XML document for a given request. But you must explicitly request logging for them, as the ALL option in DBQL does not include STATSUSAGE and XMLPLAN. Logging for both is enabled at the user level, as is STEP or OBJECT logging.
When you begin logging with these new DBQL options, take into consideration any logging that is already taking place. In the statement below, although SQL and OBJECTS are not used by AutoStats, they may be included in the same logging statement with STATSUSAGE and XMLPLAN.
BEGIN QUERY LOGGING WITH SQL, OBJECTS, STATSUSAGE, XMLPLAN LIMIT SQLTEXT=0 ON USER1;
The orange book Teradata Database 14.10 Statistics Enhancements provides an example of the XML output captured by DBQL STATSUSAGE logging. See pages 65 through 67.
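To simply confirm that XML documents are being written for your users’ queries before you run an Analyze job, a quick check against the table mentioned above might look like this (a sketch; the exact columns available can vary by release and by the DBQL views in use at your site):
SELECT QueryID, CollectTimeStamp
FROM DBC.DBQLXMLTbl
WHERE CAST(CollectTimeStamp AS DATE) >= CURRENT_DATE - 7;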
Here are a few tips when logging STATSUSAGE and XMLPLAN:
- For first-time tuning or analysis, or after changes to the database, enable both the XMLPLAN and STATSUSAGE options for those users who access the affected tables.
- Log with both options for at least one week before running an Analyze job, or for however long it takes to get good representation of the different queries.
- Consider using the more detailed XMLPLAN only for focused tuning, or briefly for new applications, as it incurs some overhead and requires more space.
- Turn off logging of both options during non-tuning periods or when no Analyze jobs on the affected data are planned for the near future (see the example after the note below).
- Keep the scope of logging to just the users accessing the databases and applications undergoing analysis and tuning.
Note: You can enable DBQL logging for STATSUSAGE and XMLPLAN based on account or user, or whatever DBQL logging conventions currently exist at your site. Keep in mind the other DBQL logging rules already in place.
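When the analysis period is over and you want to drop the XML options, one approach (assuming a single logging rule per user, as in the example above) is to end the user’s rule and re-establish it without STATSUSAGE and XMLPLAN:
END QUERY LOGGING ON USER1;
BEGIN QUERY LOGGING WITH SQL, OBJECTS LIMIT SQLTEXT=0 ON USER1;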
Becoming Familiar with AutoStats a Step at a Time
Because AutoStats jobs can incorporate just a single table’s statistics or be an umbrella for all of your databases, you want to keep careful track of how you scope your initial jobs. Pick your boundaries and start off small.
Initial steps to take when first starting out with AutoStats include:
- Read the orange book Automated Statistics Management and understand how the feature works.
- Select a small, non-critical database with just a few tables, perhaps in your sandbox area, for your initial AutoStats interactions.
- Try to pick a database that represents a new application area with no current statistics, or a database with some level of statistics that are not well tuned.
- Set up DBQL logging so that the new logging options are focused on the database tables you have scoped.
- Turn on DBQL USECOUNT logging just for that one database.
- Turn on DBQL logging that includes STATSUSAGE and XMLPLAN only for users who access that database (see the recap after this list).
- Run the application’s queries against the database for several days or even a week.
- Enable the Viewpoint Stats Manager data collector in the Admin -> Teradata Systems portlet. This enables the Stats Manager portlet.
- To run Stats Manager Analyze and Collect jobs, the TDStats database in DBC must have permission to collect statistics on the objects in the job’s scope. For example:
GRANT STATISTICS ON <database> TO TDStats;
- Select the database in the Statistics tab of the Viewpoint Stats Manager portlet and choose the Automate command to bring the existing statistics in that database under the control of AutoStats.
- Create a new Analyze job.
- Pick the DBQL option and limit queries by log date to include just the past week, when you were focused on logging that database.
- Select Require Review before applying recommendations.
- Run the Analyze job, then review and approve the recommendations.
- Create a new Collect job that incorporates your chosen database tables; run the Collect job.
- Specify Automatically generate the collect list (the default).
- Preview the list of statistics to be collected before the collection job runs.
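Pulling the DBQL pieces of this checklist together, the scoped logging setup for the sandbox example might look like this (the database and user names are illustrative, mirroring the earlier examples):
BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;
BEGIN QUERY LOGGING WITH SQL, OBJECTS, STATSUSAGE, XMLPLAN LIMIT SQLTEXT=0 ON USER1;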
Starting off simply with AutoStats allows you to get familiar with the automated functionality and job scheduling capabilities, as well as the Stats Manager portlet screens. Over time, you can give control over statistics analysis and collection to AutoStats for additional databases.