
Entering and Exiting a TASM State


In earlier postings I’ve described how TASM system events can detect conditions such as AMP worker task shortages and automatically react by changing workload management settings.  These system events tell TASM to watch for resource shortages and react to them gracefully without your direct intervention, by doing things like temporarily adjusting throttle limits downward for less critical work.

This switchover happens as a result of TASM moving you from one Health Condition to another and, as a result, from one state to another state.  But how does this shift to a new state actually happen?  And under what conditions will you be moved back to the previous state?

Timers that control the movement between states

Before you use system events to move you between states, you will need to familiarize yourself with a few timing parameters within TASM, including the following:

  • System-wide Event Interval - The time between asynchronous checks for event occurrences.  It can be set to 5, 10, 30 or 60 seconds, with 60 seconds being the default
  • Event-specific Qualification Time – To ensure the condition is persistent, this is the amount of time the event conditions must be sustained in order to trigger the event
  • Health Condition Minimum Duration – Prevents constant flip-flopping between states; this is the minimum amount of time that a health condition (which points to the new state) must be maintained, even when the conditions that triggered the change are no longer present.

Entering a State

Let’s assume you have defined an AWT Available event that triggers when you have only two AMP worker tasks available on any five of your 400 AMPs, with a Qualification Time of 180 seconds.   Assume that you have defined the Health Condition associated with the state to have a Minimum Duration of 10 minutes, representing the time that must pass before the system can move back to the original state. 

TASM takes samples of database metrics in real-time, looking to see if any event thresholds have been met.  It performs this sampling at the frequency of the event interval. 

Once a sampling interval discovers the system is at the minimum AMP worker task level defined in the event, a timer is started.   No state change takes place yet.  The timer continues on as long as each subsequent sample meets the event’s thresholds.  If a new event sample shows that the event thresholds are no longer being met, then the timer will start all over again with the next sample that meets the event’s threshold criteria.

Only when the timer reaches the Qualification Time (180 seconds) will the event be triggered, assuming that all samples along the way have met the event’s thresholds.  With the default 60-second event interval, that works out to roughly three more consecutive qualifying samples after the one that started the timer.  At that point TASM moves to the new state. 

Exiting a State

Returning to the original state follows a somewhat similar pattern.

The Minimum Duration DOES NOT determine how long you will remain in the new state, but rather it establishes the minimum time that TASM is required to keep you in the new state before reverting back to the original state.  

So when will you exit a state?

Event sampling continues at the event interval frequency all the while you are in the new state.  Even if the event threshold criteria are no longer being met and available AWTs are detected to be above the threshold, once the move to the new state has taken place, the new state remains in control for the Minimum Duration.

After the Minimum Duration has passed, if event sampling continues to show that the AWT thresholds are being met (you still have at least five AMPs with only two AWTs available), TASM will stay in that new state.  Only after the first sample that fails to meet the event thresholds, once the Minimum Duration has passed, will control be moved back to the original state.

The bottom line is that you will not return to the original state until the Minimum Duration of the state's Health Condition has passed, and even then you will not be returned if the condition that triggered the event persists.


Using the Apache Derby ij tool with the Teradata JDBC Driver


Oracle (Sun) JDK 6.0 and JDK 7.0 include the Apache Derby database, which is implemented in 100% Java. 

This article is not about the Apache Derby database; instead, this is about a command-line interactive SQL tool that is included with Apache Derby. 

The Apache Derby "ij" interactive SQL tool is quite useful. It is a command-line SQL tool similar to BTEQ, and it works with any JDBC Driver and any database, not just the Apache Derby database. 

The following instructions show how to use the ij tool with the Teradata JDBC Driver. 

First, you need to have a JDK installed (not just a JRE) that includes the Apache Derby jar files. The Apache Derby jar files are typically located in the db/lib directory under the main JDK install directory. (Note that some builds of JDK 6.0 have problems with installing the Apache Derby files.) 

Second, you need to have the Teradata JDBC Driver jar files terajdbc4.jar and tdgssconfig.jar available. 

On Windows

Assuming that your JDK 6.0 or 7.0 is installed in directory c:\jdk and that your Teradata JDBC Driver jar files are located in c:\terajdbc 

c:\jdk\bin\java -cp "c:/terajdbc/terajdbc4.jar;c:/terajdbc/tdgssconfig.jar;c:/jdk/db/lib/derbytools.jar" org.apache.derby.tools.ij

On UNIX or Linux

Assuming that your JDK 6.0 or 7.0 is installed in directory /usr/jdk and that your Teradata JDBC Driver jar files are located in /usr/terajdbc

/usr/jdk/bin/java -cp "/usr/terajdbc/terajdbc4.jar:/usr/terajdbc/tdgssconfig.jar:/usr/jdk/db/lib/derbytools.jar" org.apache.derby.tools.ij

 

After ij starts, it prints a command prompt "ij>". The interactive commands are the same regardless of which platform you are running on.

Commands in ij must be terminated with a semicolon, just like with BTEQ. You can obtain interactive help with the "help;" command.

ij version 10.2
ij> driver 'com.teradata.jdbc.TeraDriver';
ij> connect 'jdbc:teradata://mydbhost/TMODE=ANSI' user 'joe' password 'please';
ij> select current_timestamp;
Current TimeStamp(6)
--------------------------------
2008-01-16 16:03:46.4

1 row selected
ij> disconnect;
ij> exit;


Calculation of Table Hash Values to Compare Table Content

Short teaser: To be able to compare table content between different systems, a table hash function is needed.
Attachment: 2289_Arndt.pdf (959.88 KB)

At Partners 2012 I presented the session “How to compare tables between TD systems in a dual active environment” – see the attached slides for details if you are interested.

The main emphasis was, and is, to draw attention to an issue which becomes more important as more and more customers use multiple Teradata instances, either as dual active systems or as different appliances for different purposes (e.g. an online archive plus an EDW instance). If you expect to have the same data on two or more systems, the question is how to PROVE that the data really is the same – joining across systems is not possible.

Current approaches like calculating column measures have major limitations, especially for big tables – see the presentation for details. Unfortunately, classical hash algorithms like SHA1 or MD5 – which were designed for exactly this purpose – cannot be used within the database as native functions at table level, as they require the same ordering of the data, which cannot be guaranteed on two different systems. An option is to extract all the data sorted and run SHA1 on the output file, but this leads to major costs for the sorts and the network traffic.

A way to overcome these limitations is a new table hash function which removes the ordering requirement of the SHA1 and MD5 hash functions.
The main idea is a two-step approach:
  1. Calculate for each row a hash value whose outcome has the classical hash properties.
  2. “Aggregate” the row hash values with an aggregate function whose result does not depend on the ordering of the input (commutative and associative).

The resulting output can be used as a table hash to compare data content between different systems.

The main considerations in choosing the right component functions are:
  • For step 1: avoid hash collisions, which requires reasonably long hash values for the row hash. This is, for example, an issue if you want to use Teradata's internal hashrow function, as hash collisions occur even for small tables.
  • For step 2: use a good aggregation function which also handles multiset tables (where XOR has an issue).
We implemented a C UDF, compared the results with the data export options, and already see competitive resource consumption figures with this implementation.
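
To make the two-step idea concrete, here is a minimal sketch using only built-in functions. The table and column names are placeholders, and as noted above the short hashrow value collides far too easily for real use, so this only illustrates the structure of the approach, not the C UDF itself:

-- Step 1: a per-row hash (HASHBUCKET(HASHROW(...)) returns an INTEGER).
-- Step 2: an order-independent aggregation (SUM) over those row hashes.
SELECT SUM(CAST(row_hash AS DECIMAL(18,0))) AS table_hash
FROM (
   SELECT HASHBUCKET(HASHROW(col1, col2, col3)) AS row_hash
   FROM MyDatabase.MyTable
) AS t;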

During the last months we have spent more R&D effort on this and have found a better hash function for the row hash calculation. This reduces CPU consumption to about 60% of the SHA1 hash function we used before. In addition we found a better aggregation function which overcomes the limitations of the ADD MOD 2^(hash length) function.

 

As already discussed in the Partners 2012 presentation, the best performance for this kind of function would be achieved if it were implemented as a core database function, similar to the hashrow function. This would also improve usability, as the described data preparation would not be needed. But Teradata is likely to implement this only if there is strong demand from the customer side.

In the meantime – and here starts the sales pitch – if you need to prove that the data on two different systems is the same, or you want to discuss any of this in more detail, then contact me.

There are also additional use cases for this function which you should take into consideration, e.g. regression testing of database versions, systems and your own software.
So in summary: be aware of the issues and consider the alternatives!


The Revenge of Brick and Mortar


Darryl McDonald, President of Teradata Applications, tweeted this link today:  a practical example of when many of the subjects discussed on this blog come together. 

In a sentence, Wal*Mart (I’m still addicted to the old name) is using in-store real-time geofencing to push offers to customers.  And, as Apple has known for years, the use of the Wal*Mart ecosystem alone has the virtuous result of more sales and increased customer stickiness.  "This new breed of mobile-empowered customers is good news for us," Thomas said. "Compared to nonapp users, customers with a Wal-Mart app make two more shopping trips a month to our stores and spend nearly 40 percent more each month."

Pair this with Wal*Mart’s huge store of information about each customer’s shopping habits and desires, coupled with like customers’ data, and they can anticipate needs that the customer didn’t realize they had.  We’ve come a long way from stacking the beer next to the diapers.


TASM State Changes are Streamlined in Teradata 14.0


A state matrix is a construct that allows you to intersect your business processing windows with the health conditions of your system.  Why should you care about this abstraction?  The state matrix in TASM is the mechanism that supports an automated change in workload management setup as you move through your processing day, or should your system become degraded.  Not only is it automatic, but it's a significantly more efficient way to make a change in setup while work is running.

First, consider the two categories that are intersected in our state matrix:

  1. “Planned environments” represent different times of day when business priorities are expected to be different, such as night and day; weekday and weekend; regular, month-end or year-end. Planned environments form the horizontal axis of the state matrix.
  2. “Health conditions” represent the robustness of the system. When a node is down you might want throttle limits to be set differently, or priority setup to shift. Health conditions make up the vertical axis of the matrix. There may be two, possibly three health conditions that you want to treat differently in terms of workload management.

The intersection of these two categories, both of which can necessitate setup changes, is referred to as a state. Although a simple state matrix is supplied by default, you will need to define your own specific planned environments and health conditions if you wish to make use of automated changes to workload management.  Once you define a state in Viewpoint Workload Designer, you can associate it with one or several different intersections.

 

As mentioned above, a health condition or a planned environment can change as time passes or as the health of the system suddenly degrades. They can also change as a result of a system event that the DBA sets up using TASM.   For example, you can define an event that will be triggered when Available AWTs reach a specified low level on some number of AMPs.  When you define a system event you give it an “action”, and one action could be to switch to a planned environment that throttles back low priority work. 

Among the workload attributes that can be modified when a state changes are throttle limits and the Priority Scheduler settings.

 

In the initial releases of TASM, users on busy systems sometimes experienced a delay in waiting for a state change to complete.  When going through a state change in releases prior to Teradata 14.0, a non-trivial level of internal work had to be performed:  All of the internal TASM tables that define the ruleset had to be re-read, all of the TASM caches had to be rebuilt, the delay queues were completely flushed, and all of the running queries on the system had to be rechecked for adherence to throttles and filters.  Finally, all throttle counters had to be reset.

This overhead has been almost completely eliminated in Teradata 14.0.  With the state change optimization feature, there is minimal impact when doing a state change.  Internal tables do not need to be re-read and the delay queue is left intact.  There is no longer a need to recheck every running query.   A simple update to the existing cache is made to reflect the state information, and the new priority scheduler configuration is downloaded.  State-transition delay queue re-evaluation has been measured to be negligible overhead. So making frequent state changes is easily supportable, should you need to provide for that. 

Even if you are not using the state matrix and are not automating the predictable changes in your processing day, you can still throttle back low priority work on the fly.  However, doing so would require manually enabling a new rule set, and that is not a good idea.  When you change a rule set, interaction with Workload Designer is required to download and activate the new rule set, and far more re-evaluations are required for existing requests, delay queues, Priority Scheduler mappings, etc.

Changing workload management behaviors by enabling an entirely new rule set cannot take advantage of the state change optimizations in Teradata 14.0. The delay caused by enabling a new rule set on a very busy system has in extreme cases been measured in minutes, versus the negligible overhead of state transitions.  Remember that activating a new ruleset requires reading the ruleset from the TDWM database, activity which may contend with an already stressed system, whereas a state change does not require the TDWM database to be accessed.

Bottom line:  Get comfortable using the state matrix to design and automate your planned and unplanned changes, and enjoy a more efficient transition from setup to setup.  And for those of you already using the state matrix, you will have a smoother experience in the face of change once you are on Teradata 14.0.


Recommended Dictionary Statistics to Collect in Teradata 14.0


Collecting statistics on data dictionary tables is an excellent way to tune long-running queries that access multi-table dictionary views.  Third party tools often access the data dictionary several times, primarily using the X views.  SAS, for example, accesses DBC views including IndicesX and TablesX for metadata discovery.  Without statistics, the optimizer may do a poor job in building plans for these complex views, some of which are composed of over 200 lines of code.

In an earlier blog posting I discussed the value of collecting statistics against data dictionary tables and provided some suggestions about how you can use DBQL to determine which tables and which columns to include; go back and review that posting.  This posting is a more comprehensive list of DBC statistics, updated to include those recommended for JDBC.
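
If you don't have that posting handy, a starting point is to ask DBQL which dictionary objects your tool-generated queries actually touch. This is only a sketch and assumes object-level DBQL logging is enabled:

-- Which DBC tables and columns are referenced by logged queries?
SELECT ObjectDatabaseName, ObjectTableName, ObjectColumnName,
       COUNT(*) AS ReferenceCount
FROM DBC.DBQLObjTbl
WHERE ObjectDatabaseName = 'DBC'
GROUP BY 1, 2, 3
ORDER BY ReferenceCount DESC;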

Note that the syntax I am using is the new create-index-like syntax available in Teradata 14.0.  If you are on a release prior to 14.0 you will need to rewrite the following statements using the traditional COLLECT STATISTICS syntax.
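
For example, the DBC.Hosts recommendation further below would look like this in the traditional (pre-14.0) form, one statement per column or index:

COLLECT STATISTICS ON DBC.Hosts COLUMN LogicalHostId;
COLLECT STATISTICS ON DBC.Hosts INDEX (HostName);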

Here are the recommendations for DBC statistics collection.  Please add a comment if I have overlooked any other useful ones.

COLLECT STATISTICS
 COLUMN TvmId
 , COLUMN UserId
 , COLUMN DatabaseId
 , COLUMN FieldId
 , COLUMN AccessRight
 , COLUMN GrantorID
 , COLUMN CreateUID
 , COLUMN (UserId ,DatabaseId)
 , COLUMN (TVMId ,DatabaseId)
 , COLUMN (TVMId ,UserId)
 , COLUMN (DatabaseId,AccessRight)
 , COLUMN (TVMId,AccessRight)
 , COLUMN (FieldId,AccessRight)
 , COLUMN (AccessRight,CreateUID)
 , COLUMN (AccessRight,GrantorID)
 , COLUMN (TVMId ,DatabaseId,UserId)
ON DBC.AccessRights;


COLLECT STATISTICS
 COLUMN DatabaseId
 , COLUMN DatabaseName
 , COLUMN DatabaseNameI
 , COLUMN OwnerName
 ,  COLUMN LastAlterUID
 , COLUMN JournalId
 , COLUMN (DatabaseName,LastAlterUID)
ON DBC.Dbase;


COLLECT STATISTICS
 COLUMN LogicalHostId
 , INDEX ( HostName )
ON DBC.Hosts;

COLLECT STATISTICS
 COLUMN OWNERID
 , COLUMN OWNEEID
 , COLUMN (OWNEEID ,OWNERID)
ON DBC.Owners;

COLLECT STATISTICS
 COLUMN ROLEID
 , COLUMN ROLENAMEI
ON DBC.Roles;


COLLECT STATISTICS
INDEX (GranteeId)
ON DBC.RoleGrants;


COLLECT STATISTICS 
COLUMN (TableId)
, COLUMN (FieldId)
, COLUMN (FieldName)
, COLUMN (FieldType)
, COLUMN (DatabaseId)
, COLUMN (CreateUID)
, COLUMN (LastAlterUID)
, COLUMN (UDTName)
, COLUMN (TableId, FieldName)
ON DBC.TVFields;


COLLECT STATISTICS
 COLUMN TVMID
 , COLUMN TVMNAME
 , COLUMN TVMNameI
 , COLUMN DATABASEID
 , COLUMN TABLEKIND
 , COLUMN CREATEUID
 , COLUMN CreatorName
 , COLUMN LASTALTERUID
 , COLUMN CommitOpt
 , COLUMN (DatabaseId, TVMName)
 , COLUMN (DATABASEID ,TVMNAMEI)
ON DBC.TVM;

 
COLLECT STATISTICS
 INDEX (TableId) 
 , COLUMN (FieldId)
 , COLUMN (IndexNumber)
 , COLUMN (IndexType)
 , COLUMN (UniqueFlag)
 , COLUMN (CreateUID)
 , COLUMN (LastAlterUID)
 , COLUMN (TableId, DatabaseId)
 , COLUMN (TableId, FieldId)
 , COLUMN (UniqueFlag, FieldId)
 , COLUMN (UniqueFlag, CreateUID)
 , COLUMN (UniqueFlag, LastAlterUID)
 , COLUMN (TableId, IndexNumber, DatabaseId)
ON DBC.Indexes;


COLLECT STATISTICS
 COLUMN (IndexNumber)
 , COLUMN (StatsType)
ON DBC.StatsTbl;


COLLECT STATISTICS
 COLUMN (ObjectId)    
 , COLUMN (FieldId)
 , COLUMN (IndexNumber)
 , COLUMN (DatabaseId, ObjectId, IndexNumber)
ON DBC.ObjectUsage;


COLLECT STATISTICS
 INDEX (FunctionID )
 , COLUMN DatabaseId
 , COLUMN ( DatabaseId ,FunctionName )
ON DBC.UDFInfo;


COLLECT STATISTICS
 COLUMN (TypeName)
 , COLUMN (TypeKind)
ON DBC.UDTInfo;
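
After running the statements above, a quick sanity check (a sketch, using the Teradata 14.0 dbc.StatsV view) shows what has actually been collected on DBC and when:

-- Verify which DBC statistics exist and how old they are.
SELECT TableName, ColumnName, LastCollectTimestamp
FROM DBC.StatsV
WHERE DatabaseName = 'DBC'
ORDER BY TableName, ColumnName;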

 


How to spell a numeric value in English words


In a recent topic in the forums there was a question on "how to spell out numeric values in English", along with an excerpt from the manuals indicating there is a format option for the new TO_CHAR function in TD14: "Any numeric element followed by SP is spelled in English words."

The manuals are a bit misleading: they don't mention that this option is only available for TO_CHAR(DateTime). It's based on Oracle's implementation, and it is indeed a bit embarrassing, as year or julian day are also numeric values; still, you can't pass a numeric value directly, it must be extracted from a DATE.

 

Teradata's date range is smaller than Oracle's, so the well-known "julian day" trick TO_CHAR(TO_DATE(numericval,'j'), 'jsp') doesn't work: Oracle's calendar starts on January 1, 4712 BCE, so the lowest date in Teradata (0001-01-01) is julian day 1721426 in Oracle:

 BTEQ -- Enter your SQL request or BTEQ command:
SELECT TO_CHAR(DATE '0001-01-01', 'jsp') (VARCHAR(100));

 *** Query completed. One row found. One column returned.
 *** Total elapsed time was 1 second.

TO_CHAR(0001-01-01,'jsp')
---------------------------------------------------------------------
one million seven hundred twenty-one thousand four hundred twenty-six

But based on this knowledge you can apply some modifications to get up to 6 digits:

SEL 123456 AS x,
   CASE
      WHEN x = 0 THEN 'zero'
      ELSE SUBSTRING((TO_CHAR(DATE '0763-09-18' + ABS(x), 'jsp') (VARCHAR(100))) FROM 13)
   END;


 *** Query completed. One row found. 2 columns returned.
 *** Total elapsed time was 1 second.

          x <CASE  expression>
----------- ---------------------------------------------------------------
     123456 one hundred twenty-three thousand four hundred fifty-six

Finally, add some more calculations to cover the full range of a BIGINT and implement it as a SQL UDF:

REPLACE FUNCTION SpellNumeric (x BIGINT)
RETURNS VARCHAR(220)
LANGUAGE SQL
CONTAINS SQL
NOT DETERMINISTIC
RETURNS NULL ON NULL INPUT
SQL SECURITY DEFINER
COLLATION INVOKER
INLINE TYPE 1
RETURN 
     CASE WHEN ABS(x) >= (1e+15 (BIGINT)) AND (ABS(x) / (1e+15 (BIGINT)) MOD 1000) <> 0 THEN SUBSTRING(TO_CHAR(DATE '0763-09-18' + ABS(x / (1e+15 (BIGINT)) MOD 1000), 'jsp') FROM 13) ||' quadrillion ' ELSE '' END
  || CASE WHEN ABS(x) >= (1e+12 (BIGINT)) AND (ABS(x) / (1e+12 (BIGINT)) MOD 1000) <> 0 THEN SUBSTRING(TO_CHAR(DATE '0763-09-18' + ABS(x / (1e+12 (BIGINT)) MOD 1000), 'jsp') FROM 13) ||' trillion '    ELSE '' END
  || CASE WHEN ABS(x) >= (1e+09 (BIGINT)) AND (ABS(x) / (1e+09 (BIGINT)) MOD 1000) <> 0 THEN SUBSTRING(TO_CHAR(DATE '0763-09-18' + ABS(x / (1e+09 (BIGINT)) MOD 1000), 'jsp') FROM 13) ||' billion '     ELSE '' END
  || CASE WHEN ABS(x) >= (1e+06 (BIGINT)) AND (ABS(x) / (1e+06 (BIGINT)) MOD 1000) <> 0 THEN SUBSTRING(TO_CHAR(DATE '0763-09-18' + ABS(x / (1e+06 (BIGINT)) MOD 1000), 'jsp') FROM 13) ||' million '     ELSE '' END
  || CASE WHEN x = 0 THEN 'zero' ELSE SUBSTRING((TO_CHAR(DATE '0763-09-18' + ABS(x) MOD (1000000 (BIGINT)), 'jsp') (VARCHAR(100))) FROM 13) END
;

SELECT 123456789012345678 (BIGINT) AS x, spellnumeric(x);


 *** Query completed. One row found. 2 columns returned.
 *** Total elapsed time was 1 second.

                   x spellnumeric(x)
-------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  123456789012345678 one hundred twenty-three quadrillion four hundred fifty-six trillion seven hundred eighty-nine billion twelve million three hundred forty-five thousand six hundred seventy-eight 

Some remarks:

  • Sadly, there's no easy way to change the output to another language.
  • This function doesn't cover negative values (simply add a CASE WHEN x < 0 THEN 'minus ' ELSE '' END; see the sketch after this list).
  • It's based on American English, but could easily be modified to the European numbering scheme (billion = milliard, etc.).
  • And of course it's not tested for the full range of values :-)
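
As a quick illustration of the second remark (a sketch only; since SpellNumeric works on ABS(x) internally, the sign prefix can simply be added around the call):

SELECT x,
       CASE WHEN x < 0 THEN 'minus ' ELSE '' END || SpellNumeric(x) AS spelled
FROM (SELECT CAST(-123456 AS BIGINT) AS x) AS t;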

Big Data: Close to Home


Most of the “Big Data” examples deal with enterprise analytics being run on terabytes or petabytes of data.  However, this example shows how widespread social data analytics have become.  And, as I live in the wine country, this example is quite close to home.

One of the local papers had a nice write-up on how the small local wineries are tracking their customers’ likes and dislikes when it comes to the winery’s product offerings.  They are using a product offering from VinTank, which brings enterprise-level social analytics to small wineries.  “’It really enables me to monitor and engage anybody that's talking about our brand,’ said Dylan Elliott, e-commerce coordinator for Crimson Wine Group, which owns Healdsburg's Seghesio Family Vineyards and uses VinTank. ‘You can really dig into the customer database and see who your advocates are, all kinds of information.’” Crimson Wine Group had revenues of less than $50M in 2012.

Of course, especially lately, privacy is starting to become a concern.  In one example of the analytics’ capabilities: “’If anyone checks in at your winery or takes a picture or tweets, we push that right in front of your winery,’ said Mabray, the VinTank strategy officer. ‘We say, 'This person who just checked in at Foursquare is on your property, and is a wine club member. Go say 'Hi' to them right away.' . . . It's a little creepy sometimes.’”  I personally don’t see this as a problem of mission creep:  if this person is using Foursquare, then I would think that they would want to be acknowledged.

The key here is that all retail organizations, no matter what their size, are now going to find that social analytics are a necessity, not an option, if they want to compete on any reasonable scale. The more retail front line analytics progress, the more we are getting back to the 19th century corner grocer who knew your name, knew your wants and tastes, and was waiting for you to walk through the door.


Extended Object Names in the .NET Data Provider


In Teradata Database Release 14.10, object names have been lengthened from 30 bytes in a Latin or KANJISJIS character set to 128 Unicode characters. This capability supports a richer set of characters for composing object names. Even when the session character set is limited in the characters it supports, Unicode delimited identifiers are supported in all SQL text statements. Please see the SQL Data Types and Literals manual for information on Unicode delimited identifiers.
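
As a quick illustration (a sketch that reuses the object names from the examples later in this article), a Unicode delimited identifier spells out code points that the session character set cannot carry:

-- U& introduces the delimited identifier, UEscape names the escape character,
-- and #60B1 stands for the Unicode code point U+60B1.
CREATE SET TABLE U&"customers#60B1" UEscape'#'
  (custid INTEGER,
   U&"custname#60B3#60B4#60B5" UEscape'#' VARCHAR(30));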

The choice of the session character set imposes restrictions on the support of object names. The .NET Data Provider Release 14.10 has incorporated two changes to work around those limitations, but still contains some restrictions based upon the session character set and the Teradata metadata support of Unicode delimited identifiers:

  • Executing Commands in a Unicode Session
  • Generating and Supporting Unicode Delimited Identifiers
  • Restrictions in the Data Provider
  1. Executing Commands in a Unicode Session 

    • Connecting to Teradata

      The connection process now is executed in a Unicode session character set, enabling connection to any user or database supported by the Teradata Database. The .NET Data provider supports logon information as Unicode strings. During the connection process, the Data Provider submits the TdConnection.ChangeDatabase command also in a Unicode session, when the Database logon property is supplied.

      The TdConnection.ChangeDatabase command may also be executed independently of the logon process and supports the database name as a Unicode string.

      As an example, the code below is executed in a LATIN1250 session character set, which cannot represent Unicode code points greater than U+00FF. By supplying the user name, password and database strings in Unicode, the Data Provider supports connection to the Teradata Database using an expanded set of Unicode characters.

      TdConnectionStringBuilder blr = new TdConnectionStringBuilder();
      blr.DataSource = "teradb01";
      // The user id contains a Unicode escape sequence representing a CJK character
      blr.UserId = "User\u3029";
      blr.Password = "MyPassword";
      blr.SessionCharacterSet = "LATIN1250_1A0";
      TdConnection con = new TdConnection(blr.ToString());
      con.Open();

       

    • Changing an Expired Password

      Expired passwords are also supported as a Unicode string, when providing the TdConnectionStringBuilder.NewPassword property as a Unicode string. As the connection is established, and an expired password is detected, the supplied NewPassword Unicode string is used to modify the password. When a password is expired, the only command permitted to the database is a modify user command to modify the password.

  2. Generating and Supporting Unicode Delimited Identifiers

    • Stored Procedure Command Execution Generating Unicode Delimited Identifiers

      Stored Procedure execution accepts Unicode strings while using TdCommand.CommandType.StoredProcedure. The command expects the name of the stored procedure in the TdCommand.CommandText property. The Data Provider will issue the command by constructing a Unicode delimited identifier for the procedure name when the stored procedure name cannot be represented within the current session character set. See the Teradata Database manual Data Types and Literals for more information on Unicode delimited identifiers.

      The following example illustrates calling a stored procedure with a stored procedure name containing a character that is not compatible in the session character set. The session character set is ASCII.

      // The stored procedure definition contains a U+3021 Unicode character and
      // a column name with U+30A1 Unicode character
      //
      // DDL: REPLACE PROCEDURE U&"sptest#3021" UEscape'#'
      // (in col2 INTEGER, out U&"col2#30a1" UEscape'#' INTEGER)
      // begin
      //    set U&"col2#30a1" UEscape'#' = col2;
      // end;
      //
      TdConnectionStringBuilder blr = new TdConnectionStringBuilder();
      blr.DataSource = "teradb01";
      // The user id contains a Unicode escape sequence representing a CJK character
      blr.UserId = "User\u3029";
      blr.Password = "MyPassword";
      blr.SessionCharacterSet = "ASCII";
      TdConnection con = new TdConnection(blr.ToString());
      con.Open();
      TdCommand cmd = new TdCommand();
      cmd.Connection = con;
      cmd.CommandType = CommandType.StoredProcedure;
      // The stored procedure name contains a Unicode escape sequence representing a CJK character
      cmd.CommandText = "sptest\u3021";
      TdParameter param1 = new TdParameter("param1", TdType.Integer);
      param1.Direction = ParameterDirection.Input;
      param1.Value = 10;
      cmd.Parameters.Add(param1);
      TdParameter param2 = new TdParameter("param2", TdType.Integer);
      param2.Direction = ParameterDirection.Output;
      cmd.Parameters.Add(param2);
      using (TdDataReader dr = cmd.ExecuteReader())
      {
            // process the data from the output parameter
            Int32 result = (Int32)cmd.Parameters[1].Value;
      }
      con.Close();
    • Stored Procedure Execution Supporting Unicode Delimited Identifiers

      As mentioned in our Data Provider Developers Reference (TdCommand.CommandText property), if a dot (.) is included within the stored procedure name, then the user must construct the TdCommand.CommandText by surrounding the text with double quotes, or by composing the object name as a Unicode delimited identifier, as required. The dot character is ambiguous and is sometimes used to separate the database name and the stored procedure name. The following example illustrates calling a stored procedure with a database name and a stored procedure name. The session character set is ASCII. 

      // The stored procedure definition contains a U+3021 Unicode character and
      // a column name with U+30A1 Unicode character
      //
      // DDL: REPLACE PROCEDURE U&"sptest#3021" UEscape'#'
      // (in col2 INTEGER, out U&"col2#30a1" UEscape'#' INTEGER)
      // begin
      //    set U&"col2#30a1" UEscape'#' = col2;
      // end;
      //
      TdConnectionStringBuilder blr = new TdConnectionStringBuilder();
      blr.DataSource = "teradb01";
      // The user id contains a Unicode escape sequence representing a CJK character
      blr.UserId = "User\u3029";
      blr.Password = "MyPassword";
      blr.SessionCharacterSet = "ASCII";
      TdConnection con = new TdConnection(blr.ToString());
      con.Open();
      TdCommand cmd = new TdCommand();
      cmd.Connection = con;
      cmd.CommandType = CommandType.StoredProcedure;
      // Compose the stored procedure name as a Unicode delimited identifier
      cmd.CommandText = "tdnetdp.U&amp;\"sptest#3021\"UEscape'#';
      TdParameter param1 = new TdParameter("param1", TdType.Integer);
      param1.Direction = ParameterDirection.Input;
      param1.Value = 10;
      cmd.Parameters.Add(param1);
      TdParameter param2 = new TdParameter("param2", TdType.Integer);
      param2.Direction = ParameterDirection.Output;
      cmd.Parameters.Add(param2);
      using (TdDataReader dr = cmd.ExecuteReader())
      {
            // process the data from the output parameter
            Int32 result = (Int32)cmd.Parameters[1].Value;
      }
      con.Close();
    • TdCommand SQL Execution Supporting Unicode Delimited Identifiers

      Commands that are executed as command text have the option of representing object names that are not representable in the current session character set as Unicode delimited identifiers. Unicode delimited identifiers are fully supported in SQL text. However, the metadata from Teradata will contain translation error characters when object names are not representable within the current session character set. This example selects data from a table name and column name that contain characters not representable in the ASCII session character set.

       

      TdConnectionStringBuilder blr = new TdConnectionStringBuilder();
      blr.DataSource = "teradb01";
      // The user id contains a Unicode escape sequence representing a CJK character
      blr.UserId = "User\u3029";
      blr.Password = "MyPassword";
      blr.SessionCharacterSet = "ASCII";
      TdConnection con = new TdConnection(blr.ToString());
      con.Open();
      // Represent the table name as a Unicode delimited identifier
      String tblName = "U&\"customers#60B1\"UEscape'#'";
      String col1 = "custid";
      // Represent the column name as a Unicode delimited identifier
      String col2 = "U&\"custname#60B3#60B4#60B5\"UEscape'#'";
                  
      String commandSelect = String.Format(CultureInfo.InvariantCulture,
            @" Select {0}, {1} from {2} ", col1, col2, tblName);
      
      String commandCreateTable = String.Format(CultureInfo.InvariantCulture,
            @"create set table {0} ({1} integer, {2} varchar(30))", tblName, col1, col2);
      
      String commandInsert =
            String.Format(CultureInfo.InvariantCulture, @"insert into {0} ( 1, 'john');", tblName) +
            String.Format(CultureInfo.InvariantCulture, @"insert into {0} ( 2, 'johnny');", tblName);
      
      TdCommand queryCmd = new TdCommand(commandCreateTable, con);
      queryCmd.ExecuteNonQuery();
      queryCmd.CommandText = commandInsert;
      queryCmd.ExecuteNonQuery();
      queryCmd.CommandText = commandSelect;
      using (TdDataReader dr = queryCmd.ExecuteReader())
      {
            // number of rows returned
            Int32 result = (Int32)dr.RecordsReturned;
      }
      con.Close();
  3. Restrictions in the Data Provider

    • TdCommandBuilder Command Execution

      When executing commands with TdCommandBuilder, it is recommended to use only characters representable within the current session character set, or to use a Unicode session character set, due to the limitation on extended object name representation within Teradata answer sets. Any characters not representable in the current session character set will be returned by the Teradata Database as translation error characters.

    • Schema Collections and Visual Studio Wizard Applications

       Visual Studio wizards and Visual Studio Server Explorer retrieve object names from Teradata by submitting requests that contain restrictions. The object name representation in these restrictions is limited by the current session character set selected by the user. Any characters that are not representable in the current session character set will be returned by the Teradata Database as translation error characters. To fully support Visual Studio application wizards and Server Explorer, it is recommended to use a Unicode session character set.


Why don't you use ALTER TABLE to alter a table?


Adding or dropping a column, or modifying the list of compressed values, of an existing table is quite an expensive operation. For a large table it might result in a huge amount of CPU and IO usage and a loooooooong runtime. This blog discusses the pros and cons of the different ways to do it.

Alter Table vs. Insert Select vs. Merge Into

As always in SQL, one has multiple choices to reach the same goal: modify the table directly or move the data to a new table. The former is ALTER TABLE (Alter), the latter INSERT SELECT (InsSel) or its less well-known variation MERGE INTO (Merge).

Let's start with a list of pros and cons:

| | ALTER TABLE | INSERT SELECT | MERGE INTO |
|---|---|---|---|
| Needs Transient Journal? | no | no | no |
| ABORT possible? | no | yes (fast) | yes (fast) |
| Rollback during system restart? | no | yes (fast) | yes (fast) |
| LOCK on source table | exclusive | read | read |
| Spoolspace used | no | yes, same as source | no |
| Additional Permspace used | low, 2 cylinders per AMP | high, same as source | high, same as source |
| Works on a table copy? | no | yes | yes |
| Must Create/Drop/Rename Table? | no | yes | yes |
| Must recreate Secondary/Hash/Join Indexes, Foreign Keys/Statistics/Comments/Access Rights? | no | yes | yes |
| Supports changing Primary Index/Partitioning? | no | yes | yes |

 

You can easily spot that InsSel and Merge are quite similar, but Alter is usually different.

The only common ground is the Transient Journal: none of the three uses it (of course there are some entries indicating that some work is going on, but the actual rows are not journaled). Due to that fact, InsSel and Merge can easily be aborted and will roll back quite fast (just deleting all rows in the target table), but once Alter has started it must finish; there's no way to abort it. Even a system shutdown can't stop it, it will simply continue after the restart. Some will consider this positive, others negative :-)

The most important difference is the availability during the restructuring process: both InsSel and Merge apply a read lock, allowing concurrent read access, while Alter needs an exclusive lock, blocking any access to the target table. That's the main reason why Alter is not used in most environments. Additionally, before TD13 there was a table-level write lock on dbc.AccessRights which was held throughout the whole process, easily blocking other sessions. In current releases this lock duration has been greatly reduced, and other requests will only be blocked for a short period. Some additional RowHash locks on system tables usually don't interfere with other requests, but might block backups.

Neither Alter nor Merge uses spool: Alter moves blocks at the cylinder level and Merge directly merges the source rows into the target table. But InsSel always needs to spool the source data, which is especially bad for large tables when the explain shows "The result spool file will not be cached in memory".

Keeping a copy of the original table is often regarded as an advantage of Merge and InsSel ("just in case"), but when you're constrained on permspace you might prefer Alter's low overhead of a few megabytes per AMP.

However, the biggest advantage of Alter is its simplicity: just submit "ALTER TABLE tab ADD new_col INT, ADD existing_col COMPRESS ('bla');" and that's it.

Compare this to all the additional steps needed for InsSel or Merge. It's not only CREATE/DROP/RENAME; all those COMMENTs, GRANTs and COLLECT STATS statements must be scripted beforehand and then reapplied, too. Maintaining referential integrity might be complicated when the table is referenced in a Foreign Key. And to speed up processing, the target table will be created with the Primary Index only; any additional index must be recreated afterwards.
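
To make that concrete, here is a sketch of the Merge variant; table, column and index names are made up, and the GRANTs, COMMENTs, COLLECT STATS and secondary indexes mentioned above still have to be re-applied afterwards:

-- 1. Create an empty copy and apply the cheap change there.
CREATE TABLE tab_new AS tab WITH NO DATA;
ALTER TABLE tab_new ADD new_col INTEGER;

-- 2. Move the rows; the ON clause must cover the primary index of the target.
MERGE INTO tab_new AS tgt
USING tab AS src
   ON tgt.pk_col = src.pk_col
WHEN NOT MATCHED THEN
   INSERT (pk_col, other_col, new_col)
   VALUES (src.pk_col, src.other_col, NULL);

-- 3. Swap the tables.
DROP TABLE tab;
RENAME TABLE tab_new TO tab;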

Resource usage and runtime

I'm not showing exact numbers because your mileage may vary, but for tables without secondary indexes the CPU/IO ranking is usually:

  1. Alter Table
  2. Merge Into
  3. Insert Select

In my test cases Merge needed almost twice the CPU and IO of an Alter and InsSel added another 20%.

When Secondary/Hash/Join indexes exist InsSel gets closer to Merge but the gap to Alter increases drastically: Alter still needs to modify only the base rows instead of re-building all the indexes.

Runtime differences should be similar to CPU/IO, but they will vary greatly amongst systems due to different bottlenecks and you should run some tests on your own system.

Conclusion

I would strongly recommend implementing Alter Table, or at least start considering it. If you're concerned about availability, bear in mind that this process will probably be scheduled outside business hours anyway.

And when you need to change the [P]PI, or you just want the safety of a copy of the old table, you should definitely prefer Merge Into over good ol' Insert Select.


New StatsInfo query for TD14

Attachment: stats_td14_20130830.zip (212.2 KB)

Beginning with TD14, statistics are no longer stored in dbc.TVFields and dbc.Indexes; they have been moved into dbc.StatsTbl to facilitate several enhancements. A new view, dbc.StatsV, returns much of the information previously extracted by my StatsInfo query.
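
For a quick look at what dbc.StatsV already offers, a query along these lines (a sketch only; the column names are those listed in the comparison table below) lists statistics that have not been refreshed for a month:

-- Sketch: statistics not recollected within the last 30 days,
-- using only columns available in the TD14 dbc.StatsV view.
SELECT DatabaseName, TableName, ColumnName,
       LastCollectTimestamp, RowCount, SampleSizePct
FROM dbc.StatsV
WHERE LastCollectTimestamp < CURRENT_TIMESTAMP - INTERVAL '30' DAY
ORDER BY LastCollectTimestamp;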

But of course this is still not enough information, at least not for me ;-)

 dbc.StatsV vs. StatsInfo TD13/14

I tried to get the best of both worlds, so I wrote a new version for TD14 to extract as much additional data as possible. I had to remove a few columns (they're no longer needed or it's no longer possible to get that info) and I renamed some to match the new names in dbc.StatsV. Most of the new columns were simply not available before TD14.

The following table describes the new StatsInfo view and details the differences to the previous version and to dbc.StatsV:

| dbc.StatsV TD14 | New StatsInfo TD14 (column added/removed/changed) | StatsInfo TD13.10 | Remarks | Description |
|---|---|---|---|---|
| DatabaseName | DatabaseName | DatabaseName | | |
| TableName | TableName | TableName | | |
| ColumnName | ColumnName | ColumnName | StatsId <> 0 | List of comma-separated column names |
| FieldIdList | FieldIdList | FieldId | StatsId <> 0 | List of semicolon-separated field ids |
| StatsName | StatsName | StatsName | StatsId <> 0 | Alias name of the statistics (if assigned) |
| | IndexName | IndexName | StatsId <> 0 | Name of the index (if assigned) |
| | DateIncluded | DateIncluded | StatsId <> 0 | DATE or TIMESTAMP column included: Y/N |
| | PartitionColumn | PartitionColumn | StatsId <> 0 | Column included which is used in the table's partitioning expression: Y/N |
| | PartitionLevels | PartitionLevels | | Number of levels in the table's partitioning expression, zero means not partitioned |
| | ColumnPartitioningLevel | | | Level number of the column partitioning level, > 0 indicates a columnar table |
| | PartitionsDefined | PartitionsDefined | | The number of partitions defined |
| ExpressionCount | ExpressionCount | ColumnCount | StatsId <> 0 | The number of columns in the statistics |
| StatsId | StatsId | | | StatsId = 0 → Summary Stats |
| StatsType | StatsType | | | Statistics collected on: T → Table, I → Join Index, N → Hash Index, V → View (14.10?), Q → Query (14.10?), L → Link Row (14.10?) |
| | StatsTypeOld | StatsType | | Statistics collected on: Summ → Summary Statistic, UPI → Unique Primary Index, NUPI → Non-Unique Primary Index, USI → Unique Secondary Index, NUSI → Non-Unique Secondary Index, VOSI → Value Ordered NUSI, Part → Pseudo column PARTITION, Col → Single Column, MCol → Multiple columns |
| | TableType | TableType | | TempTbl → Global Temporary Table, Tbl → Table, JoinIdx → Join Index, HashIdx → Hash Index, NoPITbl → No Primary Index Table |
| StatsSource | StatsSource | | | The method this statistic was acquired by: I → Internally generated, S → User collected with COLLECT STATS (system built), U → User collected with COLLECT STATS VALUES clause, C → Copied from other sources, T → Transferred with CREATE TABLE...AS statement |
| ValidStats | ValidStats | | | TD14.10: Indicates whether the statistics are valid or not: Y/N |
| DBSVersion | DBSVersion | | | Database version the statistics were collected on |
| SampleSizePct | SampleSizePct | SampleSize | StatsId <> 0 | Sample size used for collect stats, NULL if not sampled |
| SampleSignature | SampleSignature | | StatsId <> 0 | Sample option encoded as a 10-character signature: USPnone → collected using NO SAMPLE, USP00nn.00 → collected using SAMPLE nn PERCENT, SDPxxxx.xx → sample size determined by system |
| ThresholdSignature | ThresholdSignature | | StatsId <> 0 | THRESHOLD options encoded as a 17-character signature (not used before TD14.10). Characters 1 to 10 → THRESHOLD PERCENT: SCTxxxx.xx → system defined, UCT005.00 → user defined 5 percent, UCTnone → user defined no threshold. Characters 11 to 17 → THRESHOLD DAYS: STTxxxx → system defined, UTT0010 → user defined 10 days, UTTnone → user defined no threshold |
| MaxIntervals | MaxIntervals | | StatsId <> 0 | User-specified maximum number of intervals |
| StatsSkipCount | StatsSkipCount | | StatsId <> 0 | TD14.10 only: how many times the statistics collection has been skipped based on the THRESHOLD |
| MaxValueLength | MaxValueLength | | StatsId <> 0 | User-specified maximum value length |
| LastCollectTimestamp | LastCollectTimestamp | CollectTimestamp | | Date and time when statistics were last collected |
| | LastCollectDate | CollectDate | | |
| | LastCollectTime | CollectTime | | |
| RowCount | RowCount | NumRows | | The cardinality of the table, i.e. the number of rows |
| UniqueValueCount | UniqueValueCount | NumValues | StatsId <> 0 | Distinct values; estimated when sampled |
| PNullUniqueValueCount | PNULLUniqueValueCount | | StatsId <> 0 | Number of unique values from rows with partial NULLs (multicolumn stats); estimated when sampled |
| NullCount | NULLCount | NumNULLs | StatsId <> 0 | Number of partly NULL and all-NULL rows; estimated when sampled |
| AllNullCount | AllNULLCount | NumAllNULLs | StatsId <> 0 | Number of all-NULL rows (multicolumn stats); estimated when sampled |
| HighModeFreq | HighModeFreq | ModeFreq | StatsId <> 0 | Frequency of the most common value; estimated when sampled |
| PNullHighModeFreq | PNULLHighModeFreq | | StatsId <> 0 | Highest frequency of values having partial NULLs (multicolumn stats); estimated when sampled |
| AvgAmpRPV | AvgAmpRPV | AvgAmpRPV | StatsId <> 0 | Overall average of the average rows per value from each AMP; only for NUSIs, otherwise zero |
| | MinValue | MinValue | StatsId <> 0 | Minimum data value (only for single-column numeric or datetime stats) |
| | ModalValue | ModalValue | StatsId <> 0 | Most common data value (only for single-column numeric or datetime stats) |
| | MaxValue | MaxValue | StatsId <> 0 | Maximum data value (only for single-column numeric or datetime stats) |
| | OneAMPSampleEst | OneAMPSampleEst | StatsId = 0 | Estimated cardinality based on a single-AMP sample |
| | AllAmpSampleEst | AllAmpSampleEst | StatsId = 0 | Estimated cardinality based on an all-AMP sample |
| DelRowCount | DelRowCount | | StatsId = 0 | Deleted rows count??? used in 14.10? |
| PhyRowCount | PhyRowCount | | StatsId = 0 | Seems to be the same as AllAMPSampleEst – used in 14.10? |
| AvgRowsPerBlock | AvgRowsPerBlock | | StatsId = 0 | Average number of rows per datablock??? |
| AvgBlockSize | AvgBlockSize | | StatsId = 0 | Average datablock size??? |
| BLCPctCompressed | BLCPctCompressed | | StatsId = 0 | Block compression in percent??? used in 14.10? |
| BLCBlkUcpuCost | BLCBlkUcpuCost | | StatsId = 0 | CPU cost for block compression??? used in 14.10? |
| BLCBlkURatio | BLCBlkURatio | | StatsId = 0 | ??? used in 14.10? |
| AvgRowSize | AvgRowSize | | StatsId = 0 | Average record size??? |
| Temperature | Temperature | | StatsId = 0 | Populated in 14.10??? |
| NumOfAMPs | NumOfAMPs | NumAMPs | | The number of AMPs from which statistics were collected; usually the number of AMPs in the system, 1 for an empty table |
| CreateTimeStamp | CreateTimeStamp | | | Statistics creation timestamp |
| LastAlterTimeStamp | LastAlterTimeStamp | LastAlterTimeStamp | Different meaning | Last user-updated timestamp, i.e. COLLECT STATS was submitted but skipped by the optimizer because the threshold was not reached |
| | LastAccessTimestamp | LastAccessTimestamp | | The last time this column/index was used in queries; the same info is found in dbc.TablesV and dbc.IndicesV |
| | AccessCount | AccessCount | | How often this column/index was used in queries; the same info is found in dbc.TablesV and dbc.IndicesV |
| | TableId | TableId | | To facilitate additional joins to other system tables |
| | IndexNumber | IndexNumber | StatsId <> 0 | Index number of the index on which statistics are collected |
| | FieldType | FieldType | | Single-column stats: dbc.TVFields.FieldType, NULL for multi-column |
| | Version | StatsVersion | | Internal version of statistics: 5 → TD14, 6 → TD14.10 |
| OriginalVersion | OriginalVersion | | StatsId <> 0 | Probably the version when the stats were migrated from older releases but not yet recollected: 4 → pre-TD14, 5 → TD14.00, 6 → TD14.10 |
| NumOfBiasedValues | NumOfBiasedValues | | StatsId <> 0 | Number of biased values in the histogram |
| NumOfEHIntervals | NumOfEHIntervals | | StatsId <> 0 | Number of equal-height intervals in the histogram |
| NumOfRecords | NumOfRecords | | | Number of history records in the histogram |
| | CollectStatement | CollectStatement | | COLLECT STATS statement to collect the stats; two versions, with or without double-quoted object names |
| | ShowStatement | HelpStatement | | SHOW STATS VALUES statement to get the stats details; two versions, with or without double-quoted object names |
| | | MissingStats | | Was a side-product of the old query; too much overhead to add |
| | | NumIntervals | | Replaced by NumOfBiasedValues & NumOfEHIntervals |
| | | CollectDuration | | Not (yet) possible; I don't know if this is stored somewhere |
| | | NumericStats | | No longer necessary |
| | | DataSize | | Too much overhead to calculate; not really needed as the limitation of 16 bytes is removed in TD14 |

Please report any issues or obviously wrong output to dnoeth@gmx.de.


Attached files:

StatsInfo_vs_StatsV.pdf

Describes the new StatsInfo view and details the differences to the previous version and dbc.StatsV - added, modified and removed columns (same as above table)

Teradata Statistics TD14.pdf

Partial description of the new internal stats format, based on some reverse engineering of the binary data stored in a BLOB in dbc.StatsTbl.Histogram. Luckily the internal storage maps almost 1:1 to the output of a SHOW STATISTICS VALUES :-)

stats_td14_yyyymmdd.sql

StatsInfo source code. To keep the code clean it's based on SQL-UDFs.

ReverseBytes.sql ReverseBytes.c

Can be used to replace the ReverseBytes SQL-UDF with a C-UDF which uses way less CPU (but most DBAs don't like C).
Note: I'm not a C programmer, but this was so basic even I could do it :-)

 


.NET Data Provider for Teradata: Creating Function Where the Source Resides On the Client Machine


The Teradata Database can create a function (e.g. a stored procedure, user defined function, or user defined type) from a source file that resides on either the client or the server machine.  The Data Definition Language (DDL) syntax specifies where the file resides and the path to the file.  An example of DDL that creates a stored procedure from a file residing on the Teradata Database server is as follows:

CREATE PROCEDURE provider_sp(INOUT region VARCHAR(64)) 
LANGUAGE C NO SQL EXTERNAL NAME 'SS!provider_sp!/usr/teradata/xsp.c!F!provider_sp' 
PARAMETER STYLE SQL

The .NET Data Provider for Teradata 14.10 and earlier releases only supported source files that resided on the server.  An exception was thrown if the DDL specified that the source file resided on the client machine.

This changes with the 14.11 release of the Teradata Provider.  Beginning with this release the Teradata Provider supports DDL statements that specify that the source file resides on the client machine.  A DDL statement that specifies this is similar to the following:

CREATE PROCEDURE provider_sp(INOUT region VARCHAR(64)) 
LANGUAGE C NO SQL EXTERNAL NAME 'CS!provider_sp!c:\xsp.c!F!provider_sp' 
PARAMETER STYLE SQL

In order to support this feature several new types have been added to the Teradata Provider:

TdOpenFileEventHandler

Delegate that is used to register to the TdConnection.OpenFile event.

TdCloseFileEventHandler

Delegate that is used to register to the TdConnection.CloseFile event.

TdConnection.OpenFile

Event that is raised when the Teradata Database requests the source of the function.   A delegate must be registered with this event if a DDL is executed that specifies that the source of the function resides on the client's machine.  Only one delegate can be registered with this event.

TdConnection.CloseFile

Event that is raised when the contents of the source file has been sent to Teradata.  Registration of this event is optional.  Only one delegate can be registered with this event.

 TdFileEventArgs

Class that contains information about the source file.   The TdOpenFileEventHandler and TdCloseFileEventHandler delegates require a parameter declared as this type.

When the DDL is executed, the Teradata Provider will send the source code of the function to the Teradata Database by performing the following tasks:

  1. Raise the TdConnection.OpenFile event. 

    The TdFileEventArgs parameter of the delegate that is invoked by this event will contain the name of the file that was specified in the DDL statement.  It is the responsibility of the application to retrieve the name of the file from the TdFileEventArgs.FileName property and open a System.IO.Stream to the file.  The application must also set the TdFileEventArgs.SourceFile property to the stream.

  2. Retrieve the Stream object from the TdFileEventArgs.SourceFile property. 

    The Teradata Provider will read the source from the Stream and send it to the Teradata Database.

  3. Raise the TdConnection.CloseFile event after all the source has been read and sent to Teradata.

    It is the application's responsibility to perform any cleanup tasks (e.g. closing the Stream) in the delegate that was registered with this event.

The following coding example shows how to use the new types when executing a DDL to create a function and specifies that the source file resides on the client machine:

/// <summary>        
/// Class that contains the File Handlers that are invoked when the Teradata Provider
/// requests the source file of the function and completes processing the source code.
/// </summary>
public class ExternalFileHandler
{
    /// <summary>
    /// Delegate that is invoked when the TdConnection.OpenFile 
    /// event gets raised by the Teradata Provider
    ///</summary>
    public void OnFile(Object sender, TdFileEventArgs eventArgs)
    {
        // retrieving the file name from the TdFileEventArgs parameter
        String FileName = eventArgs.FileName;
       
         // creating stream and sending stream back to Teradata Provider
         eventArgs.ExternalFileObject =  new FileStream(FileName, FileMode.Open);
     }
        
    /// <summary>
    /// Delegate that is invoked when the TdConnection.CloseFile
    /// event gets invoked by the Teradata Provider.
    /// </summary>
    public void OnCloseFile(Object sender, TdFileEventArgs eventArgs)
    {
        Stream sourceStream = eventArgs.ExternalFileObject;

        sourceStream.Close();
    }
}



/// <summary>
/// Method that executes a DDL to create a stored procedure.  It demonstrates how to
/// use the new objects to create a stored procedure where the source file resides on the
/// client machine.
/// </summary>
public void CreateExternalProcedure(TdConnection cn)
{
    TdCommand cmd = cn.CreateCommand();
 
    // Creating the handler that contains the delegates that will be registered
    // with the OpenFile and CloseFile events
    ExternalFileHandler fh = new ExternalFileHandler();
           
    cmd.CommandText = "CREATE PROCEDURE provider_sp(INOUT region VARCHAR(64)) " +
        "LANGUAGE C NO SQL EXTERNAL NAME 'CS!provider_sp!c:\\xsp.c!F!provider_sp' " +
        "PARAMETER STYLE SQL";
            
    // creating delegates for the OpenFile and CloseFile events
    TdOpenFileEventHandler extFile = new TdOpenFileEventHandler(fh.OnFile);

    TdCloseFileEventHandler extReadCompleted = 
        new TdCloseFileEventHandler(fh.OnCloseFile);
        
    // this event will get raised when Teradata requests the source code of the 
    // stored procedure.  Registration to this event is required if a DDL is executed
    // that specifies data is to be read from a source file that resides on the client
    // machine.
    cn.OpenFile += extFile;
              
    // this event will get raised after Teradata has received all the source code 
    // for the stored procedure.  Registering with this event is optional.
    cn.CloseFile += extReadCompleted;
             
    try
    {
         // going to read through the compiler messages sent from Teradata
        using (TdDataReader dr = cmd.ExecuteReader())
        {
            String result;

            while (dr.Read() == true)
            {
                result = dr.GetString(0);
                     
                Console.WriteLine(result);
            }
        }
    }
    finally
    {
        // need to unregister from the events.
        cn.OpenFile -= extFile;
        cn.CloseFile -= extReadCompleted;
    }
} 

 


.NET Data Provider for Teradata: Sending and Receiving XML

Short teaser: 
This blog describes how to send XML data to, and receive XML data from, a Teradata Database 14.10 or later release

The XML data type was released in Teradata Database 14.10.  This type enables XML documents and fragments to be stored in a column defined as XML.  Support for the XML type was also added to the  .NET Data Provider for Teradata 14.10 release. 

TdXml is the provider-specific type the Data Provider uses to support XML.  It is used to retrieve XML documents and fragments from a Teradata Database.  A TdXml instance is created by calling TdDataReader.GetTdXml() on the XML column.  After the instance is created, the method TdXml.CreateXmlReader() is called to create an XmlReader instance.  The XmlReader is used to traverse the XML document or fragment.

 Here is an example of retrieving an XML document:

public void RetrieveXmlExample(TdCommand cmd)
{
    cmd.Parameters.Clear();

    cmd.CommandText = "select XmlColumn1 from TableExample";

    using (TdDataReader reader = cmd.ExecuteReader())
    {
        // read a row returned from Teradata
        while (reader.Read() == true)
        {
             // retrieve the XML data as the provider-specific TdXml type
             using (TdXml tXml = reader.GetTdXml(0))
             {
                 // Creating an XmlReader so that the Xml can be traversed
                 using (XmlReader xml = tXml.CreateXmlReader())
                 {
                     //
                     // traversing of the Xml is performed by making call to the XmlReader methods.
                     //
                 }               
             }
         }
    }
}

It is important that the TdXml instance (tXml) is disposed after all the data has been processed.  Disposing this instance releases all the Teradata resources that were allocated by the Data Provider.  It is the responsibility of the application to dispose the XmlReader instance that was returned from TdXml.CreateXmlReader().

XML can be sent to a Teradata Database (when inserting and updating) using an XmlReader or TextReader object.  The following is an example that uses an XmlReader to insert data:

public void InsertXml(TdCommand cmd)
{
    cmd.Parameters.Clear();

    cmd.CommandText = "INSERT INTO TableExample (colInteger, colXml1, colXml2) values (?, ?, ?)";

    // creating the first parameter
    cmd.Parameters.Add(null, TdType.Integer);
    cmd.Parameters[0].Direction = ParameterDirection.Input;
    cmd.Parameters[0].Value = 100;

    // creating the first Xml parameter -- an XmlReader is used
    // XmlTextReader inherits from XmlReader
    XmlTextReader xmlReader = new XmlTextReader(@"c:\XmlDocFile1.xml");
    cmd.Parameters.Add(null, TdType.Xml);
    cmd.Parameters[1].Direction = ParameterDirection.Input;
    cmd.Parameters[1].Value = xmlReader;

    // creating the second Xml parameter -- a TextReader is used.
    // StreamReader inherits from TextReader
    StreamReader xmlStream = new StreamReader(@"c:\XmlDocFile2.xml");
    cmd.Parameters.Add(null, TdType.Xml);
    cmd.Parameters[2].Direction = ParameterDirection.Input;
    cmd.Parameters[2].Value = xmlStream;

    try
    {
        cmd.ExecuteNonQuery();
    }
    finally
    {
        xmlReader.Close();
        xmlStream.Close();
    }
}

 


Automatic Throttles in Teradata 14.0 and 14.10


I just returned from the 2013 Teradata user group Partners Conference in Dallas.  One of the technical topics that I presented at the conference was throttle rules and how they work.  Throttles provide concurrency control in TASM, both on the EDW and the Appliance platforms.  I'd like to share a few points about enhancements to throttles in Teradata 14.0 and 14.10 that I discussed at the conference, in particular automatic throttles.

By the way, I'd like to thank all of you who were at the Partners Conference and who came up, introduced yourselves, and passed on positive feedback about my blog postings in Developer Exchange.  It was very encouraging for me to hear from so many of you who are reading these postings and finding value in them.

Concurrency control, whether for queries or utility jobs, is enforced in Teradata by means of throttles.  Throttles are rules within the database, and for the most part are available to all the Teradata family platforms.  For example, the 1xxx and 2xxx platforms as well as the 5xxx and 6xxx systems all benefit from having system throttles and utility throttles available on both SLES 10 and SLES 11 operating systems.  In addition, once you get to SLES 11, Appliance platforms will have both workload throttles and group throttles available, which previously have only been available on the EDW.

There are several new capabilities in the concurrency control area when you get to Teradata 14.0 or 14.10, including several throttle rules that are automatically created for you.  This posting explores these automatic throttle rules.

Automatic Throttles on the Appliance Platform

Starting in 14.0, if you are on the 1xxx or 2xxx series hardware (the Appliance platform) on either SLES 10 or SLES 11, you will have two automatic throttles that you can see in your Workload Designer screens:

  • The GeneralQuery throttle places a limit of 52 on concurrent requests system-wide.
  • The OneSecondQuery throttle has a limit of 30 for all Low, Medium or High queries that have an estimated time greater than 1 second.

The purpose of these throttles is to give you a sensible starting point for managing concurrency on your platform, even if you are unfamiliar with workload management techniques.  These default throttle rules are created during the migration process that takes you to the 14.0 or 14.10 release.  They are default rules, but they can be modified or even deleted if you wish.

These default rules are not included on the EDW platforms.

One approach to modifying these two default rules on the Appliance is to limit the impact of the GeneralQuery throttle so that single-AMP and few-AMP queries can always run without being delayed.  Follow these steps to accomplish that modification:

  • Clone FirstConfig
  • Edit Cloned Ruleset
  • Click on Throttles Category
  • Click on GeneralQuery
  • Click on Classification Tab
  • Choose Query Characteristics
  • Click on Add
  • Choose AMP Limits
  • Include only queries that use all AMPs
  • OK
  • Save

Automatic Throttle to Manage SQL-H queries

A new feature in 14.10 is an automatic throttle that focuses on requests that are using SQL functions used for Hadoop communication.  These are referred to as SQL-H queries.  If a throttle rule already exists in the rule set and it limits SQL-H queries to no more than 20 at a time, no action will be taken.  However, if no such throttle exists, then an automatic throttle with a limit of 2 is added to the rule set.  This applies to both EDW and Appliance platforms.

The purpose of this automatic throttle is to prevent over-use of memory that is possible if a large number of these functions run at the same time.  

The throttle is created at the time the rule set is activated, but only in cases where there is not already a rule in place to manage these types of functions.  The ability to classify to workloads based on function name is a new feature in 14.10, so the migration routine looks for throttles based on Hadoop communication functions, and only if it does not find one does it create the automatic throttle.

This automatic throttle is not recorded in the TDWM database and will not be visible in Viewpoint.  It is only present in the TDWM rules cache.  But it can be seen in tdwmdmp or using the PMPC APIs.

Its rule name is Default_load_from_hcatalog, and it carries a rule ID of 800000.  It cannot be deleted.

Conclusion

Automatic throttles are part of the evolution of workload management in Teradata.  In current software, there are several attempts to make setup simpler for you and to protect you from things you may not have thought of.  Automatic throttles are one example of that direction.


How to use JDBC PreparedStatement batch INSERT with R


The following shows how to batch 2 rows of data, then invoke JDBC PreparedStatement.executeBatch to INSERT them into a database table. JDBC PreparedStatement.executeBatch returns an array of integers that must be checked to confirm all batched rows have been inserted successfully.
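Here is a minimal sketch of the idea using RJDBC (the driver jar paths, system name, credentials, and the table foo are placeholders to adjust for your environment):

library(RJDBC)

# load the Teradata JDBC driver; the jar locations are examples
drv = JDBC("com.teradata.jdbc.TeraDriver",
           c("/MyPath/terajdbc4.jar","/MyPath/tdgssconfig.jar"))

con = dbConnect(drv,"jdbc:teradata://YourSystem","guest","please")

# target table: one INTEGER column and one VARCHAR column
dbSendUpdate(con,"create table foo (a int, b varchar(100))")

# prepare the INSERT and batch two rows
ps = .jcall(con@jc,"Ljava/sql/PreparedStatement;","prepareStatement","insert into foo values(?,?)")

.jcall(ps,"V","setInt",as.integer(1),as.integer(42))
.jcall(ps,"V","setString",as.integer(2),"bar1")
.jcall(ps,"V","addBatch")

.jcall(ps,"V","setInt",as.integer(1),as.integer(43))
.jcall(ps,"V","setString",as.integer(2),"bar2")
.jcall(ps,"V","addBatch")

# executeBatch returns one update count per batched row; a negative value other
# than Statement.SUCCESS_NO_INFO (-2) means that row was not inserted
counts = .jcall(ps,"[I","executeBatch")
if (any(counts < 0 & counts != -2)) stop("not all batched rows were inserted")

.jcall(ps,"V","close")
dbDisconnect(con)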

JDBC FastLoad can be used by adding TYPE=FASTLOAD to the connection URL, for example con = dbConnect(drv,"jdbc:teradata://YourSystem/TYPE=FASTLOAD","guest","please").

 


Visual Design is an Important Part of the User Experience

Short teaser: 
A user experience without well-thought-out visual design is not a very good experience.

As an interaction designer working on Viewpoint, it is my job to take data and turn it into useful software interfaces. Who will use it? What do they need to accomplish? How often do they do it? How will they know when they have accomplished their goal? These are some of the many questions that need to be answered as I start with product requirements and use research to define how software should behave in order to make it as intuitive as possible. There is some subjectivity to the process, but the goal is to use data in order to provide an objective, logical design that meets customer need, not my interpretation of what customers should have.

But, there is another aspect of user experience design that can be a bit murky – visual design. Ask five people what their favorite color is and you can get five different answers. Does the layout look right? What is the best font? There are many design aspects that elicit strong opinions. Luckily, there are some design heuristics and principles of visual perception that can be used to reduce subjectivity.

Turning More into Less

For complex software, grouping and labeling data is an important way to focus the user on what matters. Just as a dictionary is indexed so that you can quickly jump to a relevant section of data, software interfaces can be chunked so users can first focus on a small set of groups, and then determine whether it is necessary to look at the details of a group. For example, if you have data for 3 systems on a screen and all of the data visually runs together it can take some time to visually parse all of the information and learn the structure of the data. If the data is visually grouped using easily differentiable headings, you can scan the three headings for the one system you care about before paying attention to the details, quickly decreasing the amount of information you parse by two-thirds.

Hey, Look at Me!

How old is your monitor? How old are you? Designers should always be mindful that not all users will have the same monitor nor the same visual acuity. Creating an interface with overly subtle differentiation in the color of interface elements can be problematic for users who cannot see that level of subtleness on their monitors. For example, using a very light gray on a white background may not have the intended effect for some users.

Even when bolder colors are used, the styles used to differentiate content are important to ensure that users can focus their attention. If headings look very similar to regular content, the headings become less useful as grouping elements. Interactive elements should generally stand out more than static data and should not look overly muted or disabled. Grouping styles should clearly delineate chunks of information.

Lowering the Learning Curve

One of the best ways to reduce the learning curve for software is to be consistent. It takes time to learn how to use software with complexity and frequency of use being important factors. If users can learn general patterns rather than parsing all data, they can once again eliminate some of the noise and spend less time memorizing. What you don't want is for people to have to think about what something should look like on a per-screen basis because it distracts from the task they are trying to achieve.

Headings are a good candidate for consistency because the user can look for the same visual cue to delineate chunks of data. Having a consistent visual cue for interactive/clickable objects is a good way to help users quickly scan a screen to see what tasks can be performed without needing to focus on non-interactive data. Using a consistent, differentiated look for selected items helps users quickly identify the context of what they are looking at and to differentiate between static and interactive data. Placing similar actions in a consistent place helps develop muscle memory so that navigation becomes second nature. Even using consistent capitalization and font styling rules can help users by eliminating differences from the norm that may catch a user's attention and cause them to question whether the differences have some meaning or are intentionally placing emphasis.

As a final note on consistency, be sure to use it only when it makes sense. I sometimes see examples where consistency is applied but it actually makes software less usable. One example is consistently showing a set of functions via command buttons on every screen of an application in order to preserve position, even when some of the functions are not applicable to some of the screens.

Know Thy User

In the end, it is all about understanding content, use cases, and your audience.

If you have a simple, single task interface like Gmail that people use frequently, you can afford to let the visual design of the interface be more subtle and implicit. People will be able to learn how to accomplish their primary goals in a short period of time due to the limited scope of the product and from the application of basic grouping.

When designing for a product like Viewpoint with its high data density and many different interfaces, explicitly focusing a user's attention on the right interface to accomplish a task is key.  Daily users will learn the interfaces and rely upon familiarity and consistency to focus on data rather than navigation. Monthly users will typically not memorize the interface and will need more visual guidance to focus attention and to help them discover the data and functions they need. All users will want to be able to look at the screen and focus on important data and the task at hand rather than spending time deciphering layout and navigation.


Big Data: Where Are We Now?


When this blog started, it was based on my 2008 Teradata Partners User Conference presentation on the future data explosion and its impact on the Enterprise Data Warehouse:  “’Long Tails,’ ‘Black Swans,’ and Their Impact on EDW and AEI.”  One could foresee a definite future negative impact on the existing EDW platforms trying to support all this new volume and types of data (social, sensor, etc.).

I thought I would post here the last four design issue slides from this presentation:

Data History

  • More online or degraded online history availability
  • Cost/benefit:  difficult, so need to keep the additional cost of degraded online history down 
  • Cost/benefit analysis needs to include “what if” analysis

Data Detail

  • Need more complex data structures to support obscure sources and outlier data
  • More complex metadata and physical data modeling for access to low usage data

Varied Access

  • Higher levels of security and access control complexity as supply chain opened to vendors
  • Mixed workload administration to meet service level goals

ETL/ELT complexity

  • Supporting unstructured text
  • Supporting external and “moving target” data sources
  • Flexibility in data structures and ETL/ELT tools

Data History

  • Off-line storage may not meet service level goals
  • Need “catch-up” mechanisms for ETL
  • Assume will fall behind and put systems in place
  • Note:  Fastload/Merge is as good as or better than ML or Stream, and supports all indexes

Fail-over capacity (Dual Active)

  • Both for recovery and for bursts of query activity

Need flexibility to generate queries to react to unknown events

  • Flexible data structures and ad hoc query tools

Event Triggers

  • Normal processing should be fully automated

Mixed workload planning for access capacity crush

  • Setup TDWM Rule Sets to be activated in advance for “capacity crush” scenario
  • Capacity planning needs to include “what if” scenarios

 

Since 2008, the technology has kept up and all of this data is being accommodated today with the varied technologies available to us.  However, the vision of a single view of the data is still paramount, based on my two decades of experience with Integrated Data Warehouses for Teradata.  This overall requirement is addressed by the Teradata Unified Data Architecture (UDA):  still a single view of the data, but on different platforms that most cost-effectively store and process the data, with all the data accessible from a single query across a high-speed interconnect.

So where are we going from here?  As always, the industry will apply the most cost-effective hardware and software to the value of the data being captured.  And as the cost of the platforms steadily decreases, the amount of data being captured and stored for analysis will increase.  Data that was too costly to collect and store for analysis three years ago is now available on the web for free.

Expect the difficulty of supporting this vision to continue to be in the complexity of connecting all of the disparate data stores into a single view for the end user, be they a data scientist or a CEO.


A wider test case on R JDBC fastload

Short teaser: 
doing some bulk loads via R and JDBC

 

Hi,
 
This is an extension of the
http://developer.teradata.com/blog/amarek/2013/11/how-to-use-jdbc-preparedstatement-batch-insert-with-r-0
blog.  The initial idea and code example come from amarek!
For whatever reason I was not able to post the code below as a comment; there seems to be an issue with the developer.teradata.com page.
 
Modify dim to generate more or less data. 
 
One million rows loaded in 5 minutes in my environment.  Apply might be faster, but I didn't get that working right away.
 
The next challenge would be to make this more generic: create a TD table for a given data.frame and then load the data.frame afterwards ;-).  A rough sketch of that idea follows the script below.
 
library(RJDBC)
################
#def functions
################
myinsert <- function(arg1,arg2){
  .jcall(ps,"V","setInt",as.integer(1),as.integer(arg1))
  .jcall(ps,"V","setString",as.integer(2),arg2)
  .jcall(ps,"V","addBatch")
}


MHmakeRandomString <- function(n=1, length=12)
{
  randomString <- c(1:n)                  # initialize vector
  for (i in 1:n)
  {
    randomString[i] <- paste(sample(c(0:9, letters, LETTERS),
                                    length, replace=TRUE),
                             collapse="")
  }
  return(randomString)
}

################
#DB Connect
################
.jaddClassPath("/MyPath/terajdbc4.jar")
.jaddClassPath("/MyPath/tdgssconfig.jar")
# the jars are already on the class path, so JDBC() only needs the driver class
drv = JDBC("com.teradata.jdbc.TeraDriver")
conn = dbConnect(drv,"jdbc:teradata://MyServer/CHARSET=UTF8,LOG=ERROR,DBS_PORT=1025,TYPE=FASTLOAD,TMODE=TERA,SESSIONS=1","user","password") 

################
#main
################

##gen test data
dim = 1000000
i = 1:dim
s = MHmakeRandomString(dim,12)

## set up table
dbSendUpdate(conn,"drop table foo;")
dbSendUpdate(conn,"create table foo (a int, b varchar(100));")

#set autocommit false
.jcall(conn@jc,"V","setAutoCommit",FALSE)
##prepare
ps = .jcall(conn@jc,"Ljava/sql/PreparedStatement;","prepareStatement","insert into foo values(?,?)")

#start time
ptm <- proc.time()

## batch insert
for(n in 1:dim){ 
  myinsert(i[[n]],s[[n]])
}
#run time
proc.time() - ptm

#apply & commit
.jcall(ps,"[I","executeBatch")
dbCommit(conn)
.jcall(ps,"V","close")
.jcall(conn@jc,"V","setAutoCommit",TRUE)

#get some sample results
dbGetQuery(conn,"select top 100 * from foo")
dbGetQuery(conn,"select count(*) from foo")

#disconnect
dbDisconnect(conn)
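As a rough sketch of that generic loader idea (untested; it assumes an RJDBC connection like conn from the script above, handles only integer, numeric and character columns, and maps everything else to VARCHAR):

# build a CREATE TABLE from a data.frame's column types, then batch-insert its rows
loadDataFrame <- function(conn, tableName, df) {
  sqlType <- function(col) {
    if (is.integer(col))      "INTEGER"
    else if (is.numeric(col)) "FLOAT"
    else                      "VARCHAR(255)"
  }
  colDefs <- paste(names(df), sapply(df, sqlType), collapse=", ")
  dbSendUpdate(conn, paste("create table", tableName, "(", colDefs, ")"))

  markers <- paste(rep("?", ncol(df)), collapse=",")
  ps <- .jcall(conn@jc, "Ljava/sql/PreparedStatement;", "prepareStatement",
               paste("insert into", tableName, "values(", markers, ")"))

  for (r in seq_len(nrow(df))) {
    for (j in seq_len(ncol(df))) {
      v <- df[r, j]
      if (is.integer(v))      .jcall(ps, "V", "setInt",    as.integer(j), as.integer(v))
      else if (is.numeric(v)) .jcall(ps, "V", "setDouble", as.integer(j), as.numeric(v))
      else                    .jcall(ps, "V", "setString", as.integer(j), as.character(v))
    }
    .jcall(ps, "V", "addBatch")
  }

  counts <- .jcall(ps, "[I", "executeBatch")
  .jcall(ps, "V", "close")
  invisible(counts)
}

With TYPE=FASTLOAD the executeBatch call would still need to be wrapped in the setAutoCommit(FALSE)/dbCommit(conn) sequence used in the script above.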

 

 

Easing Into Using the New AutoStats Feature


Everyone with a Teradata Database collects statistics.  No way around it.  Good query plans rely on it, and we’ve all gotten used to it.

But starting in Teradata Database 14.10 there are some big changes.  Numerous enhancements to the optimizer, some important database changes, a new set of APIs, an added data dictionary table, and a brand new Viewpoint portlet all combine to produce the Automated Statistics (AutoStats) feature.  Used fully, AutoStats can identify statistics the optimizer looked for and found missing, flag unused statistics that you no longer need to collect on, prioritize and execute the stats collection statements, and organize/run your statistics collection jobs in the most efficient way possible.

There’s a lot going on and a lot of choices involved.  So what’s the easiest way to start getting familiar with AutoStats?  And what’s required with the new DBQL logging options?  If you like to build up confidence in a new feature at your own pace, this blog posting offers a few simple steps designed to introduce you to key features and benefits of AutoStats.

Sources of Information

AutoStats leverages a variety of new statistics enhancements available in Teradata Database 14.10.  Many of these new capabilities can be used directly within customized statistics management routines that already exist at many customer sites.

Reference the orange book titled:  Teradata Database 14.10 Statistics Enhancements by Rama Krishna Korlapati for more information.  Major features discussed in this orange book include automatic downgrading to sampled statistics, automatic skipping of unnecessary recollections based on thresholds, tracking update-delete-insert counts, and collecting statistics on single table expressions.

However, to gain the full benefit of an automated closed-loop architecture that incorporates these enhancements, take advantage of the statistics collection management and tuning analysis opportunities of AutoStats.  For a solid understanding of this feature, read the orange book titled  “Automated Statistics Management” by Louis Burger and Eric Scheie.   It discusses what AutoStats does, how to use the Viewpoint portlet screens, and provides guidelines on using AutoStats successfully.  

Here in Developer Exchange, take a look at the article from Shrity dated May 2013 titled “Teradata Viewpoint 14.10 Release”.  It describes and illustrates the Viewpoint portlet that supports AutoStats, which you will be using when you get to 14.10.  This Viewpoint portlet is named “Stats Manager”.

This blog posting is intended to help you ease into using the new AutoStats feature.  Subsequent related postings will explore individual features of 14.10 statistics management, such as the Threshold option, and suggest ways you can use them outside of the AutoStats architecture.

Version 6 of the Statistics Histogram

One underpinning of the Teradata Database 14.10 enhancements to statistics collection is a new version of the statistics histogram.  Version 6 is necessary to support some of these new enhancements, including a system-derived sampling option for the statistics collection statement, and  for capturing insert, delete and update counts. 

In order to use Version 6, the system needs to be committed to not back down to any Teradata Database release earlier than 14.10. The DBS Control general field #65 NoDot0Backdown must be set to true to indicate the no back down decision.  This change does not require a restart.  

If you have turned on DBQL USECOUNT logging for a database (discussed in the next section), and no back down is committed, the Version 6 histogram will automatically be used the next time you recollect statistics for the tables within that database.

The database supports backward compatibility.  Statistics whose histograms use the previous versions (Version 1 to Version 5) can be imported into the system that supports Version 6. However, the Version 6 statistics cannot be imported to systems that support Version 5 or lower versions.  After migrating to Teradata 14.10, it’s a good idea to accelerate the recollection of current statistics.  The recollection will move your histograms to the new version, which in turn will support a wider use of the new enhancements.

New DBQL options  - What They Do and When to Use Them

AutoStats functionality relies on your making more information available to the database about how statistics are used.  Some of this information it digs up itself, such as which stats you have already collected, and which options you have specified on your current statistics collection statements.

But to provide the most interesting (and helpful) new functionality, such as determining which statistics you have missed collecting on, or no longer need, additional DBQL logging will be required.  While this extra DBQL logging is not absolutely necessary if you choose to use AutoStats only for collection and management purposes, things like the new re-collection threshold functionality will not work without DBQL USECOUNT logging taking place.  AutoStats uses a “submit often” architecture, where statistics collections are submitted frequently, making the 14.10 threshold logic important for determining which statistics actually need to be refreshed at run time.

DBQL USECOUNT Logging

If you enable DBQL logging with the USECOUNT option, and then run an AutoStats Analyze job from the Stats Manager portlet, you can identify unused statistics as well as detect which statistics are being used.   Although you turn on USECOUNT by a BEGIN QUERY LOGGING statement, USECOUNT is not a normal query-based DBQL option, and it is not something you enable at the user level.  You turn USECOUNT on at the database level in a separate logging statement from the DBQL logging targeted to users.   

BEGIN QUERY LOGGING WITH USECOUNT ON SandBoxDB;

Once you enable logging, USECOUNT tracks usage of all objects for that database.  USECOUNT is the new mechanism for turning on Object Use Count (OUC).  

One type of object tracking that USECOUNT does has to do with statistics.  In 14.10 each individual statistic is considered an “object” and will have a row in a new table, DBC.ObjectUsage.  With USECOUNT on, the database counts how often statistics objects were used by the optimizer.  Whenever the optimizer uses a particular statistic, the access count in that row of the DBC.ObjectUsage table is incremented.  If a particular statistic has no usage over a period of time, routines will flag it as being unused, and subsequent executions of Analyze jobs scoped to that database will recommend de-activating it.

USECOUNT also tracks inserts, deletes and updates made to tables within the database you have selected.  This provides the extrapolation process with more accurate information on changes in table size, and also allows enforcement of the Threshold Change Percent limits that you can apply to individual stats collection statements starting in 14.10.  The AutoStats feature incorporates threshold-based information in determining the ranking, and therefore the submission priority, of the individual statistic collections within the scripts it builds.

Here are few tips when logging USECOUNT:

  1. Before you turn on USECOUNT, and in order for the optimizer to use the use counts, set the DBS Control field NoDot0Backdown to true.
  2. Keep USECOUNT logging on for databases whose tables are being analyzed in an ongoing basis.
  3. In general, it is good to keep USECOUNT logging on for all important databases.  In particular, USECOUNT logging is important for those databases where the SYSTEM change-based THRESHOLD is in effect. New optimizer statistics fields in DBS Control allow the user to enable and set system-level THRESHOLD defaults.   A future blog posting will discuss the new THRESHOLD functionality in more detail. 

DBQL STATSUSAGE and XMLPLAN Logging

These additional DBQL logging options create XML documents that identify which statistics were used, and ones that the optimizer looked for but did not find.  Contrary to USECOUNT, these two logging types should be turned on temporarily, only as needed in preparation for analysis.  Both provide input to Analyze jobs whose evaluation method has been specified as  “DBQL”.   Both logging options store the XML that they produce in the DBC.DBQLXMLTbl.

They basically do the same thing.  However XMLPLAN is more granular, provides more detail, and comes with more overhead than STATSUSAGE logging.   

The difference is that STATSUSAGE logs the usage of existing statistics and recommendations for new statistics (that it found to be missing) into the DBQLXMLTbl without any relationship to the query steps.  The relationship to the query steps can only be made if XMLPLAN logging is also enabled.  XMLPLAN logs detailed step data in XML format.  If you enable XMLPLAN by itself without STATSUSAGE, there is no stats related data logged into the DBQLXMLTbl.  STATSUSAGE provides all the statistics recommendations, and if both are being logged, then those statistical recommendations can be attached to specific steps.   So if you plan to log XMLPLAN, log STATSUSAGE at the same time.   

If both types of logging are on, both options log into a single XML document for a given request.  But you must explicitly request logging for them, as the ALL option in DBQL will not include STATSUSAGE and XMLPLAN.   Logging for both are enabled at the user level, as is STEP or OBJECT logging. 

When you begin logging with these new DBQL options, take into consideration any logging that is already taking place.   In the statement below, although SQL and OBJECTs are not used by AutoStats, they may be included in the same logging statement with STATSUSAGE and XMLPLAN.

BEGIN QUERY LOGGING WITH SQL, OBJECTS, STATSUSAGE, XMLPLAN LIMIT SQLTEXT=0 ON USER1;

The orange book Teradata Database 14.10 Statistics Enhancements provides an example of the XML output captured by DBQL STATSUSAGE logging.  See page 65-67.

Here are few tips when logging STATSUSAGE and XMLPLAN:

  1. For first time tuning or analysis, or after changes to the database, enable both XMLPLAN and STATSUSAGE options for those users who access the affected tables.
  2. Log with both options for at least one week in advance of running an Analyze job, or however long it takes to get good representation for the different queries.
  3. Consider using the more detailed XMLPLAN only for focused tuning, or briefly for new applications, as it incurs some overhead and requires more space.
  4. Turn off logging of both options during non-tuning periods of time or when no Analyze jobs on affected data are being planned in the near future.
  5. Keep the scope of logging to just the users accessing the databases and applications undergoing analysis and tuning.

Note:  You can enable DBQL logging for STATSUSAGE and XMLPLAN based on account or user, or whatever DBQL logging conventions currently exist at your site.   Keep in mind the other DBQL logging rules already in place.

Becoming Familiar with AutoStats a Step at a Time

Because AutoStats jobs can incorporate just a single table’s statistics or be an umbrella for  all of your databases, you want to keep careful track of how you scope your initial jobs.  Pick your boundaries and start off small. 

Initial steps to take when first starting out with AutoStats include:

  1. Read the orange book Automated Statistics Management  and understand how it works.
  2. Select a small, non-critical database with just a few tables, perhaps in your sandbox area, for your initial AutoStats interactions.
  3. Try to pick a database that represents a new application area that has no current statistics, or a database with some level of statistics, but whose statistics are not well tuned.
  4. Set up DBQL logging options so that new logging options are focused on the database tables you have scoped.
    1. Turn on DBQL USECOUNT logging just for that one database.
    2. Turn on DBQL logging that includes STATSUSAGE AND XMLPLAN only for users who access that database.
  5. Run the application’s queries against the database for several days or even a week.
  6. Enable the Viewpoint Stats Manager data collector in Admin->Teradata Systems portlet.  This enables the Stats Manager portlet.
  7. To run Stats Manager Collect and Analyze jobs, the TDStats database in DBC must have permission to collect statistics on the objects in the job’s scope.  For example: 
GRANT STATISTICS ON <database> TO TDStats;
  8. Select the database in the Statistics tab within the Viewpoint Stats Manager portlet and select the Automate command, in order to bring the existing statistics in that database under the control of AutoStats.
  9. Create a new Analyze job.
    1. Pick the DBQL option and limit queries by log date to incorporate just the last week when you were focused on logging that database.
    2. Select Require Review before applying recommendations.
  10. Run the Analyze job, review all recommendations, and approve them.
  11. Create a new Collect job that incorporates your chosen database tables; run the Collect job.
    1. Specify Automatically generate the collect list (the default).
  12. Preview the list of statistics to be collected before the collection job runs.

Starting off simply with AutoStats allows you to get familiar with the automated functionality and job scheduling capabilities, as well as the Stats Manager portlet screens.  Over time, you can give control over statistics analysis and collection to AutoStats for additional databases. 


How to use JDBC FastExport with R


The following shows how to select rows from a database table using JDBC FastExport, which only works with JDBC PreparedStatement.
 

If the SELECT statement has no '?' parameter markers, then...
 

library(RJDBC)

# drv is the Teradata JDBC driver, created beforehand with JDBC("com.teradata.jdbc.TeraDriver", ...)
con = dbConnect(drv,"jdbc:teradata://system/TYPE=FASTEXPORT","user","password")

# initialize table foo
dbSendUpdate(con,"drop table foo")
dbSendUpdate(con,"create table foo(a int,b varchar(100))")
dbSendUpdate(con,"insert into foo values(?,?)", 42, "bar1")
dbSendUpdate(con,"insert into foo values(?,?)", 43, "bar2")

# select * from table foo
ps = .jcall(con@jc, "Ljava/sql/PreparedStatement;", "prepareStatement","select * from foo")
rs = .jcall(ps, "Ljava/sql/ResultSet;", "executeQuery")
md = .jcall(rs, "Ljava/sql/ResultSetMetaData;", "getMetaData")
jr = new("JDBCResult", jr=rs, md=md, stat=ps, pull=.jnull())
fetch(jr, -1)
.jcall(rs,"V","close")
.jcall(ps,"V","close")

dbDisconnect(con)

If the SELECT statement has '?' parameter markers, then...

con = dbConnect(drv,"jdbc:teradata://system/TYPE=FASTEXPORT","user","password")

# initialize table foo
dbSendUpdate(con,"drop table foo")
dbSendUpdate(con,"create table foo(a int,b varchar(100))")
dbSendUpdate(con,"insert into foo values(?,?)", 42, "bar1")
dbSendUpdate(con,"insert into foo values(?,?)", 43, "bar2")

# select * from table foo with '?' parameter marker
dbGetQuery(con,"select * from foo where a=?",as.integer(43))

dbDisconnect(con)
