Secondary index
Secondary indexes in teradata is used to provide an alternate path to the data and should be used on queries that run many times.
These are two types:-
· Unique secondary index (USI)
· Non Unique secondary index (NUSI)
Unique Secondary Index
Syntax of the Unique Secondary Index:-
SI can be created even after table is populated with the data , unlike Primary Index which is created only at the time of creation of table .we can create and drop secondary index at any time.
CREATE UNIQUE INDEX(COLUMN/COLUMNS)ON<dbname>.<table name>
Whenever we create a SI on the table,Teradata will create a subtable on all AMPS . This subtable contains three columns given below
· 1.Secondary index value
· 2.Secondary index row ID (This is the hash value of SI value)
· 3.Base table row ID (This is the actual base row id)
USI sub table creation:-
· When we defined a Unique SI on the table, then the teradata will immediately create a USI subtable in each AMP for that particular table.
· Creation of Subtable requires PERM space.so always be wise to choose your SI,Normally the best SI is that columns which is mostly used in the WHERE clause.
Suppose we have employee table(base table)having attributes EMP.DEPT FNAME,LNAME and Soc_security.We defined USI on the column soc_security.The SI subtable created on each AMP which holds information about the SI column and corresponding base row id(base table row_id)which is the row id of the actual employee table. The steps involved to load this subtable is as follows:-
· Teradata will first create the subtable on all amp.
· After that it hashes the value of this USI column (soc_security) and based on that hashed value it checks the hash map for the amp number which will hold this USI value in its subtable.
PLEASE LOOK INTO THE IMAGE BELOW:-
· After getting the respective amp number, the SI value along with the two or more attributes (secondary index row id and base table row id) will be stored in the sub table of that amp.
As it is clear now that defining SI will require the creation of subtable,so we should be aware that requires space cost factor on our Teradata system.
Teradata retrieval of USI query:-
Suppose on the above example we make a query.
Select*from employee_table shere soc_security=”123-99-8888”;
When a TD optimizer find a USI in where clause it knows that it’s a 2 amp operation and also only one row will be returned. so the step its performs for retrieval is as follows:-
· It will hash the value of SI(‘123-99-8888’)by hashing algorithm and found the hash value for it.
· Now it checks this hash value in the hash map and gets the amp number from it.we know that this amp stores this SI value.
· Now it will go to the employee subtable of the amp and retrieve the base row id which is stored for that hash value.
· This base row id will be sent back to optimizer by BYNET
· Now optimizer sent back this row id again and fetch the resultant row from the base table for which soc_security=’123-99-8888’.
As we have seen that Teradata system requires 2amp to each the answer row that’s why we called USI operation as the 2 amp operations.Even if SI row resides in the same amp in which base row reside,still after getting base row id from the subtable it will sent back to optimizer so that it start search again based on that base row id.so it’s always called as the 2 amp operation.
Non Unique Secondary Index (NUSI) :
NUSI SUBTABLE EXAMPLE:-
When we defined a NUSI on the table then Teradata will build the subtable on each amp in the same fashion as that in USI.The only difference in the subtable creation is that,instead of building subtable on each amp it will be build the subtable in it to points it own base rows,In other word each NUSI subtable will reflect and points to the those base rows only which it owns.
Suppose we have an employee table(base table)on which we defined NUSI on the column frame.
· Teradata will first create the subtable on all amp.
· Each AMP will hold the secondary index values for their rows in the base table only.in our examples,each amp holds the FNAME column for all employee rows in the base table on their amp(amp local).
· Each AMP local fname will have the base table row id so the amp can retrieve it quickly if needed.if an amp contains duplicate first names,only one subtable row for that name is built with multiple base row _id’s.the above example for f_name =’john’the subtable holds multiple base row id for this value.
Teradata retrieval of NUSI query:-
Suppose on the above example we make query-
Select*from employee table where Fname=’john’ ;
When an NUSI (Fname ) is used in the where clause of an sql statement,the PE optimizer recognizes the NUSI.it will perform an all amp operation to look in to the subtable for the requested value.so the steps its performs for retrieval is as follows
· It will hash the value of NUSI(‘john’),by hashing algorithm and found the hash value for it.
· Now it will instruct all amp to look for this hash value in its employee subtable. Unlike NUSI there is no looking in to hash map because each subtable in the amp contains rows from its own base rows only.so this look up on hash value will performed on all AMP subtable.
· By amp which does not have this hash value will not participate any more in the operation.
· when the hash value found the corresponding base row id will be fetched from the subtable and send to optimizer for actual retrieval of rows.
SECONDARY INDEX SUMMARY:-
· We can have up to 32 SI for a table
· USI subtables are hash distributed
· NUSI subtables are AMP local
· USI queries are two amp operations
· NUSI queries are all amp operations, but not full table scans.
· Always collect statistics on all NUSI indexes.