Tuesday 10 March 2015

SQL query to find Distribution key of all tables in a Netezza Database

Run the following command in your connected database.
SELECT tab.tablename ,COALESCE(map.attname,'RANDOM') AS distribution_key       ,map.distseqno AS dist_seq_no FROM _v_table tab  LEFT OUTER JOIN _v_table_dist_map map  ON map.objid = tab.objid WHERE UPPER(objtype) IN ('TABLE','SECURE TABLE') ORDER BY tab.tablename, map.distseqno; 
    Sample output:

  TABLENAME            | DISTRIBUTION_KEY | DIST_SEQ_NO
----------------------------+-----------------------------+--------------------
 CBT_CUSTOMER     | CUSTOMER_ID            |           1
 CUSTOMER               | CUSTOMER_ID            |           1
 EMP_DETAILS          | DEPT_ID                        |           1
 MYTAB                      | ID                                     |           1
 PAYMENT                 | CUSTOMER_ID              |           1
 SALES_TAB              | CITY                                |           1
 SAMPLETABLE        | YEAR                              |           1
 TBL                             | I                                       |           1

After Looking at the output the question may arise, is there a way to tell if the distribution key was user picked or system picked? It means that by default if no distribution was explicitly specified by user Netezza would pick first column of the table as distribution key.
  Then here is the answer:          
The DIST_SEQ_NO is the order that columns were specified as part of the key if you have a multi-column distribution key. For single column distribution (which is what we'd recommend generally) this will just be 1.
Once a table is created we do not store the original SQL used to create it.. and becasue of this we don't have a way to tell what was user specified and what was system-picked to the best of my knowledge.
Just to clarify, the system may not always choose the first column as a distribution key if it's not specified.. it varies depending on how you're creating the table (DDL or CTAS) and the value of the parameter enable_random_table_distribute. The best idea is to always specify a key.

2 comments:

  1. The "SQL Query to Find Distribution Key of" article is a commendable guide in database optimization. How Play Games This resource equips readers with practical skills to enhance data management and streamline database operations.

    ReplyDelete