Quick Tips for Performance Tuning and best practices in Bigquery

Simple in Nature: (Reduction in data being processed)

·         Projections : Select only required columns.Use Except Keyword in Select *

·         Selections: Select Only required Rows with Where Clause . Use Limit clause for data analysis

·         Filter using Pseudo Partition Column (_PARTITIONDATE)

Medium in nature :

·         Partitioned Tables

·         Reducing data before join

·         Where clause  : Operations on BOOLINTFLOAT, and DATE columns are typically faster than operations on STRING or BYTE

Complex  in Nature :

·         Reuse the repeatedly used transformations

·         Avoid multiple usage of CTE ( Common table expression )

·         Avoid Repeated Joins and Subqueries

Heavy In Nature :

·         Split complex queries into smaller ones

·         Materializing Large Datasets and use in necessary places

·         Optimize your join patterns (place the table with the largest number of rows first, followed by the table with the fewest rows, and then place the remaining tables by decreasing size.)

Minimize Data Skew in below two instances by using Concepts of bigquery

·         Skew at Partition level: Partition skew refers to when data is not evenly distributed across partitions, resulting in some partitions being larger than others. This imbalance causes more data to be processed in certain slots, leading to inefficiencies. Unequally sized partitions exacerbate this issue, creating a disparity in data distribution.

·         Skew at Join level: Data skew can happen when you use JOIN clauses. This means that when Big Query organizes data for joining, it may put too much data with the same joining key into one group. This can overwhelm the system's capacity.