-- SQL equivalent using CASE
CASEWHENtotal_spend>=10000THEN'Premium'WHENtotal_spend>=5000THEN'Gold'WHENtotal_spend>=1000THEN'Silver'ELSE'Bronze'ENDasspending_tier
This is similar to accessing previous/next elements in an array iteration.
Recursive CTEs as While Loops
WITHRECURSIVEcountdown(val)AS(SELECT10-- Initial value
UNIONALLSELECTval-1FROMcountdownWHEREval>1-- Loop condition and iteration
)SELECT*FROMcountdown;
HAVING as Filter After Processing: Like applying conditions after a loop completes:
Traditional programming is imperative (you specify HOW to do something)
SQL is declarative (you specify WHAT you want)
The SQL engine optimizes the execution plan
Set-based Operations:
Instead of thinking about individual records, think in sets
Operations like GROUP BY process entire sets of data at once
This often leads to better performance than record-by-record processing
Composition:
CTEs allow you to break complex logic into manageable pieces
Each CTE can build upon previous ones, similar to function composition
This promotes code reusability and maintainability
UNION DISTINCT (or UNION) compares rows based on ALL columns in the corresponding positions, regardless of their names. What matters is:
Column position (order)
Data types (must be compatible)
Values in all columns
Common Pitfalls
data type mismatches
-- Might fail due to type mismatch
SELECTemployee_idFROMemployees-- integer
UNIONSELECTemployee_codeFROMcontractors;-- varchar
column order importance
-- These produce different results
SELECTfirstname,lastnameFROMemployeesUNIONSELECTlastname,firstnameFROMcontractors;
Understanding Common Table Expression (CTE)
What are CTEs?
Common Table Expressions (CTEs) are temporary named result sets that exist only within the scope of a single SQL statement. They’re defined using the WITH clause and can be thought of as “named subqueries” that make complex queries more readable and maintainable.
History
Common Table Expressions (CTEs) are part of the SQL standard
CTEs were first introduced in SQL:1999 (also known as SQL3)
Recursive CTEs were also included in this standard
The syntax using the WITH clause became part of the ANSI/ISO SQL standard
Basic Syntax
WITHcte_nameAS(-- CTE query definition
SELECTcolumn1,column2FROMtable_nameWHEREcondition)-- Main query that uses the CTE
SELECT*FROMcte_name;
Break complex queries into manageable, named components. Easy to maintain and debug
Can reference the same CTE multiple times in a query. Avoid repeating complex subqueries
Recursive calling, can reference themselves to handle hierarchical or graph-like data
WITHRECURSIVEemp_hierarchyAS(-- Base case: top-level employees (no manager)
SELECTemployee_id,name,manager_id,1aslevelFROMemployeesWHEREmanager_idISNULLUNIONALL-- Recursive case: employees with managers
SELECTe.employee_id,e.name,e.manager_id,h.level+1FROMemployeeseINNERJOINemp_hierarchyhONe.manager_id=h.employee_id)SELECTemployee_id,name,level,REPEAT(' ',level-1)||nameasorg_chartFROMemp_hierarchyORDERBYlevel,name;
Advanced Concepts
a) Materialization:
CTEs are typically materialized (computed once and stored in memory)
Can improve performance when referenced multiple times
Some databases optimize by not materializing if referenced only once
b) Scope:
CTEs are only available within the statement where they’re defined
Can’t be referenced across different statements
Can reference earlier CTEs in the same WITH clause
c) Best Practices:
Use meaningful names that describe the data or transformation
Break complex logic into smaller, focused CTEs
Consider performance implications for large datasets
Use comments to explain complex logic
Comparison with other SQL Features
CTEs vs Subqueries:
CTEs are more readable
CTEs can be reused in the same query
CTEs support recursion
CTEs can sometimes perform better due to materialization
CTEs vs Views:
CTEs are temporary and query-scoped
Views are permanent database objects
CTEs can be recursive
Views can be indexed
Analytic Functions (or Window Functions)
Analytic functions are sometimes referred to as analytic window functions or simply window functions
Basic Concept
Window functions perform calculations across a set of table rows that are somehow related to the current row. The key difference from regular aggregate functions is that window functions don’t collapse rows - each row keeps its separate identity while gaining additional computed values.
Think of adding a column that calculate base on a set of rows without collasping rows as GROUP BY does.
function_name()OVER([PARTITIONBYcolumn1,column2,...]-- create group
[ORDERBYcolumn3,column4,...]-- order inside the group
[ROWS/RANGEBETWEEN...AND...]-- widnow frame clause
)
Common Window Functions
Aggregate functions
take all of the values within the window as input and return a single value
MIN()
MAX()
AVG()
SUM()
COUNT()
Navigation functions
Assign a value based on the value in a (usually) different row than the current row.
FIRST_VALUE() (or LAST_VALUE()) - Returns the first (or last) value in the input
LEAD() (and LAG()) - Returns the value on a subsequent (or preceding) row
Numbering functions
ROW_NUMBER() - Returns the order in which rows appear in the input (starting with 1)
RANK() - All rows with the same value in the ordering column receive the same rank value, where the next row receives a rank value which increments by the number of rows with the previous rank value.
Core Concept: Window Frames
Window frames define which rows to include in the window calculation:
-- window frame cluse
ROWSBETWEEN1PRECEDINGANDCURRENTROWROWSBETWEEN1FOLLWINGANDCURRENTROWROWSBETWEENUNBOUNDEDPRECEDINGANDCURRENTROW-- no need to say AND CURRENT ROW, totaly 7 rows, current row is included
ROWSBETWEEN3PRECEDINGAND3FOLLOWING
Core Concept: Understand the OVER Clause - The Window Creator
Think of OVER as opening a “window” through which you look at your data. It’s like saying “calculate this function OVER this specific set of rows.”
Simple example:
-- Without OVER (regular SUM)
SELECTdepartment,SUM(salary)FROMemployeesGROUPBYdepartment;-- With OVER (window function)
SELECTemployee_name,department,salary,SUM(salary)OVER()astotal_salaryFROMemployees;
Key difference:
Without OVER: You get one row per department
With OVER: You keep ALL rows, but each row shows the total salary
name department salary company_total
John IT 60000 250000
Mary IT 70000 250000
Steve HR 55000 250000
Jane HR 65000 250000
Every row shows the company total (250000) while keeping individual salary information.
Core Concept: PARTITION BY - Creating Groups
PARTITION BY divides your data into groups (partitions) for the window function to operate on separately. It’s similar to GROUP BY, but it doesn’t collapse the rows.
Think of it like this:
GROUP BY: Combines rows into one summary row
PARTITION BY: Keeps all rows but performs calculations within each group
name department salary dept_total
John IT 60000 130000 -- Sum of IT salaries
Mary IT 70000 130000 -- Sum of IT salaries
Steve HR 55000 120000 -- Sum of HR salaries
Jane HR 65000 120000 -- Sum of HR salaries
Let’s break down what happened:
PARTITION BY department split the data into two groups: IT and HR
SUM(salary) was calculated separately for each group
Each employee still has their own row, but now shows their department’s total
Visual Explanation
Let’s imagine a more visual example with sales data:
salesperson sale_amount total_sales
Tom 100 700 -- Company total
Tom 150 700 -- Company total
Lisa 200 700 -- Company total
Lisa 250 700 -- Company total
salespersonsale_datesale_amountrunning_totalTom2024-01-01100100-- First day
Tom2024-01-02150250-- First + Second day
Lisa2024-01-01200200-- First day
Lisa2024-01-02250450-- First + Second day
ORDER BY in OVER Clause is often used to creating a running total report
Key Points to Remember:
OVER() without PARTITION BY: Calculates across ALL rows
OVER(PARTITION BY x): Calculates separately for each group of x
You always keep all your original rows
Each row can show both its individual value AND the calculation result
month sales running_total_by_amount running_total_by_month
January 1000 1000 1000
March 1200 2200 2500 -- Jan + Feb
February 1500 3700 4200 -- Jan + Feb + Mar
April 1800 5500 5500 -- All months
month sales previous_month difference
January 1000 NULL NULL
February 1500 1000 500
March 1200 1500 -300
April 1800 1200 600
Calculate with Previous row
prev_break column shows the length of the break (in minutes) that the driver had before each trip started (this corresponds to the time between trip_start_timestamp of the current trip and trip_end_timestamp of the previous trip).
Always specify ORDER BY in window functions when using aggregate functions
Use appropriate indexes for columns in PARTITION BY and ORDER BY
Be careful with frame specifications as they can impact performance
Test with small datasets first
Why window functions were invented
Historical Context:
Before window functions (prior to SQL:2003), developers faced several challenges:
The GROUP BY Dilemma
You either got detail rows OR aggregated results, never both
Had to use complex subqueries or self-joins for analytics
Example problem: “Show each sale AND its percentage of total sales”
Complex Analytics Requirements
Business users needed running totals, rankings, moving averages
Solutions were inefficient and hard to maintain
Required multiple passes over the data
Date function
-- on that day
WHEREDATE(creation_date)='2019-01-01'-- between days
WHEREcreation_date>='2019-01-01'ANDcreation_date<'2019-02-01'--- on that year
WHEREEXTRACT(YEARfromcreation_date)='2019'