HomeAboutPostsTagsProjectsRSS

Updated
Words227
TagsRead1 minute

TIL Python use the integer itself as the hash value, except for -1. hash value for -1 is -2.

# For ordinary integers, the hash value is simply the integer itself (unless it's -1).
class int:
    def hash_(self):

        value = self
        if value == -1:
            value == -2
        return value

source

Updated
Words154
TagsRead1 minute

Auto-venv is a Fish shell script that automatically activates and deactivates Python virtual environments when entering/leaving directory that contains virtual environment.

Recently, I added multiple enhancements compare to the upstream version, now it handles edge cases more gracefully:

  • It safely manages virtual environment inheritance in new shell sessions.
  • It prevents shell exits during the activation and deactivation processes.

Updated
Words622
TagsRead2 minutes

A raycast script command to start lowfi

I recently create a Raycast script command to start the lowfi command effortlessly, so I can enjoy the lowfi music in no time and control it just using keyboard.

Here is the gist , just place the lowfi.sh to the Raycast Script Directory . Run it the first time, a wezterm window will be created if lowfi process isn’t running, run it the second time, the wezterm window will be brought to the front.

Problem of using /opt/hombrew/bin/wezterm

While i thought it was a simple task, it tooks me 1 hour to finished it. I encountered a totally unexpected problem: osascript can’t control the wezterm window that running lowfi process.

I installed wezterm using homebrew cask, when launching WezTerm using the Homebrew-installed binary (/opt/homebrew/bin/wezterm), the window was created with a NULL bundleID, which made it impossible for AppleScript/System Events to properly control it. This is because the Homebrew version doesn’t properly register itself with macOS’s window management system.

I was only able to debug this problem thanks to the amazing aerospace command

aerospace debug-windows

The solution was to always use the full application path /Applications/WezTerm.app/Contents/MacOS/wezterm for all WezTerm operations, ensuring proper window management integration with macOS.

When launching WezTerm using the full application path (/Applications/WezTerm.app/Contents/MacOS/wezterm), the window is created with the proper bundleID com.github.wez.wezterm. This allows:

  1. System Events to properly identify and control the window
  2. AppleScript to manipulate the window through accessibility actions (AXRaise, AXMain)
  3. Proper window focusing and bringing to front

Updated
Words293
TagsRead1 minute

To add Mermaid support to Hugo, one key detail not explicitly covered in the official documentation is that code fences must be used to enable rendering hooks for code blocks. If code fences are not enabled, the rendering hook has no effect.

Additionally, the condition for including Mermaid.js does not function as expected. Despite trying multiple approaches, I was unable to make it work reliably and ultimately decided to remove the conditional statement entirely.

complete configuration

hugo.toml

[markup]
[markup.highlight]
codeFences = true # enable render code block hook
noClasses = false # tells Hugo to use CSS classes instead of inline styles

layouts/_default/_markup/render-codeblock-mermaid.html

<pre class="mermaid">
  {{- .Inner | safeHTML }}
</pre>
<script type="module">
  import mermaid from '[](https://cdn.jsdelivr.net/npm/mermaid/dist/mermaid.esm.min.mjs';)
  mermaid.initialize({ startOnLoad: true }); 
</script>

Updated
Words1278
TagsRead4 minutes

Notes on OAuth 2.0

OAuth 2.0 is an open authorization framework that allows applications to access resources on behalf of a user, without exposing the user’s credentials. It enables secure access by providing access tokens for API requests, facilitating delegated permissions across services.

Security:

  • Tokens never exposed to browser
  • Session cookie only contains session ID
  • Session cookie typically expires in 24 hours or on browser close
  • Tokens stored securely on server

User Experience:

  • Seamless token refresh
  • No need to re-login when token expires
  • Single sign-on benefits

Phases

sequenceDiagram
    actor U as User
    participant C as Client App
    participant AS as Auth Server
    participant RS as Resource Server

    note over U,RS: Phase 1: Authorization Code Request
    U->>C: 1. Clicks "Login with Service"
    C->>AS: 2. Authorization Request
    Note right of C: GET /authorize?
client_id=123&
redirect_uri=https://app/callback&
response_type=code&
scope=read_profile&
state=xyz789 AS->>U: 3. Shows Login & Consent Page U->>AS: 4. Logs in & Approves Access AS->>C: 5. Redirects with Auth Code Note right of AS: GET /callback?
code=AUTH_CODE_123&
state=xyz789 note over U,RS: Phase 2: Token Exchange C->>AS: 6. Token Request Note right of C: POST /token
client_id=123&
client_secret=SECRET&
grant_type=authorization_code&
code=AUTH_CODE_123&
redirect_uri=https://app/callback AS->>AS: 7. Validates Auth Code AS->>C: 8. Returns Tokens Note right of AS: {
"access_token": "ACCESS_789",
"refresh_token": "REFRESH_111",
"expires_in": 3600
} C->>U: 9. Set session cookie Note right of C: Cookie contains
session ID only note over U,RS: Phase 3: Resource Access C->>RS: 10. API Request with Access Token Note right of C: GET /api/resource
Authorization: Bearer ACCESS_789 RS->>RS: 11. Validates Token RS->>C: 12. Returns Protected Resource note over U,RS: Phase 4: Token Refresh (When Access Token Expires) C->>AS: 13. Refresh Token Request Note right of C: POST /token
grant_type=refresh_token&
refresh_token=REFRESH_111&
client_id=123&
client_secret=SECRET AS->>C: 14. Returns New Access Token Note right of AS: {
"access_token": "NEW_ACCESS_999",
"expires_in": 3600
}

Token Refreshing Process

sequenceDiagram
    participant B as Browser
    participant C as Client (Your Server)
    participant AS as Auth Server (Google)

    Note over B,AS: Initial Login (happens once)
    B->>C: 1. User visits site
    C->>AS: 2. Redirect to Google login
    AS->>B: 3. Login page
    B->>AS: 4. User logs in
    AS->>C: 5. Auth code
    C->>AS: 6. Exchange for tokens
    Note right of C: Receives:
- Access Token (30 min)
- Refresh Token (long-lived)
- ID Token (user info) C->>B: 7. Set session cookie Note right of C: Cookie contains
session ID only Note over B,AS: Later: Access Token Expired B->>C: 8. Makes request C->>C: 9. Checks token
found expired C->>AS: 10. Uses refresh token
to get new access token AS->>C: 11. New access token C->>B: 12. Serves request Note right of B: User never sees
this process

Token Validation Process

sequenceDiagram
    participant B as Browser
    participant S as Server
    participant R as Redis/DB
    participant AS as Auth Server (Google)

    Note over B,AS: Scenario 1: Valid Token
    B->>S: 1. Request with session_id cookie
    S->>R: 2. Lookup user_id by session_id
    R->>S: 3. Returns user_id
    S->>R: 4. Get tokens for user_id
    R->>S: 5. Returns tokens
    S->>S: 6. Check token expiration
    Note right of S: Token still valid
    S->>B: 7. Return requested resource

    Note over B,AS: Scenario 2: Expired Token
    B->>S: 1. Request with session_id cookie
    S->>R: 2. Lookup user_id by session_id
    R->>S: 3. Returns user_id
    S->>R: 4. Get tokens for user_id
    R->>S: 5. Returns tokens
    S->>S: 6. Check token expiration
    Note right of S: Token expired!
    S->>AS: 7. Refresh token request
    AS->>S: 8. New access token
    S->>R: 9. Store new tokens
    S->>B: 10. Return requested resource

    Note over B,AS: Scenario 3: Invalid Session
    B->>S: 1. Request with invalid session_id
    S->>R: 2. Lookup user_id by session_id
    R->>S: 3. Returns null
    S->>B: 4. Return 401 Unauthorized

token flow

sequenceDiagram
    actor U as User
    participant B as Browser
    participant C as Client App (Your Server)
    participant R as Redis/DB
    participant AS as Auth Server (Google)
    participant RS as Resource Server

    note over U,RS: Phase 1: Initial OAuth Login
    U->>B: 1. Clicks "Login with Google"
    B->>C: 2. Request login
    C->>AS: 3. Authorization Request
    Note right of C: GET /authorize?
client_id=123&
redirect_uri=https://app/callback&
response_type=code&
scope=read_profile&
state=xyz789 AS->>B: 4. Shows Login & Consent Page U->>AS: 5. Logs in & Approves Access AS->>C: 6. Redirects with Auth Code Note right of AS: GET /callback?
code=AUTH_CODE_123&
state=xyz789 note over U,RS: Phase 2: Token Exchange & Session Setup C->>AS: 7. Exchange auth code for tokens Note right of C: POST /token
client_id=123&
client_secret=SECRET&
grant_type=authorization_code&
code=AUTH_CODE_123 AS->>C: 8. Returns Tokens Note right of AS: {
"access_token": "ACCESS_789",
"refresh_token": "REFRESH_111",
"expires_in": 3600
} C->>R: 9. Store tokens Note right of C: Store in Redis/DB:
user_id: user123
access_token: ACCESS_789
refresh_token: REFRESH_111 C->>B: 10. Set session cookie Note right of B: Cookie contains only:
session_id: "sess_xyz" note over U,RS: Phase 3: Subsequent API Requests B->>C: 11. Request with session cookie Note right of B: Cookie: session_id=sess_xyz C->>R: 12. Look up user_id & tokens R->>C: 13. Return tokens C->>RS: 14. API request with access token Note right of C: Authorization: Bearer ACCESS_789 RS->>C: 15. Return resource C->>B: 16. Send response to browser note over U,RS: Phase 4: Token Refresh (Automatic) B->>C: 17. Later request with session cookie C->>R: 18. Look up tokens R->>C: 19. Return tokens (expired) C->>AS: 20. Refresh token request Note right of C: POST /token
grant_type=refresh_token
refresh_token=REFRESH_111 AS->>C: 21. New access token C->>R: 22. Update stored tokens Note right of C: Update Redis/DB with
new access_token C->>RS: 23. Retry API request RS->>C: 24. Return resource C->>B: 25. Send response Note right of B: Same session cookie
continues to work

Updated
Words991
TagsRead3 minutes

Programming concepts in SQL

CTEs (Common Table Expressions) as Variables/Functions

  • Just like how you declare variables or functions in traditional programming, CTEs let you create named result sets that you can reference later
  • Each CTE can be thought of as storing an intermediate result, similar to variable assignment
  • They can be chained together like function calls, where one CTE uses results from another

GROUP BY as Loops

  • In traditional programming, you might loop through data to accumulate results
  • GROUP BY automatically “loops” through rows, grouping related records together
  • The aggregation functions (SUM, COUNT, AVG) work like accumulators inside these implicit loops

Example in programming vs SQL:

# Python equivalent
results = {}
for row in data:
    if row.customer_id not in results:
        results[row.customer_id] = {
            'total_transactions': 0,
            'total_spend': 0
        }
    results[row.customer_id]['total_transactions'] += 1
    results[row.customer_id]['total_spend'] += row.quantity * row.unit_price
-- SQL equivalent using GROUP BY
SELECT 
    customer_id,
    COUNT(*) as total_transactions,
    SUM(quantity * unit_price) as total_spend
FROM sales
GROUP BY customer_id

CASE Statements as If/Switch

  • Similar to if/else or switch statements in programming
  • Can be used in both SELECT and WHERE clauses
  • Can be nested for complex conditional logic

Example comparison:

# Python equivalent
def get_spending_tier(total_spend):
    if total_spend >= 10000:
        return 'Premium'
    elif total_spend >= 5000:
        return 'Gold'
    elif total_spend >= 1000:
        return 'Silver'
    else:
        return 'Bronze'
-- SQL equivalent using CASE
CASE 
    WHEN total_spend >= 10000 THEN 'Premium'
    WHEN total_spend >= 5000 THEN 'Gold'
    WHEN total_spend >= 1000 THEN 'Silver'
    ELSE 'Bronze'
END as spending_tier

Advanced Programming-like Features

  1. Window Functions as Iterators:

SELECT 
    *,
    LAG(value) OVER (ORDER BY date) as previous_value,
    LEAD(value) OVER (ORDER BY date) as next_value
FROM data

This is similar to accessing previous/next elements in an array iteration.

  1. Recursive CTEs as While Loops
WITH RECURSIVE countdown(val) AS (
    SELECT 10  -- Initial value
    UNION ALL
    SELECT val - 1 FROM countdown WHERE val > 1  -- Loop condition and iteration
)
SELECT * FROM countdown;
  1. HAVING as Filter After Processing: Like applying conditions after a loop completes:
SELECT category, COUNT(*) as count
FROM items
GROUP BY category
HAVING COUNT(*) > 10

Key Insights:

Declarative vs Imperative:

Traditional programming is imperative (you specify HOW to do something) SQL is declarative (you specify WHAT you want) The SQL engine optimizes the execution plan

Set-based Operations:

Instead of thinking about individual records, think in sets Operations like GROUP BY process entire sets of data at once This often leads to better performance than record-by-record processing

Composition:

CTEs allow you to break complex logic into manageable pieces Each CTE can build upon previous ones, similar to function composition This promotes code reusability and maintainability

Updated
Words5574
TagsRead10 minutes

Recently I refreshed my SQL knowledge, and finished Learn Advanced SQL Tutorials | Kaggle . Highly recommend the Analytic Functions Exercise Section.

I want to share some of my notes and the materials I used. This is going to be a long post.

There are some topics that developers should be familiar with, even if they don’t work on database directly. There topics are:

  • JOIN & UNION
  • Common Table Expression (CTE)
  • Analytic Functions (or Window Functions)

JOIN

JOIN is equivalent to INNER JOIN

there are:

  • INNER JOIN
  • LEFT JOIN
  • RIGHT JOIN

Best practices

Performance:

  • Consider indexing join columns
  • Use WHERE clauses to filter results when possible

UNION

UNION: to stack data on top of each other

UNION is equivalent to UNION DISTINCT there are

  • UNION DISTINCT
  • UNION ALL

Syntax

SELECT column1 FROM table1
UNION [DISTINCT | ALL]
SELECT column1 FROM table2

UNION (or UNION DISTINCT)

  • Combines results from two or more SELECT statements
  • Automatically removes duplicate rows
  • Returns only unique rows
  • Requires compatible columns (same number and compatible data types)
  • More computationally expensive due to duplicate removal

Example: query unique user id that create at least one question or one answer

SELECT owner_user_id
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE DATE(creation_date) = '2019-01-01'
UNION DISTINCT
SELECT owner_user_id
FROM `bigquery-public-data.stackoverflow.posts_answers`
WHERE DATE(creation_date) = '2019-01-01'

Core Concept: Position-Based Matching

UNION DISTINCT (or UNION) compares rows based on ALL columns in the corresponding positions, regardless of their names. What matters is:

  • Column position (order)
  • Data types (must be compatible)
  • Values in all columns

Common Pitfalls

data type mismatches


-- Might fail due to type mismatch
SELECT employee_id FROM employees  -- integer
UNION
SELECT employee_code FROM contractors;  -- varchar

column order importance

-- These produce different results
SELECT firstname, lastname FROM employees
UNION
SELECT lastname, firstname FROM contractors;

Understanding Common Table Expression (CTE)

What are CTEs?

Common Table Expressions (CTEs) are temporary named result sets that exist only within the scope of a single SQL statement. They’re defined using the WITH clause and can be thought of as “named subqueries” that make complex queries more readable and maintainable.

History

Common Table Expressions (CTEs) are part of the SQL standard

  • CTEs were first introduced in SQL:1999 (also known as SQL3)
  • Recursive CTEs were also included in this standard
  • The syntax using the WITH clause became part of the ANSI/ISO SQL standard

Basic Syntax

WITH cte_name AS (
    -- CTE query definition
    SELECT column1, column2
    FROM table_name
    WHERE condition
)
-- Main query that uses the CTE
SELECT *
FROM cte_name;
  • Break complex queries into manageable, named components. Easy to maintain and debug
  • Can reference the same CTE multiple times in a query. Avoid repeating complex subqueries
  • Recursive calling, can reference themselves to handle hierarchical or graph-like data

Example

Multiple CTEs

WITH 
monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', sale_date) as month,
        SUM(amount) as revenue
    FROM sales
    GROUP BY DATE_TRUNC('month', sale_date)
),
sales_growth AS (
    SELECT 
        month,
        revenue,
        LAG(revenue) OVER (ORDER BY month) as prev_month_revenue
    FROM monthly_sales
)
SELECT 
    month,
    revenue,
    prev_month_revenue,
    ((revenue - prev_month_revenue) / prev_month_revenue * 100) as growth_percentage
FROM sales_growth;

Recursive CTE example (Employee Hierachy)

WITH RECURSIVE emp_hierarchy AS (
    -- Base case: top-level employees (no manager)
    SELECT 
        employee_id,
        name,
        manager_id,
        1 as level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive case: employees with managers
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        h.level + 1
    FROM employees e
    INNER JOIN emp_hierarchy h ON e.manager_id = h.employee_id
)
SELECT 
    employee_id,
    name,
    level,
    REPEAT('  ', level - 1) || name as org_chart
FROM emp_hierarchy
ORDER BY level, name;

Advanced Concepts

a) Materialization:

  • CTEs are typically materialized (computed once and stored in memory)
  • Can improve performance when referenced multiple times
  • Some databases optimize by not materializing if referenced only once

b) Scope:

  • CTEs are only available within the statement where they’re defined
  • Can’t be referenced across different statements
  • Can reference earlier CTEs in the same WITH clause

c) Best Practices:

  • Use meaningful names that describe the data or transformation
  • Break complex logic into smaller, focused CTEs
  • Consider performance implications for large datasets
  • Use comments to explain complex logic

Comparison with other SQL Features

CTEs vs Subqueries:

  • CTEs are more readable
  • CTEs can be reused in the same query
  • CTEs support recursion
  • CTEs can sometimes perform better due to materialization

CTEs vs Views:

  • CTEs are temporary and query-scoped
  • Views are permanent database objects
  • CTEs can be recursive
  • Views can be indexed

Analytic Functions (or Window Functions)

Analytic functions are sometimes referred to as analytic window functions or simply window functions

Basic Concept

Window functions perform calculations across a set of table rows that are somehow related to the current row. The key difference from regular aggregate functions is that window functions don’t collapse rows - each row keeps its separate identity while gaining additional computed values.

Think of adding a column that calculate base on a set of rows without collasping rows as GROUP BY does.

CREATE TABLE employee_sales (
    emp_id INT,
    emp_name VARCHAR(50),
    department VARCHAR(50),
    sale_amount DECIMAL(10,2),
    sale_date DATE
);

INSERT INTO employee_sales VALUES
(1, 'John', 'Electronics', 1000, '2024-01-01'),
(1, 'John', 'Electronics', 1200, '2024-01-02'),
(2, 'Sarah', 'Electronics', 800, '2024-01-01'),
(2, 'Sarah', 'Electronics', 1500, '2024-01-02'),
(3, 'Mike', 'Furniture', 2000, '2024-01-01'),
(3, 'Mike', 'Furniture', 2500, '2024-01-02');

Basic Window Function Syntax

The basic syntax is:

function_name() OVER (
    [PARTITION BY column1, column2, ...] -- create group
    [ORDER BY column3, column4, ...]     -- order inside the group
    [ROWS/RANGE BETWEEN ... AND ...]     -- widnow frame clause
)

Common Window Functions

Aggregate functions

take all of the values within the window as input and return a single value

  • MIN()
  • MAX()
  • AVG()
  • SUM()
  • COUNT()

Assign a value based on the value in a (usually) different row than the current row.

  • FIRST_VALUE() (or LAST_VALUE()) - Returns the first (or last) value in the input
  • LEAD() (and LAG()) - Returns the value on a subsequent (or preceding) row

Numbering functions

  • ROW_NUMBER() - Returns the order in which rows appear in the input (starting with 1)
  • RANK() - All rows with the same value in the ordering column receive the same rank value, where the next row receives a rank value which increments by the number of rows with the previous rank value.

Core Concept: Window Frames

Window frames define which rows to include in the window calculation:

SELECT 
    emp_name,
    sale_date,
    sale_amount,
    AVG(sale_amount) OVER (
        PARTITION BY emp_name 
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) as moving_avg
FROM employee_sales;

window frame clause example

-- window frame cluse
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
ROWS BETWEEN 1 FOLLWING AND CURRENT ROW
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

-- no need to say AND CURRENT ROW, totaly 7 rows, current row is included
ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING

Core Concept: Understand the OVER Clause - The Window Creator

Think of OVER as opening a “window” through which you look at your data. It’s like saying “calculate this function OVER this specific set of rows.” Simple example:

-- Without OVER (regular SUM)
SELECT department, SUM(salary) 
FROM employees 
GROUP BY department;

-- With OVER (window function)
SELECT 
    employee_name,
    department,
    salary,
    SUM(salary) OVER() as total_salary
FROM employees;

Key difference:

  • Without OVER: You get one row per department
  • With OVER: You keep ALL rows, but each row shows the total salary

Let’s use a sample table to make this clearer:

CREATE TABLE employees (
    name VARCHAR(50),
    department VARCHAR(50),
    salary INT
);

INSERT INTO employees VALUES
('John', 'IT', 60000),
('Mary', 'IT', 70000),
('Steve', 'HR', 55000),
('Jane', 'HR', 65000);

Example 1: Simple OVER

SELECT 
    name,
    department,
    salary,
    SUM(salary) OVER() as company_total
FROM employees;

Result:

name    department    salary    company_total
John    IT           60000     250000
Mary    IT           70000     250000
Steve   HR           55000     250000
Jane    HR           65000     250000

Every row shows the company total (250000) while keeping individual salary information.

Core Concept: PARTITION BY - Creating Groups

PARTITION BY divides your data into groups (partitions) for the window function to operate on separately. It’s similar to GROUP BY, but it doesn’t collapse the rows. Think of it like this:

  • GROUP BY: Combines rows into one summary row
  • PARTITION BY: Keeps all rows but performs calculations within each group

Example 2: Using PARTITION BY

SELECT 
    name,
    department,
    salary,
    SUM(salary) OVER(PARTITION BY department) as dept_total
FROM employees;

Result:

name    department    salary    dept_total
John    IT           60000     130000    -- Sum of IT salaries
Mary    IT           70000     130000    -- Sum of IT salaries
Steve   HR           55000     120000    -- Sum of HR salaries
Jane    HR           65000     120000    -- Sum of HR salaries

Let’s break down what happened:

PARTITION BY department split the data into two groups: IT and HR SUM(salary) was calculated separately for each group Each employee still has their own row, but now shows their department’s total

Visual Explanation

Let’s imagine a more visual example with sales data:

CREATE TABLE sales (
    salesperson VARCHAR(50),
    region VARCHAR(50),
    sale_amount INT,
    sale_date DATE
);

INSERT INTO sales VALUES
('Tom', 'North', 100, '2024-01-01'),
('Tom', 'North', 150, '2024-01-02'),
('Lisa', 'South', 200, '2024-01-01'),
('Lisa', 'South', 250, '2024-01-02');

Three different ways to use window functions:

  1. No Partitioning:
SELECT 
    salesperson,
    sale_amount,
    SUM(sale_amount) OVER() as total_sales
FROM sales;

Result:

salesperson    sale_amount    total_sales
Tom           100            700  -- Company total
Tom           150            700  -- Company total
Lisa          200            700  -- Company total
Lisa          250            700  -- Company total
  1. With PARTITION BY:
SELECT 
    salesperson,
    sale_amount,
    SUM(sale_amount) OVER(PARTITION BY salesperson) as person_total
FROM sales;

Result:

salesperson    sale_amount    person_total
Tom           100            250  -- Tom's total
Tom           150            250  -- Tom's total
Lisa          200            450  -- Lisa's total
Lisa          250            450  -- Lisa's total
  1. With PARTITION BY and ORDER BY:
SELECT 
    salesperson,
    sale_date,
    sale_amount,
    SUM(sale_amount) OVER(
        PARTITION BY salesperson 
        ORDER BY sale_date
    ) as running_total
FROM sales;

Results:

salesperson    sale_date    sale_amount    running_total
Tom           2024-01-01   100            100  -- First day
Tom           2024-01-02   150            250  -- First + Second day
Lisa          2024-01-01   200            200  -- First day
Lisa          2024-01-02   250            450  -- First + Second day

ORDER BY in OVER Clause is often used to creating a running total report

  1. Key Points to Remember:
  • OVER() without PARTITION BY: Calculates across ALL rows
  • OVER(PARTITION BY x): Calculates separately for each group of x
  • You always keep all your original rows
  • Each row can show both its individual value AND the calculation result

Practical Use Cases

Calculating Moving Averages

SELECT 
    sale_date,
    sale_amount,
    AVG(sale_amount) OVER (
        ORDER BY sale_date 
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as moving_avg_3_days
FROM employee_sales;

Calculating Percentages of Total

SELECT 
    emp_name,
    department,
    sale_amount,
    sale_amount / SUM(sale_amount) OVER (PARTITION BY department) * 100 as pct_of_dept_sales
FROM employee_sales;

Running Total of Sales

CREATE TABLE monthly_sales (
    month VARCHAR(10),
    sales INT
);

INSERT INTO monthly_sales VALUES
('January', 1000),
('February', 1500),
('March', 1200),
('April', 1800);

SELECT 
    month,
    sales,
    SUM(sales) OVER(ORDER BY sales) as running_total_by_amount,
    SUM(sales) OVER(ORDER BY month) as running_total_by_month
FROM monthly_sales;

Results:

month     sales    running_total_by_amount    running_total_by_month
January   1000     1000                       1000
March     1200     2200                       2500  -- Jan + Feb
February  1500     3700                       4200  -- Jan + Feb + Mar
April     1800     5500                       5500  -- All months

Compare with Previous Sale

SELECT 
    month,
    sales,
    LAG(sales) OVER(ORDER BY month) as previous_month,
    sales - LAG(sales) OVER(ORDER BY month) as difference
FROM monthly_sales;

Result:

month     sales    previous_month    difference
January   1000     NULL              NULL
February  1500     1000              500
March     1200     1500              -300
April     1800     1200              600

Calculate with Previous row

prev_break column shows the length of the break (in minutes) that the driver had before each trip started (this corresponds to the time between trip_start_timestamp of the current trip and trip_end_timestamp of the previous trip).

SELECT taxi_id,
    trip_start_timestamp,
    trip_end_timestamp,
    TIMESTAMP_DIFF(
        trip_start_timestamp, 
        LAG(trip_end_timestamp) 
            OVER (
                PARTITION BY taxi_id 
                ORDER BY trip_start_timestamp), 
        MINUTE) as prev_break
FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`
WHERE DATE(trip_start_timestamp) = '2013-10-03'

Best Practices

  • Always specify ORDER BY in window functions when using aggregate functions
  • Use appropriate indexes for columns in PARTITION BY and ORDER BY
  • Be careful with frame specifications as they can impact performance
  • Test with small datasets first

Why window functions were invented

Historical Context:

Before window functions (prior to SQL:2003), developers faced several challenges:

The GROUP BY Dilemma

  • You either got detail rows OR aggregated results, never both
  • Had to use complex subqueries or self-joins for analytics
  • Example problem: “Show each sale AND its percentage of total sales”

Complex Analytics Requirements

  • Business users needed running totals, rankings, moving averages
  • Solutions were inefficient and hard to maintain
  • Required multiple passes over the data

Date function

-- on that day
WHERE DATE(creation_date) = '2019-01-01'
-- between days
WHERE creation_date >= '2019-01-01' AND creation_date < '2019-02-01'
--- on that year
WHERE EXTRACT(YEAR from creation_date) = '2019' 

Updated
Words429
TagsRead3 minutes

Build Your Own Workflow: Automating Tasks for Efficiency

Emacs is all about tailoring the text editor to fit your own needs, and Lisp program language is all about building the right abstraction for the problem. After years of using Emacs/writing Emacs lisp, one lesson stands out: whenever you find yourself repeating a task, it’s time to build your own tools and create a personalized workflow. A good automated workflow isn’t just about saving time—it’s about using automation to offload tedious, repetitive details, freeing up mental energy for the things that truly matter in the task. With the rise of Large Language Models (LLMs), even tasks that previously required fuzzy logic can now be automated.

On mobile, iOS Shortcuts offers a powerful way to design custom workflows. Pair it with tools like a-Shell, Scriptable, and Data Jar, and you can create powerful automation that integrates seamlessly with your daily tasks.

Example 1: Automating Stroke Order Lookup

Sometimes, I forget the stroke order of a Chinese character and need to look it up. Instead of using third-party apps or look it up on the website every time, I built a Scriptable script that handles this for me automatically. It visit zdict.net, download and display the GIF. Now, with one tap, I can quickly find the stroke order without breaking my flow.

Example 2: Automating note taking of Language Learning

When learning German, I often need to record new vocabulary. To streamline this, I crafted a custom prompt for ChatGPT to help with translations and explanations, and then log the results into [[Obsidian]] for future study. I linked these steps into a single iOS Shortcut, making the process completely automated—from asking the question to saving the notes.

Example 3: Job Search Automation

Job hunting can be an exhausting process, involving multiple platforms, browsing job descriptions (JDs), evaluating opportunities, and tracking applications. After repeating these steps too many times, I developed a Python application combined with Selenium for browser automation. This app scrapes job postings, pulls the JDs, and stores the data in a Django backend. I also integrated ChatGPT to compare my resume with the job descriptions, giving me a quick assessment of which positions are a good match. This allows me to filter out irrelevant jobs and focus on those with the highest potential, leaving only the final judgment to manual review.

build the automation

By building your own workflow, you can minimize time wasted on tedious details and concentrate on what truly matters. Whether it’s handling simple everyday tasks or tackling more complex challenges like job hunting, there’s always room to streamline your process.