HomeAboutPostsTagsProjectsRSS

Python

Updated
Words1299
TagsRead4 minutes

Introducing Transducers: A Powerful Tool for Functional Programming

I recently learned the concept of transducer and implement it in [[Gleam]] language.

GitHub - nohzafk/gtransducer: Transducer in Gleam language

Transducers originated in Clojure, designed to tackle specific challenges in functional programming and data processing. If you’re working with large datasets, streaming data, or complex transformations, understanding transducers can significantly enhance the efficiency and composability of your code.

What Are Transducers?

At their core, transducers are composable functions that transform data. Unlike traditional functional programming techniques like map, filter, and reduce, which are tied to specific data structures, transducers abstract the transformation logic from the input and output, making them highly reusable and flexible.

Key Advantages of Transducers

1. Composability and Reusability

Transducers allow you to compose and reuse transformation logic across different contexts. By decoupling transformations from data structures, you can apply the same logic to lists, streams, channels, or any other sequential data structure. This makes your code more modular and adaptable.

2. Performance Optimization

One of the primary motivations for using transducers is to optimize data processing. Traditional approaches often involve creating intermediate collections, which can be costly in terms of performance, especially with large datasets. Transducers eliminate this overhead by performing all operations in a single pass, without generating intermediate results.

A Python example

import time
from functools import reduce

# Traditional approach
def traditional_approach(data):
    return [x * 2 for x in data if (x * 2) % 2 == 0]

# Transducer approach
def mapping(f):
    def transducer(reducer):
        def wrapped_reducer(acc, x):
            return reducer(acc, f(x))
        return wrapped_reducer
    return transducer

def filtering(pred):
    def transducer(reducer):
        def wrapped_reducer(acc, x):
            if pred(x):
                return reducer(acc, x)
            return acc
        return wrapped_reducer
    return transducer

def compose(t1, t2):
    def composed(reducer):
        return t1(t2(reducer))
    return composed

def transduce(data, initial, transducer, reducer):
    transformed_reducer = transducer(reducer)
    return reduce(transformed_reducer, data, initial)

data = range(1000000)

# Measure traditional approach
start = time.time()
traditional_result = traditional_approach(data)
traditional_time = time.time() - start

# Measure transducer approach
xform = compose(
    mapping(lambda x: x * 2),
    filtering(lambda x: x % 2 == 0)
)

def efficient_reducer(acc, x):
    acc.append(x)
    return acc

start = time.time()
transducer_result = transduce(data, [], xform, efficient_reducer)
transducer_time = time.time() - start

# Results
print(f"Traditional approach time: {traditional_time:.4f} seconds")
print(f"Transducer approach time: {transducer_time:.4f} seconds")
print(f"Traditional is faster by: {transducer_time / traditional_time:.2f}x")

however when executed the transducer version is much slower in Python

Traditional approach time: 0.0654 seconds
Transducer approach time: 0.1822 seconds
Traditional is faster by: 2.78x

Are Transducers Suitable for Python?

While transducers offer theoretical benefits in terms of composability and efficiency, Python might not be the best language for leveraging these advantages. Here’s why:

  1. Python’s Function Call Overhead: Python has a relatively high overhead for function calls. Since transducers rely heavily on higher-order functions, this overhead can negate the performance gains that transducers are designed to offer.

  2. Optimized Built-in Functions: Python’s built-in functions like map, filter, and list comprehensions are highly optimized in C. These built-ins often outperform custom transducer implementations, especially for common tasks.

  3. Efficient Mutation with Lists: Python’s lists are mutable, and appending to a list in a loop is highly efficient. The traditional method of using list comprehensions or filter and map is often faster and more straightforward than setting up a transducer pipeline.

When to Use Transducers

Transducers shine in functional programming languages that emphasize immutability and composability, such as Clojure or Gleam. In these languages, transducers can significantly reduce the overhead of creating intermediate collections and improve performance in complex data pipelines. They’re especially powerful when working with immutable data structures, where avoiding unnecessary copies is crucial for efficiency.

In contrast, Python’s strength lies in its mutable data structures and optimized built-in functions, which often make traditional approaches more performant. However, if you’re working in a functional programming environment where immutability is the norm, or if you need to maintain a consistent API across various data sources, transducers can be a valuable tool.

Conclusion

Transducers are a powerful tool in the right context, but Python’s inherent characteristics—such as function call overhead and optimized built-ins—mean that traditional approaches may be more efficient for typical data processing tasks. If you’re working in a language that deeply benefits from transducers, like Gleam, they can greatly enhance your code. In Python, however, it’s often best to use the language’s strengths, such as list comprehensions and optimized built-ins, for performance-critical applications.

Updated
Words982
TagsRead4 minutes

Python is a great language but not perfect.

There are some common pitfalls, many of these are legacy issues retained for backward compatibility.

I want to share some of them.

Global Interpreter Lock (GIL)

It’s 2024, but Python still struggles with multi-core utilization due to the Global Interpreter Lock (GIL).

  • The GIL prevents multiple native threads from executing Python bytecode simultaneously.
  • This significantly limits the effectiveness of multi-threading for CPU-bound tasks in CPython.
  • While technically a CPython implementation detail, Python’s lack of a formal language specification means CPython’s behavior is often duplicated in other implementations.

Historically, when Python was created, there were no multi-core CPUs. When multi-core CPUs emerged, OS started to add thread support, the author added a thread interface as well, but the implementation was essentially single-core. The intention was to add real multi-threaded implementation later, but 30 years on, Python still grapples with this issue.

The GIL’s persistence is largely due to backward compatibility concerns and the fundamental changes removing it would require in the language and its ecosystem.

Lack of Block Scope

Unlike many languages, Python doesn’t have true block scope. It uses function scope and module scope instead.

def example_function():
    if True:
        x = 10  # This variable is not block-scoped
    print(x)  # This works in Python, x is still accessible

example_function()  # Outputs: 10

Implications:

  1. Loop Variable Leakage:

    for i in range(5):
        pass
    print(i)  # This prints 4, the last value of i
    
  2. Unexpected Variable Overwrites:

    x = 10
    if True:
        x = 20  # This overwrites the outer x, not create a new one
    print(x)  # Prints 20
    
  3. Difficulty in Creating Temporary Variables: It’s harder to create variables that are guaranteed to be cleaned up after a block ends.

  4. List Comprehension Exception: Interestingly, list comprehensions do create their own scope in Python 3.x.

    [x for x in range(5)]
    print(x)  # This raises a NameError in Python 3.x
    

Best practices:

  • Use functions to simulate block scope when needed.
  • Be mindful of variable names to avoid accidental overwrites.
  • Be cautious of the risk of using incorrect variable names in large functions.

Mutable Objects as Default Arguments

This is a particularly tricky pitfall:

def surprise(my_list = []):
    print(my_list)
    my_list.append('x')

surprise()  # Output: []
surprise()  # Output: ['x']

Why this happens:

  • Default arguments are evaluated when the function is defined, not when it’s called.
  • The same list object is used for all calls to the function.

This behavior:

  • Dates back to Python’s early days, possibly for performance reasons or implementation simplicity.
  • Goes against the “Principle of Least Astonishment”.
  • Has very few practical use cases and often leads to bugs.

Best practice: Use None as a default for mutable arguments and initialize inside the function:

def better_surprise(my_list=None):
    if my_list is None:
        my_list = []
    print(my_list)
    my_list.append('x')

Late Binding Closures

This issue is particularly tricky in loops:

def create_multipliers():
    return [lambda x: i * x for i in range(4)]

multipliers = create_multipliers()
print([m(2) for m in multipliers])  # Outputs: [6, 6, 6, 6]

Explanation:

  • The lambda functions capture the variable i itself, not its value at creation time.
  • By the time these lambda functions are called, the loop has completed, and i has the final value of 3.

Fix: Use a default argument to capture the current value of i:

def create_multipliers():
    return [lambda x, i=i: i * x for i in range(4)]

This behavior is particularly confusing because it goes against the intuitive understanding of how closures should work in many other languages.

The __init__.py Requirement

In Python 2 and early versions of Python 3, a directory had to contain an __init__.py file to be treated as a package.

  • This requirement often confused beginners and led to subtle bugs when forgotten.
  • It provided a clear, explicit way to define package boundaries and behavior.

Evolution:

  • Python 3.3 introduced PEP 420, allowing for implicit namespace packages.
  • Directories without __init__.py can now be treated as packages under certain conditions.

Modern best practices:

  1. Use __init__.py when you need initialization code or to control package exports.
  2. For simple packages or namespace packages, you can often omit __init__.py in Python 3.

Understanding these pitfalls is crucial for writing efficient, bug-free Python code. While they can be frustrating, they’re part of Python’s evolution and often retained for backward compatibility. Being aware of them will help you navigate Python development more effectively.

Updated
Words2196
TagsRead4 minutes

display

System Settings > Accessibility > Display > Turn On Reduce Motion

modifier key

  • change to ^
  • caps lock change to

finder

# show hidden files
defaults write com.apple.finder AppleShowAllFiles YES

# show path bar
defaults write com.apple.finder ShowPathbar -bool true

killall Finder;

brew

install brew

Quick Look plugins

brew install --cask \
    qlcolorcode \
    qlstephen \
    qlmarkdown \
    quicklook-json \
    qlprettypatch \
    quicklook-csv \
    betterzip \
    webpquicklook \
    suspicious-package

applications

brew install --cask \
    appcleaner \
    google-chrome \
    iterm2 \    
    raycast \
    visual-studio-code

command line tools

# [Nerd Font](https://www.nerdfonts.com/)
brew install --cask font-fira-code-nerd-font

brew install \
    bat \
    fish \
    git \
    git-delta \
    go \
    hugo \
    jq \
    neofetch \
    orbstack \
    ripgrep \
    starship \
    tree \
    wget

git

git config --global alias.ci commit
git config --global alias.co checkout
git config --global alias.ss status

git-delta

~/.gitconfig

[core]
    pager = delta

[interactive]
    diffFilter = delta --color-only

[delta]
    navigate = true    # use n and N to move between diff sections
    light = false      # set to true if you're in a terminal w/ a light background color (e.g. the default macOS terminal)

[merge]
    conflictstyle = diff3

[diff]
    colorMoved = default

fish shell

starship preset nerd-font-symbols -o ~/.config/starship.toml

add to ~/.config/fish/config.fish

# Add HomeBrew's bin directory to path so you can use HomeBrew's binaries like `starship`
# Fish uses `fish_add_path` instead of `export PATH` modify $PATH.
fish_add_path "/opt/homebrew/bin/"
# Enable Starship prompt
starship init fish | source

Package manager

fisher

plugin:

abbreviation

add to ~/.config/fish/config.fish

source ~/.config/fish/abbreviation.fish

create abbreviation.fish

abbr proxyall "set --export http_proxy http://127.0.0.1:7890; set --export https_proxy http://127.0.0.1:7890"

custom function

it’s very easy to add a custom function in fish shell, an example

Visual Studio Code

Settings

Side Bar:Location change to right

extensions

  • Auto Hide
  • AutoTrim
  • Emacs Friendly Keymap
  • Indenticator
  • Sort lines
  • vscode-icons

Python

use pyenv to manage Python environments, don’t reply on the python installed by brew, because it might update Python version upexpecetdly when performs brew update.

brew install readline xz pyenv
# otpinal: setup pyenv with fish shell
alias brew="env PATH=(string replace (pyenv root)/shims '' \"\$PATH\") brew"
exec "$SHELL"

pyenv install 3.11.6
pyenv global 3.11.6

Reference

Terminal emulator

iTerm2

Keyboard setting

Use fn to Change Input Source

Keyboard Shortcuts -> Uncheck all Input Sources Spotlight

Apple Internal Keyboard

Keyboard Shortcuts -> Modifier Keys

  • Caps Lock Key -> Command
  • Command -> Control

External Keyboard

Keyboard Shortcuts -> keep Modifier Keys unchanged

Karabiner-Elements

  • left_command -> left_control
  • left_control -> left_command
  • right_command -> right_control
  • right_option -> fn

Make Caps Lock → Hyper Key (⌃⌥⇧⌘) (Caps Lock if alone) import

Tiling Windows Manager

  • yabai (no need to disable System Integrity Protection)
  • skhd

yabai installation and configuration reference:

skhd config

cmd + ctrl + shift + alt is the Hyper Key configurated using Karabiner-Elements

# change focus between external displays (left and right)
cmd + ctrl + shift + alt - p : yabai -m display --focus west
cmd + ctrl + shift + alt - n : yabai -m display --focus east

# fullscreen window
cmd + ctrl + shift + alt - m : yabai -m window --toggle zoom-fullscreen

# cycle focus between windows
cmd + ctrl + shift + alt - o : yabai -m window --focus prev || yabai -m window --focus last
# cycle swap window to the main window
cmd + ctrl + shift + alt - j : /bin/bash ~/bin/cycle_clockwise.sh; yabai -m window --focus prev || yabai -m window --focus last

# rotate layout clockwise
cmd + ctrl + shift + alt - r : yabai -m space --rotate 270
# flip along y-axis
cmd + ctrl + shift + alt - y : yabai -m space --mirror y-axis
# flip along x-axis cmd + ctrl + shift + alt - x : yabai -m space --mirror x-axis
# toggle window float
cmd + ctrl + shift + alt - f : yabai -m window --toggle float --grid 4:4:1:1:2:2

cycle_clockwise.sh:

#!/bin/bash

win=$(yabai -m query --windows --window last | jq '.id')

while : ; do
    yabai -m window $win --swap prev &> /dev/null
    if [[ $? -eq 1 ]]; then
        break
    fi
done

Updated
Words1776
TagsRead3 minutes

I recently ventured into deploying a service on Google Cloud Run. My goal was straightforward: create a service that fetches webpage titles and captures screenshots of URLs. However, the journey led me into a peculiar bug when I actually used it on Goole Cloud Run.

The Bug

During the development phase, I worked with a python:3.11-slim base image on macOS, and my Dockerfile functioned without a hitch. Here’s a snapshot of the Dockerfile I used:

from python:3.11-slim

RUN apt-get update && \
    apt-get install -y git && \
    python -m pip install --upgrade pip && \
    pip install -r requirements.txt && \
    pip install pytest-playwright && \
    playwright install-deps && \
    playwright install && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Yet, upon deploying to Google Cloud Run and initiating the screenshot capture process, I hit a snag:

playwright._impl._api_types.Error: Executable doesn't exist at /home/.cache/ms-playwright/chromium-1084/chrome-linux/chrome
╔═════════════════════════════
║ Looks like Playwright was just installed or updated.                   
║ Please run the following command to download new browsers: 
║     playwright install                                                                          
║ <3 Playwright Team                                                                         
╚═════════════════════════════

Official Playwright Docker Image Saves the Day

Rather than wrestle with the error, I pivoted to an official Docker image of Playwright, and skipped installation of dependency:

mcr.microsoft.com/playwright/python:v1.39.0-jammy

docker image

Let’s dig down the issue:

The Compatibility Issue

Playwright demands compatibility. It officially supports Python versions 3.8 and higher, and it requires specific Linux distributions:

  • Debian 11 (bullseye)
  • Debian 12 (bookworm)
  • Ubuntu 20.04 (focal)
  • Ubuntu 22.04 (jammy)

However, on docker environment, the official image is only based on Unbuntu .

Use ENV PLAYWRIGHT_BROWSERS_PATH

After some search and experiments, I found the only solution in to specify the chromium binary files using ENV PLAYWRIGHT_BROWSERS_PATH during install . source code Dockerfile also use this environment variable to specify the broswer executable path.

using python:3.11-slim-bookworm

FROM python:3.11-slim-bookworm

ENV PLAYWRIGHT_BROWSERS_PATH=/app/ms-playwright
# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

# Install git
RUN apt-get update

# install playwright or whatever
RUN python -m pip install --upgrade pip && \
    pip install -r requirements.txt

# install chrominum
RUN PLAYWRIGHT_BROWSERS_PATH=/app/ms-playwright && \
    playwright install --with-deps chromium

using ubuntu:22.04

FROM ubuntu:22.04

ENV PLAYWRIGHT_BROWSERS_PATH=/app/ms-playwright
# Turns off buffering for easier container logging
ENV PYTHONUNBUFFERED=1

# Install git
RUN apt-get update && \
    apt-get install -y python3-all python-is-python3 python3-pip

# install playwright or whatever
RUN python -m pip install --upgrade pip && \
    pip install -r requirements.txt

# install chrominum
RUN PLAYWRIGHT_BROWSERS_PATH=/app/ms-playwright && \
    playwright install --with-deps chromium

Memory requirement of Google Cloud Run

The playwright 1.39.0 requires slightly more than 512MB of memory to run on Google Cloud Run. Adjust the memory limit on GCR, as it’s 512 MB by default.

Conclusion

Use the official Docker image to save time, or specify the PLAYWRIGHT_BROWSERS_PATH environment variable on a supported linux docker image.

Further reading:

Updated
Words603
TagsRead1 minute

RoutineHub • Elevate Mobile Coding Together with Apple Shortcuts amazing resources for iOS shortcuts, one thing particularly useful is that the website can browse by App. Those shortcuts that use Scriptable on the App Store and a-Shell mini on the App Store Truly elevate automation capabilities on iOS to a new level. By combining these tools, you can accomplish complex tasks entirely on your iPhone.

a-shell notably supports Python, which is relevant given Python’s ongoing discussions about adding Tier 3 support for the iOS platform in version 3.13. PEP 730 .

The focus is on “embedded mode”, since there are no stdout on iOS and you can’t provide things like the Python REPL

What’s up Python? iOS support, ruff gets black, flask 3.0…

Updated
Words639
TagsRead2 minutes

Introduction

This code example demonstrates how to use Python’s Flask framework and the python-telegram-bot library (version 13.0) to build a webhook for a Telegram bot. We will utilize the Google Cloud Run container to handle incoming messages and invoke the bot without long polling.

from flask import Flask, request, jsonify
from telegram import Bot, Update
from telegram.ext import Dispatcher
import requests

app = Flask(__name__)
bot = Bot(token="YOUR_BOT_TOKEN_HERE")
dispatcher = Dispatcher(bot, None, workers=0, use_context=True)

Defining the Message Handler

To handle incoming messages, we define a function called photo_handler that will be used as a handler for photo messages.

def photo_handler(update: Update, context):
    user_id = update.message.from_user.id
    owner_id = 123456789  # Replace with your user_id

    if user_id == owner_id:
        file_id = update.message.photo[-1].file_id  # Get the highest-resolution photo
        file_info = bot.get_file(file_id)
        file_url = file_info.file_path
        
        # Download the file (assuming you have 'requests' installed)
        response = requests.get(file_url)
        with open("received_photo.jpg", "wb") as f:
            f.write(response.content)

Setting up the Webhook Endpoint

We define a route for the /webhook endpoint using the @app.route decorator. This endpoint will receive incoming updates from Telegram.

@app.route('/webhook', methods=['POST'])
def webhook():
    update = Update.de_json(request.get_json(), bot)
    dispatcher.process_update(update)
    return jsonify(success=True)

Adding the Message Handler to the Dispatcher

We add the photo_handler function as a message handler using the MessageHandler class from the telegram.ext module. We specify the Filters.photo filter to only handle photo messages.

if __name__ == "__main__":
    from telegram.ext import MessageHandler, Filters
    
    photo_handler = MessageHandler(Filters.photo, photo_handler)
    dispatcher.add_handler(photo_handler)
    app.run(port=5000)

Conclusion

By using Flask and the python-telegram-bot library, we can easily build a webhook for a Telegram bot. This allows us to handle incoming messages efficiently using a Google Cloud Run container instead of using long polling.