TIL Python uses the integer itself as the hash value, except for -1. The hash value for -1 is -2.
# For ordinary integers, the hash value is simply the integer itself (unless it's -1).
class int:
    def __hash__(self):
        value = self
        if value == -1:
            value = -2
        return value
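The rule is easy to check in the interpreter. The snippet below is a minimal sketch, assuming CPython, where -1 is reserved as the error return value of the C-level hash functions and therefore cannot be a valid hash. "Ordinary" here means integers smaller than `sys.hash_info.modulus`; larger values are reduced modulo that number.

```python
import sys

# Small integers hash to themselves.
print(hash(1))    # 1
print(hash(42))   # 42

# -1 is the exception: CPython maps it to -2, so it collides with -2.
print(hash(-1))   # -2
print(hash(-2))   # -2

# Large integers are reduced modulo sys.hash_info.modulus,
# so the modulus itself hashes to 0.
print(hash(sys.hash_info.modulus))  # 0
```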
Auto-venv is a Fish shell script that automatically activates and deactivates Python virtual environments when entering or leaving a directory that contains a virtual environment.
Recently, I added several enhancements compared to the upstream version, so it now handles edge cases more gracefully:
It safely manages virtual environment inheritance in new shell sessions.
It prevents shell exits during the activation and deactivation processes.
Transducers originated in Clojure, designed to tackle specific challenges in functional programming and data processing. If you’re working with large datasets, streaming data, or complex transformations, understanding transducers can significantly enhance the efficiency and composability of your code.
What Are Transducers?
At their core, transducers are composable functions that transform data. Unlike traditional functional programming techniques like map, filter, and reduce, which are tied to specific data structures, transducers abstract the transformation logic from the input and output, making them highly reusable and flexible.
Key Advantages of Transducers
1. Composability and Reusability
Transducers allow you to compose and reuse transformation logic across different contexts. By decoupling transformations from data structures, you can apply the same logic to lists, streams, channels, or any other sequential data structure. This makes your code more modular and adaptable.
2. Performance Optimization
One of the primary motivations for using transducers is to optimize data processing. Traditional approaches often involve creating intermediate collections, which can be costly in terms of performance, especially with large datasets. Transducers eliminate this overhead by performing all operations in a single pass, without generating intermediate results.
A Python example
import time
from functools import reduce

# Traditional approach
def traditional_approach(data):
    return [x * 2 for x in data if (x * 2) % 2 == 0]

# Transducer approach
def mapping(f):
    def transducer(reducer):
        def wrapped_reducer(acc, x):
            return reducer(acc, f(x))
        return wrapped_reducer
    return transducer

def filtering(pred):
    def transducer(reducer):
        def wrapped_reducer(acc, x):
            if pred(x):
                return reducer(acc, x)
            return acc
        return wrapped_reducer
    return transducer

def compose(t1, t2):
    def composed(reducer):
        return t1(t2(reducer))
    return composed

def transduce(data, initial, transducer, reducer):
    transformed_reducer = transducer(reducer)
    return reduce(transformed_reducer, data, initial)

data = range(1000000)

# Measure traditional approach
start = time.time()
traditional_result = traditional_approach(data)
traditional_time = time.time() - start

# Measure transducer approach
xform = compose(mapping(lambda x: x * 2), filtering(lambda x: x % 2 == 0))

def efficient_reducer(acc, x):
    acc.append(x)
    return acc

start = time.time()
transducer_result = transduce(data, [], xform, efficient_reducer)
transducer_time = time.time() - start

# Results
print(f"Traditional approach time: {traditional_time:.4f} seconds")
print(f"Transducer approach time: {transducer_time:.4f} seconds")
print(f"Traditional is faster by: {transducer_time/traditional_time:.2f}x")
However, when executed, the transducer version is much slower in Python:
Traditional approach time: 0.0654 seconds
Transducer approach time: 0.1822 seconds
Traditional is faster by: 2.78x
Are Transducers Suitable for Python?
While transducers offer theoretical benefits in terms of composability and efficiency, Python might not be the best language for leveraging these advantages. Here’s why:
Python’s Function Call Overhead:
Python has a relatively high overhead for function calls. Since transducers rely heavily on higher-order functions, this overhead can negate the performance gains that transducers are designed to offer.
Optimized Built-in Functions:
Python’s built-in functions like map, filter, and list comprehensions are highly optimized in C. These built-ins often outperform custom transducer implementations, especially for common tasks.
Efficient Mutation with Lists:
Python’s lists are mutable, and appending to a list in a loop is highly efficient. The traditional method of using list comprehensions or filter and map is often faster and more straightforward than setting up a transducer pipeline.
When to Use Transducers
Transducers shine in functional programming languages that emphasize immutability and composability, such as Clojure or Gleam. In these languages, transducers can significantly reduce the overhead of creating intermediate collections and improve performance in complex data pipelines. They’re especially powerful when working with immutable data structures, where avoiding unnecessary copies is crucial for efficiency.
In contrast, Python’s strength lies in its mutable data structures and optimized built-in functions, which often make traditional approaches more performant. However, if you’re working in a functional programming environment where immutability is the norm, or if you need to maintain a consistent API across various data sources, transducers can be a valuable tool.
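To make the "consistent API across various data sources" point concrete, here is a minimal sketch in which one transformation is built once and then reused with two different reducers and two different kinds of input (a list and a generator). The helper names `append` and `add` are illustrative and not part of any library.

```python
from functools import reduce

def mapping(f):
    def transducer(reducer):
        return lambda acc, x: reducer(acc, f(x))
    return transducer

def filtering(pred):
    def transducer(reducer):
        return lambda acc, x: reducer(acc, x) if pred(x) else acc
    return transducer

def compose(t1, t2):
    return lambda reducer: t1(t2(reducer))

def append(acc, x):
    # Reducer that collects results into a list.
    acc.append(x)
    return acc

def add(acc, x):
    # Reducer that sums results without building a list.
    return acc + x

# Double every item, then keep only the results divisible by 3.
xform = compose(mapping(lambda x: x * 2), filtering(lambda x: x % 3 == 0))

as_list = reduce(xform(append), [1, 2, 3, 4, 5, 6], [])
as_sum = reduce(xform(add), (n for n in range(1, 7)), 0)

print(as_list)  # [6, 12]
print(as_sum)   # 18
```

The transformation never mentions lists or generators; only the reducer and the input decide what is consumed and what is produced.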
Conclusion
Transducers are a powerful tool in the right context, but Python’s inherent characteristics—such as function call overhead and optimized built-ins—mean that traditional approaches may be more efficient for typical data processing tasks. If you’re working in a language that deeply benefits from transducers, like Gleam, they can greatly enhance your code. In Python, however, it’s often best to use the language’s strengths, such as list comprehensions and optimized built-ins, for performance-critical applications.
Python has some common pitfalls, many of which are legacy issues retained for backward compatibility.
I want to share some of them.
Global Interpreter Lock (GIL)
It’s 2024, but Python still struggles with multi-core utilization due to the Global Interpreter Lock (GIL).
The GIL prevents multiple native threads from executing Python bytecode simultaneously.
This significantly limits the effectiveness of multi-threading for CPU-bound tasks in CPython.
While technically a CPython implementation detail, Python’s lack of a formal language specification means CPython’s behavior is often duplicated in other implementations.
Historically, when Python was created, there were no multi-core CPUs. When multi-core CPUs emerged and operating systems started to add thread support, the author added a thread interface as well, but the implementation was essentially single-core. The intention was to add a real multi-threaded implementation later, but 30 years on, Python still grapples with this issue.
The GIL’s persistence is largely due to backward compatibility concerns and the fundamental changes removing it would require in the language and its ecosystem.
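The effect is easy to observe. The sketch below runs the same CPU-bound function four times sequentially and then on four threads, and prints both timings. It assumes a standard GIL-enabled CPython build, where the two timings usually end up close to each other. The function name `count_down` and the workload size are arbitrary choices for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def count_down(n):
    # Pure-Python busy loop: CPU-bound, never releases the GIL for I/O.
    while n > 0:
        n -= 1
    return n

N = 2_000_000
JOBS = 4

start = time.perf_counter()
sequential = [count_down(N) for _ in range(JOBS)]
sequential_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=JOBS) as pool:
    threaded = list(pool.map(count_down, [N] * JOBS))
threaded_time = time.perf_counter() - start

print(f"Sequential: {sequential_time:.3f} seconds")
print(f"4 threads:  {threaded_time:.3f} seconds")
```

For CPU-bound work, `concurrent.futures.ProcessPoolExecutor` is the usual workaround, since each process has its own interpreter and its own GIL.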
Lack of Block Scope
Unlike many languages, Python doesn’t have true block scope. It uses function scope and module scope instead.
def example_function():
    if True:
        x = 10  # This variable is not block-scoped
    print(x)  # This works in Python, x is still accessible

example_function()  # Outputs: 10
Implications:
Loop Variable Leakage:
for i in range(5):
    pass
print(i)  # This prints 4, the last value of i
Unexpected Variable Overwrites:
x = 10
if True:
    x = 20  # This overwrites the outer x instead of creating a new one
print(x)  # Prints 20
Difficulty in Creating Temporary Variables: It’s harder to create variables that are guaranteed to be cleaned up after a block ends.
List Comprehension Exception: Interestingly, list comprehensions do create their own scope in Python 3.x.
[x for x in range(5)]
print(x)  # This raises a NameError in Python 3.x
Best practices:
Use functions to simulate block scope when needed.
Be mindful of variable names to avoid accidental overwrites.
Be cautious of the risk of using incorrect variable names in large functions.
This behavior is particularly confusing because it goes against the intuitive understanding of how closures should work in many other languages.
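As a minimal sketch of the first best practice, temporaries can be confined by moving the work into a small function. The names `compute_total`, `running` and `v` are made up for this example.

```python
def compute_total(values):
    # `running` and `v` live only in this function's scope.
    running = 0
    for v in values:
        running += v
    return running

total = compute_total([1, 2, 3])
print(total)                   # 6

# Neither temporary leaked into the enclosing (module) scope.
print("running" in globals())  # False
print("v" in globals())        # False
```

Had the loop been written at module level, both `running` and `v` would still be defined after it finished.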
The __init__.py Requirement
In Python 2 and early versions of Python 3, a directory had to contain an __init__.py file to be treated as a package.
This requirement often confused beginners and led to subtle bugs when forgotten.
On the other hand, it provided a clear, explicit way to define package boundaries and behavior.
Evolution:
Python 3.3 introduced PEP 420, allowing for implicit namespace packages.
Directories without __init__.py can now be treated as packages under certain conditions.
Modern best practices:
Use __init__.py when you need initialization code or to control package exports.
For simple packages or namespace packages, you can often omit __init__.py in Python 3.
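The sketch below shows an implicit namespace package in action: it creates a throwaway directory without `__init__.py`, imports a module from it, and checks that the package has no `__file__`. The package name `demo_ns_pkg` and module name `greet` are hypothetical and exist only for this demonstration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # A directory with a module in it, but no __init__.py.
    pkg_dir = Path(root) / "demo_ns_pkg"
    pkg_dir.mkdir()
    (pkg_dir / "greet.py").write_text("MESSAGE = 'hello'\n")

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_ns_pkg.greet")
        package = sys.modules["demo_ns_pkg"]
        message = module.MESSAGE
        # Namespace packages have no __init__.py, hence no __file__.
        is_namespace = getattr(package, "__file__", None) is None
    finally:
        sys.path.remove(root)

print(message)       # hello
print(is_namespace)  # True
```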
Understanding these pitfalls is crucial for writing efficient, bug-free Python code. While they can be frustrating, they’re part of Python’s evolution and often retained for backward compatibility. Being aware of them will help you navigate Python development more effectively.
A Python list comprehension is usually faster than calling the .append() method in a loop because it uses the dedicated LIST_APPEND bytecode instruction, which avoids the costly method lookup and call on every iteration.
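The difference can be seen with the standard `dis` module. This is a small sketch that collects the opcode names used by two equivalent functions. The helper `opnames` is written for this example and also walks nested code objects, because older CPython versions compile a comprehension into a separate code object.

```python
import dis
import types

def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names

def with_comprehension(data):
    return [x * 2 for x in data]

def with_append(data):
    result = []
    for x in data:
        result.append(x * 2)
    return result

comp_ops = opnames(with_comprehension.__code__)
loop_ops = opnames(with_append.__code__)

print("LIST_APPEND" in comp_ops)  # True
print("LIST_APPEND" in loop_ops)  # False
```

The explicit loop instead has to load the `append` attribute and perform a call each time around, and the exact opcode names for that vary between CPython versions.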
[core]
    pager = delta

[interactive]
    diffFilter = delta --color-only

[delta]
    navigate = true  # use n and N to move between diff sections
    light = false    # set to true if you're in a terminal w/ a light background color (e.g. the default macOS terminal)

[merge]
    conflictstyle = diff3

[diff]
    colorMoved = default
# Add HomeBrew's bin directory to path so you can use HomeBrew's binaries like `starship`
# Fish uses `fish_add_path` instead of `export PATH` to modify $PATH.
fish_add_path "/opt/homebrew/bin/"
# Enable Starship prompt
starship init fish | source
abbr proxyall "set --export http_proxy http://127.0.0.1:7890; set --export https_proxy http://127.0.0.1:7890"
Custom function
It's very easy to add a custom function in fish shell.
An example:
Visual Studio Code
Settings
Side Bar: Location: change to right
Extensions
Auto Hide
AutoTrim
Emacs Friendly Keymap
Indenticator
Sort lines
vscode-icons
Python
Use pyenv to manage Python environments. Don't rely on the Python installed by brew, because it might update the Python version unexpectedly when it performs brew update.
brew install readline xz pyenv
# optional: set up pyenv with fish shell
alias brew="env PATH=(string replace (pyenv root)/shims '' \"\$PATH\") brew"
exec "$SHELL"
pyenv install 3.11.6
pyenv global 3.11.6
I recently ventured into deploying a service on Google Cloud Run. My goal was straightforward: create a service that fetches webpage titles and captures screenshots of URLs. However, the journey led me into a peculiar bug when I actually ran it on Google Cloud Run.
The Bug
During the development phase, I worked with a python:3.11-slim base image on macOS, and my Dockerfile functioned without a hitch. Here’s a snapshot of the Dockerfile I used:
Yet, upon deploying to Google Cloud Run and initiating the screenshot capture process, I hit a snag:
playwright._impl._api_types.Error: Executable doesn't exist at /home/.cache/ms-playwright/chromium-1084/chrome-linux/chrome
╔═════════════════════════════
║ Looks like Playwright was just installed or updated.
║ Please run the following command to download new browsers:
║
║ playwright install
║
║ <3 Playwright Team
╚═════════════════════════════
Official Playwright Docker Image Saves the Day
Rather than wrestle with the error, I pivoted to an official Playwright Docker image and skipped installing the dependencies:
Playwright 1.39.0 requires slightly more than 512 MB of memory to run on Google Cloud Run. Raise the memory limit of the Cloud Run service, as it's 512 MB by default.
Conclusion
Use the official Docker image to save time, or set the PLAYWRIGHT_BROWSERS_PATH environment variable on a supported Linux Docker image.
a-shell notably supports Python, which is relevant given Python’s ongoing discussions about adding Tier 3 support for the iOS platform in version 3.13 (PEP 730).
The focus is on “embedded mode”, since there is no stdout on iOS and you can’t provide things like the Python REPL.
This code example demonstrates how to use Python’s Flask framework and the python-telegram-bot library (version 13.0) to build a webhook for a Telegram bot. We use a Google Cloud Run container to receive incoming messages and invoke the bot without long polling.
To handle incoming messages, we define a function called photo_handler that will be used as a handler for photo messages.
import requests
from telegram import Update

# `bot` is assumed to be a telegram.Bot instance created elsewhere in the app.
def photo_handler(update: Update, context):
    user_id = update.message.from_user.id
    owner_id = 123456789  # Replace with your user_id
    if user_id == owner_id:
        file_id = update.message.photo[-1].file_id  # Get the highest-resolution photo
        file_info = bot.get_file(file_id)
        file_url = file_info.file_path
        # Download the file (assuming you have 'requests' installed)
        response = requests.get(file_url)
        with open("received_photo.jpg", "wb") as f:
            f.write(response.content)
Setting up the Webhook Endpoint
We define a route for the /webhook endpoint using the @app.route decorator. This endpoint will receive incoming updates from Telegram.
We add the photo_handler function as a message handler using the MessageHandler class from the telegram.ext module. We specify the Filters.photo filter to only handle photo messages.
By using Flask and the python-telegram-bot library, we can easily build a webhook for a Telegram bot. This allows us to handle incoming messages efficiently using a Google Cloud Run container instead of using long polling.