
Introducing Transducers: A Powerful Tool for Functional Programming

I recently learned about the concept of transducers and implemented it in the [[Gleam]] language.

GitHub - nohzafk/gtransducer: Transducer in Gleam language

Transducers originated in Clojure, designed to tackle specific challenges in functional programming and data processing. If you’re working with large datasets, streaming data, or complex transformations, understanding transducers can significantly enhance the efficiency and composability of your code.

What Are Transducers?

At their core, transducers are composable functions that transform data. Unlike traditional functional programming techniques like map, filter, and reduce, which are tied to specific data structures, transducers abstract the transformation logic from the input and output, making them highly reusable and flexible.

Key Advantages of Transducers

1. Composability and Reusability

Transducers allow you to compose and reuse transformation logic across different contexts. By decoupling transformations from data structures, you can apply the same logic to lists, streams, channels, or any other sequential data structure. This makes your code more modular and adaptable.

2. Performance Optimization

One of the primary motivations for using transducers is to optimize data processing. Traditional approaches often involve creating intermediate collections, which can be costly in terms of performance, especially with large datasets. Transducers eliminate this overhead by performing all operations in a single pass, without generating intermediate results.

A Python Example

import time
from functools import reduce

# Traditional approach
def traditional_approach(data):
    return [x * 2 for x in data if (x * 2) % 2 == 0]

# Transducer approach: a transducer takes a reducer and returns a new reducer
def mapping(f):
    def transducer(reducer):
        def wrapped_reducer(acc, x):
            return reducer(acc, f(x))
        return wrapped_reducer
    return transducer

def filtering(pred):
    def transducer(reducer):
        def wrapped_reducer(acc, x):
            if pred(x):
                return reducer(acc, x)
            return acc
        return wrapped_reducer
    return transducer

def compose(t1, t2):
    # t1 transforms each element before t2 sees it
    def composed(reducer):
        return t1(t2(reducer))
    return composed

def transduce(data, initial, transducer, reducer):
    # run the whole pipeline in a single reduce pass, with no intermediate lists
    transformed_reducer = transducer(reducer)
    return reduce(transformed_reducer, data, initial)

data = range(1000000)

# Measure traditional approach
start = time.time()
traditional_result = traditional_approach(data)
traditional_time = time.time() - start

# Measure transducer approach
xform = compose(
    mapping(lambda x: x * 2),
    filtering(lambda x: x % 2 == 0)
)

def efficient_reducer(acc, x):
    acc.append(x)
    return acc

start = time.time()
transducer_result = transduce(data, [], xform, efficient_reducer)
transducer_time = time.time() - start

# Results
print(f"Traditional approach time: {traditional_time:.4f} seconds")
print(f"Transducer approach time: {transducer_time:.4f} seconds")
print(f"Traditional is faster by: {transducer_time / traditional_time:.2f}x")

However, when executed, the transducer version is much slower in Python:

Traditional approach time: 0.0654 seconds
Transducer approach time: 0.1822 seconds
Traditional is faster by: 2.78x

Are Transducers Suitable for Python?

While transducers offer theoretical benefits in terms of composability and efficiency, Python might not be the best language for leveraging these advantages. Here’s why:

  1. Python’s Function Call Overhead: Python has a relatively high overhead for function calls. Since transducers rely heavily on higher-order functions, this overhead can negate the performance gains that transducers are designed to offer.

  2. Optimized Built-in Functions: Python’s built-in functions like map, filter, and list comprehensions are highly optimized in C. These built-ins often outperform custom transducer implementations, especially for common tasks.

  3. Efficient Mutation with Lists: Python’s lists are mutable, and appending to a list in a loop is highly efficient. The traditional method of using list comprehensions or filter and map is often faster and more straightforward than setting up a transducer pipeline.

When to Use Transducers

Transducers shine in functional programming languages that emphasize immutability and composability, such as Clojure or Gleam. In these languages, transducers can significantly reduce the overhead of creating intermediate collections and improve performance in complex data pipelines. They’re especially powerful when working with immutable data structures, where avoiding unnecessary copies is crucial for efficiency.
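To make this concrete, here is a rough sketch of what a transducer can look like in Gleam (illustrative names only, not the gtransducer API), where a map and a filter fuse into a single list.fold:

import gleam/io
import gleam/list

// a transducer takes a reducer fn(acc, element) -> acc
// and returns a new reducer, so transformation steps fuse into one pass
pub fn mapping(f) {
  fn(reducer) { fn(acc, x) { reducer(acc, f(x)) } }
}

pub fn filtering(pred) {
  fn(reducer) {
    fn(acc, x) {
      case pred(x) {
        True -> reducer(acc, x)
        False -> acc
      }
    }
  }
}

pub fn main() {
  // double every element, keep the even results, all in a single fold
  let double = mapping(fn(x) { x * 2 })
  let keep_even = filtering(fn(x) { x % 2 == 0 })
  let prepend = fn(acc, x) { [x, ..acc] }

  list.range(1, 10)
  |> list.fold([], double(keep_even(prepend)))
  |> list.reverse
  |> io.debug
}

Because each step only wraps the reducer, the whole pipeline runs in one pass over the list, which is exactly the property the Python benchmark above tries to exploit.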

In contrast, Python’s strength lies in its mutable data structures and optimized built-in functions, which often make traditional approaches more performant. However, if you’re working in a functional programming environment where immutability is the norm, or if you need to maintain a consistent API across various data sources, transducers can be a valuable tool.

Conclusion

Transducers are a powerful tool in the right context, but Python’s inherent characteristics—such as function call overhead and optimized built-ins—mean that traditional approaches may be more efficient for typical data processing tasks. If you’re working in a language that deeply benefits from transducers, like Gleam, they can greatly enhance your code. In Python, however, it’s often best to use the language’s strengths, such as list comprehensions and optimized built-ins, for performance-critical applications.


I want to share what I’ve learned while porting The Little Learner from Racket to Gleam. I will compare some Racket code with its Gleam counterpart.

Gleam is a simple functional programming language. Check out this great article on why a simple programming language matters.

The language is so straightforward that an experienced programmer can learn it in just a day or two. However, to truly appreciate its simplicity and constraints, one must have prior experience with complex programming languages and substantial coding practice.

focus on concrete problems

It is hard for less experienced developers to appreciate how rarely architecting for future requirements / applications turns out net-positive.

John Carmack

Gleam’s philosophy is to focus on concrete problems rather than on building abstractions.

Consider this Racket code I encountered; it is used in different places for different things.

(λ (l . r) l)

Although it looks beautiful, this function is overly clever, making it difficult to understand.

The function behaves as follows:

  • If called with a single list argument, it returns the first element of that list.
  • If called with multiple arguments, it returns the first argument.

However, it is never called with more than three arguments in the codebase.

Translating this to Gleam results in more understandable code without much added verbosity.

pub type Shape = List(Int)

pub fn default_shape_fn_1(shape: Shape) -> Shape {
  case shape {
    [] -> []
    [head, ..] -> [head]
  }
}

pub fn default_shape_fn_2(shape: Shape, _: Shape) -> Shape {
  shape
}

[[no hidden complexity]]

Racket provides layers of abstraction like rename-out, which lets you override operators such as + - * / > < = (it can literally be anything) when exporting a module. This is great for building abstractions to teach concepts or to build a new language/DSL, but such flexibility often comes with maintainability costs and cognitive burden.

The Little Learner provides different implementations for tensors. Functions and operators that appear identical have different meanings across those implementations. Operators are also overridden: when reading the codebase I was often confused by an operator like <, which sometimes compares numbers and other times compares scalars or tensors. Racket is a dynamic language, and without type annotations, tracing those functions and operators can be really frustrating and confusing.

While this uniform abstraction layer is beneficial for teaching machine learning concepts, it can be challenging when examining the actual code.

In contrast, Gleam shines with its simplicity and lack of hidden complexity. Everything must be explicitly stated, making the code clean and readable. Additionally, the compiler is smart enough to perform type inference, so you usually don’t need to add type annotations for everything.
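For instance, this small function (a made-up example) compiles without a single annotation, and the compiler still infers the fully typed signature fn(Int, Int) -> Int:

import gleam/int
import gleam/io

// inferred as fn(Int, Int) -> Int, because `+` only works on Int in Gleam
fn add(a, b) {
  a + b
}

pub fn main() {
  io.println(int.to_string(add(1, 2)))
}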


Records in Gleam: Comparison and Uniqueness

Record Comparison

In [[Gleam]], records are compared by value (deep nested comparison), which can present challenges when using them as dictionary keys, unlike in some other functional languages.

Records are Compared by Value

It’s important to note that Gleam doesn’t have objects in the traditional sense. All data structures, including records, are compared by value. This means that two records with identical field values will be considered equal.

To make a record unique, an additional identifier field is necessary. This approach allows for distinguishing between records that might otherwise have the same content but represent different entities.
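A tiny illustration, using a hypothetical Point record: two separately constructed records holding the same values compare as equal.

import gleam/io

pub type Point {
  Point(x: Int, y: Int)
}

pub fn main() {
  // structural equality: both records hold the same field values
  io.debug(Point(1, 2) == Point(1, 2))
  // prints True
}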

Ensuring Uniqueness

Simple Approach: UUID Field

One straightforward method to ensure record uniqueness is to add a UUID field. However, UUID strings can be memory-intensive and CPU-costly.

Improved Approach: Erlang Reference

A more efficient alternative is to use an [[erlang reference]] as a unique identifier for records.

Erlang references are unique identifiers created by the Erlang runtime system. They have several important properties:

  1. Uniqueness: Each reference is guaranteed to be unique within an Erlang node (and even across connected nodes).
  2. Lightweight: References are very memory-efficient.
  3. Unguessable: They can’t be forged or guessed, which can be useful for security in some contexts.
  4. Erlang-specific: They are native to the BEAM VM, so they work well with Gleam, which runs on this VM.

It’s important to note that:

  • Erlang references are not persistent across program runs. If you need to save and reload your records, you’ll need to implement a serialization strategy.
  • References are not garbage collected until the object they’re associated with is no longer referenced.

Example

import gleam/erlang

pub type TensorId =
  erlang.Reference

pub type Tensor {
  ScalarTensor(value: Float, id: TensorId)
  ListTensor(List(Tensor))
}

pub fn create_scalar_tensor(value: Float) -> Tensor {
  ScalarTensor(value, erlang.make_reference())
}

pub fn create_list_tensor(tensors: List(Tensor)) -> Tensor {
  ListTensor(tensors)
}

pub fn tensor_id(tensor: Tensor) -> TensorId {
  case tensor {
    ScalarTensor(_, id) -> id
    // a ListTensor carries no id, so a fresh reference is returned on every call
    ListTensor(_) -> erlang.make_reference()
  }
}

pub fn tensor_equal(a: Tensor, b: Tensor) -> Bool {
  tensor_id(a) == tensor_id(b)
}

import gleam/dict

pub type GradientMap =
  dict.Dict(TensorId, Float)

pub fn create_gradient_map() -> GradientMap {
  dict.new()
}

pub fn set_gradient(map: GradientMap, tensor: Tensor, gradient: Float) -> GradientMap {
  dict.insert(map, tensor_id(tensor), gradient)
}

pub fn get_gradient(map: GradientMap, tensor: Tensor) -> Result(Float, Nil) {
  dict.get(map, tensor_id(tensor))
}


typical usage of result.unwrap and result.or

result.unwrap and result.or are both useful functions in Gleam for working with Result types, but they serve different purposes.

result.unwrap

result.unwrap is used to extract the value from a Result, providing a default value if the Result is an Error. It’s typically used when you want to proceed with a default value rather than propagating an error.

Typical usage:

import gleam/io
import gleam/result

pub fn get_user_name(user_id: Int) -> Result(String, Nil) {
  // Simulated user lookup
  case user_id {
    1 -> Ok("Alice")
    2 -> Ok("Bob")
    _ -> Error(Nil)
  }
}

pub fn greet_user(user_id: Int) -> String {
  let name = get_user_name(user_id)
    |> result.unwrap("Guest")

  "Hello, " <> name
}

// Usage:
pub fn main() {
  io.println(greet_user(1))  // Prints: "Hello, Alice"
  io.println(greet_user(3))  // Prints: "Hello, Guest"
}

In this example, result.unwrap allows us to use a default value (“Guest”) when the user lookup fails, ensuring that we always have a name to greet.

result.or

result.or is used to provide an alternative Result when the first Result is an Error. It’s typically used when you have a fallback operation or value that you want to try if the primary operation fails.

Typical usage:

import gleam/io
import gleam/result

// a minimal Config type so the example compiles; the actual fields don't matter here
pub type Config {
  Config(name: String)
}

pub fn get_config_from_file() -> Result(Config, String) {
  // Simulated file read
  Error("File not found")
}

pub fn get_default_config() -> Result(Config, String) {
  // Return a default configuration
  Ok(Config(name: "default"))
}

pub fn get_config() -> Result(Config, String) {
  get_config_from_file()
  |> result.or(get_default_config())
}

// Usage:
pub fn main() {
  case get_config() {
    Ok(config) -> io.println("Config loaded")
    Error(err) -> io.println("Failed to load config: " <> err)
  }
}

In this example, result.or allows us to try loading the configuration from a file first, and if that fails, fall back to using a default configuration. The get_config function will only return an Error if both operations fail.

Key differences and when to use each:

  • Use result.unwrap when you want to extract a value from a Result and have a sensible default to use if it’s an Error. This effectively “throws away” the error information.
  • Use result.or when you want to try an alternative operation if the first one fails, while still preserving the Result type. This allows you to chain multiple fallback options (see the sketch after this list).
  • result.unwrap returns the unwrapped value directly, while result.or returns another Result.
  • result.unwrap is often used at the “edges” of your program where you need to interface with code that doesn’t use Results, while result.or is more commonly used within the “core” logic where you’re still working with Results.
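As a rough sketch with made-up lookup functions, chaining result.or lets each fallback kick in only when the previous source fails:

import gleam/result

// hypothetical lookups, each returning a Result
fn port_from_env() -> Result(Int, String) {
  Error("PORT not set")
}

fn port_from_file() -> Result(Int, String) {
  Error("config file missing")
}

pub fn port() -> Result(Int, String) {
  // try the environment, then the file, then fall back to a default
  port_from_env()
  |> result.or(port_from_file())
  |> result.or(Ok(8080))
}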

Both functions are valuable tools for error handling in Gleam, and understanding when to use each can lead to more robust and expressive code.

Rust

The same principle applies to Rust, whose Result type has a very similar design.


I recently discovered the Gleam language and quickly fell in love with it!

Here is how you can set up Gleam in doom-emacs.

packages.el

We only need gleam-ts-mode.el; do not download gleam-mode.el, because compiling it requires the tree-sitter package, which causes problems.

(package! gleam-ts-mode
  :recipe (:host github
           :repo "gleam-lang/gleam-mode"
           :branch "main"
           :files ("gleam-ts-*.el")))

config.el

(use-package! gleam-ts-mode
  :config
  ;; setup formatter to be used by `SPC c f`
  (after! apheleia
    (setf (alist-get 'gleam-ts-mode apheleia-mode-alist) 'gleam)
    (setf (alist-get 'gleam apheleia-formatters) '("gleam" "format" "--stdin"))))

(after! treesit
  (add-to-list 'auto-mode-alist '("\\.gleam$" . gleam-ts-mode)))

(after! gleam-ts-mode
  (unless (treesit-language-available-p 'gleam)
    ;; compile the treesit grammar file the first time
    (gleam-ts-install-grammar)))

hack

If you, like me, use Tree-sitter grammar files from Nix, the tree-sitter subdirectory within the directory specified by user-emacs-directory is linked to Nix’s read-only filesystem, meaning gleam-ts-install-grammar is unable to install grammar files there.

Here’s how you can adjust treesit-extra-load-path and install the grammar file.

(after! gleam-ts-mode
  (setq treesit-extra-load-path (list (expand-file-name "~/.local/tree-sitter/")))
  (unless (treesit-language-available-p 'gleam)
    ;; hack: override `out-dir' when installing the language grammar
    (let ((orig-treesit--install-language-grammar-1 (symbol-function 'treesit--install-language-grammar-1)))
      (cl-letf (((symbol-function 'treesit--install-language-grammar-1)
                 (lambda (out-dir lang url)
                   (funcall orig-treesit--install-language-grammar-1
                            "~/.local/tree-sitter/" lang url))))
        (gleam-ts-install-grammar)))))