Overview
Cloudstic uses a Merkle Hash Array Mapped Trie (HAMT) to map file IDs to their metadata references. This data structure provides:- Efficient lookups — O(log₃₂ n) depth due to 32-way branching
- Structural sharing — unchanged subtrees are reused by reference between snapshots
- Persistent immutability — all mutations return a new root while preserving the old one
- Content-addressable nodes — nodes are identified by the hash of their content
Configuration
Frominternal/hamt/hamt.go:
Design Rationale
- 5 bits per level provides 32-way branching, balancing tree depth with node size
- Max depth of 6 supports 32^6 = 1,073,741,824 entries before hitting depth limit
- Max leaf size of 32 keeps leaf nodes small enough to serialize efficiently
Node Types
Internal Node
Internal nodes use a bitmap to compactly represent which of the 32 child slots are populated:- Each bit position (0-31) corresponds to a child slot
- A set bit indicates that slot is populated
- The
childrenarray contains only the populated slots, in order - Child position is computed via
popcount(bitmap & (bit - 1))
The bitmap compression reduces storage: a node with only 2 children stores 2 refs instead of a 32-element array.
Leaf Node
Leaf nodes store actual key-value entries:Path Key Computation
File IDs are hashed to produce a “path key” that determines the traversal path through the tree:- Hash the file ID →
a7f3c92e... - Extract bits 27-31 for level 0 → index 20
- Extract bits 22-26 for level 1 → index 15
- Continue until reaching a leaf or empty slot
Operations
TheTree type exposes a functional API where all mutations return a new root:
Insert Algorithm
- Lookup existing node at the current level
- If node is a leaf:
- If entry exists, update it in place (copy-on-write)
- If leaf has room (< 32 entries), append the new entry
- If leaf is full and we’re not at max depth, split into an internal node
- If node is internal:
- Compute the child index for this level
- Recursively insert into the appropriate child
- Update the parent’s child reference
- Save the modified node and return its new reference
Delete Algorithm
- Traverse to the leaf containing the key
- Remove the entry from the leaf
- If leaf becomes empty, return an empty ref
- Propagate changes up the tree:
- If a child becomes empty, remove its bit from the parent’s bitmap
- If an internal node is left with only one child that is a leaf, collapse it by promoting the leaf
Structural Sharing
Only nodes along the path of a modified entry change:Diff Algorithm
TheDiff operation performs a parallel traversal of two HAMT roots:
- If both nodes are leaves, compare entries directly
- If nodes are internal, iterate through all 32 buckets:
- If bucket exists in only one tree, collect all entries as added/removed
- If bucket exists in both, recursively diff the child nodes
- Yield a
DiffEntryfor each difference found
The diff operation is structural — if two subtrees have the same root hash, they are identical and can be skipped entirely.
TransactionalStore
During a backup, thousands of HAMT nodes are created as the tree is built incrementally. Many of these intermediate nodes become unreachable once the tree is fully constructed. TheTransactionalStore buffers new nodes in memory and only flushes the reachable subset from the final root:
- Start a BFS from the root node
- For each node visited, check if it’s in the in-memory buffer
- If buffered, write it to the persistent store
- Add all child refs to the BFS queue
- Discard any buffered nodes that were never visited
Garbage Collection
TheNodeRefs method walks the entire tree and yields every node reference:
Performance Characteristics
| Operation | Complexity | Notes |
|---|---|---|
| Insert | O(log₃₂ n) | Maximum 6 levels for 1B entries |
| Lookup | O(log₃₂ n) | Typically 2-3 network requests |
| Delete | O(log₃₂ n) | Includes potential collapse |
| Walk | O(n) | Visits every leaf entry once |
| Diff | O(m) | Where m = changed entries |
With packfiles enabled, multiple HAMT nodes are bundled into 8MB packs. An LRU cache reduces the average lookup to 0-1 network requests after the initial pack fetch.