High-Level Overview
Cloudstic is a content-addressable, encrypted backup tool that supports multiple data sources and storage backends. Every backup operation produces immutable, deduplicated snapshots that can be restored independently.
Package Structure
Core Packages
The codebase follows Go best practices with clear separation between public APIs and internal implementation details.
client.go (Root)
The public API for programmatic use. Re-exports types from internal packages using Go type aliases, providing a stable interface for library consumers.
Location: client.go:44
cmd/cloudstic/
The CLI entry point. Each subcommand (init, backup, restore, list, ls, prune, forget, diff, break-lock, add-recovery-key) is implemented as a run*() function in main.go.
- Uses Go’s built-in flag package (no external dependencies like cobra/viper)
- All commands use the Client API for consistency
Location: cmd/cloudstic/main.go:46
internal/engine/
The business logic layer. Each operation has a dedicated *Manager struct:
- BackupManager - Scans sources, chunks files, updates the HAMT
- RestoreManager - Reconstructs file trees from snapshots
- PruneManager - Mark-and-sweep garbage collection
- ForgetManager - Snapshot removal and retention policies
- DiffManager - Snapshot comparison
Each manager exposes a Run(ctx) method that executes the operation.
Location: internal/engine/
internal/core/
Domain types and models. Defines the core data structures:
- Snapshot - Point-in-time backup checkpoint
- FileMeta - File metadata (name, size, mtime, content hash)
- Content - Chunk manifest for file contents
- HAMTNode - Merkle tree node (internal or leaf)
- RepoConfig - Repository configuration
- SourceInfo - Backup source identity
Also defines ComputeJSONHash(), the canonical content-addressing function.
Location: internal/core/models.go:1-120
internal/hamt/
Persistent Hash Array Mapped Trie (HAMT) implementation. A Merkle tree structure backed by the object store, used to efficiently track file→metadata mappings across snapshots.
Key features:
- 5 bits per level → 32-way branching
- Structural sharing between snapshots (only changed paths are rewritten)
- TransactionalStore buffers writes and flushes only reachable nodes
- Max 32 entries per leaf, max depth of 6 levels
Location: internal/hamt/hamt.go:1-602
pkg/store/
The storage abstraction layer. Defines:
- ObjectStore interface - Get/Put/List/Delete/Exists operations
- Backend implementations: LocalStore, S3Store, B2Store, SFTPStore, HybridStore
- Source and IncrementalSource interfaces for data sources
- Store decorators: CompressedStore, EncryptedStore, MeteredStore, PackStore, KeyCacheStore
Location: pkg/store/
pkg/crypto/
Cryptographic primitives:
- AES-256-GCM encryption/decryption
- HKDF-SHA256 key derivation
- HMAC-SHA256 for content-addressed deduplication
- Argon2id password-based key derivation
- BIP39 mnemonic recovery keys
Location: pkg/crypto/crypto.go:1-173
Store Decorator Stack
Cloudstic uses the decorator pattern to compose storage capabilities. The order of decorators is critical for correct operation.
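Why order matters can be shown with a minimal sketch. The tag type below is an illustrative stand-in for decorators like CompressedStore and EncryptedStore, not the real types; it records how each layer transforms the payload on its way to the backend.

```go
package main

import "fmt"

// Putter is a pared-down stand-in for the store interface.
type Putter interface {
	Put(key string, data []byte) []byte // returns what reached the backend
}

type backend struct{}

func (backend) Put(key string, data []byte) []byte { return data }

// tag wraps the payload the way a real decorator would transform it
// (compress, encrypt, ...), then hands it to the inner store.
type tag struct {
	name  string
	inner Putter
}

func (t tag) Put(key string, data []byte) []byte {
	wrapped := append([]byte(t.name+"("), append(data, ')')...)
	return t.inner.Put(key, wrapped)
}

func main() {
	// The outermost decorator sees writes first: compression runs before
	// encryption, since already-encrypted bytes do not compress.
	stack := tag{"compress", tag{"encrypt", backend{}}}
	fmt.Println(string(stack.Put("chunk/x", []byte("data"))))
	// The backend receives encrypt(compress(data)).
}
```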
Decorator Details
CompressedStore (pkg/store/compressed.go)
- Compresses all writes with zstd (level 3)
- Auto-detects compression on read: zstd, gzip, or raw
- Transparent to upper layers
EncryptedStore (pkg/store/encrypted.go)
- Encrypts all objects with AES-256-GCM
- Exception: Objects under keys/ prefix pass through unencrypted (avoids chicken-and-egg problem)
- Binary format: version(1) || nonce(12) || ciphertext || tag(16) - 29 bytes overhead per object
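The sealed-object layout can be reproduced with the standard library. This is a sketch: key management and the meaning of the version byte are assumptions, but the sizes follow directly from AES-GCM (12-byte nonce, 16-byte tag appended by Seal).

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

const formatVersion = 1 // assumed version byte

// seal produces version(1) || nonce(12) || ciphertext || tag(16).
// Go's GCM Seal appends the 16-byte tag to the ciphertext.
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // 32-byte key -> AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block) // 12-byte nonce, 16-byte tag
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append([]byte{formatVersion}, nonce...)
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

func unseal(key, sealed []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := sealed[1 : 1+gcm.NonceSize()]
	return gcm.Open(nil, nonce, sealed[1+gcm.NonceSize():], nil)
}

func main() {
	key := make([]byte, 32)
	sealed, _ := seal(key, []byte("hello"))
	// 1 (version) + 12 (nonce) + 16 (tag) = 29 bytes of overhead
	fmt.Println(len(sealed) - len("hello"))
}
```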
MeteredStore (pkg/store/metered.go)
- Tracks bytes written for UI progress bars
- No-op wrapper, just accounting
PackStore (pkg/store/pack.go) - Optional
- Buffers small objects (< 512KB) in memory
- Flushes as 8MB packfiles (packs/<hash>)
- Maintains a catalog (index/packs) mapping logical keys to pack offsets
- Dramatically reduces API call count for metadata-heavy workloads
- Uses LRU cache for packfile reads (single fetch serves thousands of objects)
KeyCacheStore (pkg/store/keycache.go)
- Caches Exists() results in a temporary bbolt database
- Uses singleflight to deduplicate concurrent writes for the same key
- Critical for performance with remote backends (avoids redundant S3/B2 HEAD requests)
Object Key Conventions
All objects follow a flat, content-addressed namespace:

| Prefix | Description | Example Key |
|---|---|---|
chunk/ | Raw file data chunks (FastCDC boundaries) | chunk/a3f5b8... |
content/ | Chunk manifests (list of chunk refs) | content/7d9e2c... |
filemeta/ | File metadata (name, type, parents, content hash) | filemeta/4b8f1a... |
node/ | HAMT internal/leaf nodes | node/c9d2e5... |
snapshot/ | Point-in-time backup snapshots | snapshot/6e3a9f... |
index/latest | Mutable pointer to latest snapshot | index/latest |
index/snapshots | Snapshot catalog (summaries of all snapshots) | index/snapshots |
index/packs | Pack catalog (when packfiles enabled) | index/packs |
keys/<slot> | Encryption key slots (unencrypted) | keys/platform-default |
config | Repository marker (unencrypted) | config |
Hash Functions
- Chunks: HMAC-SHA256(dedup_key, data) when encrypted, SHA-256(data) otherwise
- All other objects: SHA-256(canonical_json)
Location: internal/engine/chunker.go:166-172
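The two addressing modes can be sketched as follows. chunkID is a hypothetical helper; the keyed variant hides plaintext hashes from the storage backend while still deduplicating identical chunks under the same repository key.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkID computes the content address for a chunk. With encryption
// enabled, a keyed HMAC is used so the backend cannot learn the
// plaintext hash; without a key it falls back to plain SHA-256.
func chunkID(dedupKey, data []byte) string {
	if len(dedupKey) > 0 {
		mac := hmac.New(sha256.New, dedupKey)
		mac.Write(data)
		return hex.EncodeToString(mac.Sum(nil))
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	data := []byte("chunk payload")
	fmt.Println("plain:", chunkID(nil, data))
	fmt.Println("keyed:", chunkID([]byte("dedup-key"), data))
}
```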
Backup Data Flow
Backups are append-only and crash-safe. An interrupted backup cannot corrupt existing snapshots.
1. Load Previous State
Location: internal/engine/backup.go
2. Scan Source
- Full scan: source.Walk() - enumerate all files
- Incremental scan: source.WalkChanges() - only changed files (gdrive-changes, onedrive-changes)
- Look up fileId in the old HAMT
- Compare metadata (name, size, mtime, type, parents)
- If unchanged, reuse the old filemeta reference (zero upload)
- If changed or new, queue for upload
Location: internal/engine/backup.go
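The per-file decision above can be sketched as follows. The FileMeta fields come from the list in internal/core; the exact comparison set and field names are assumptions.

```go
package main

import "fmt"

// FileMeta mirrors the metadata fields compared during incremental
// scans; field names are illustrative.
type FileMeta struct {
	Name    string
	Size    int64
	MTime   int64
	Type    string
	Parents []string
}

// unchanged reports whether a file can reuse its old filemeta
// reference (zero upload) instead of being queued for upload.
func unchanged(old, cur FileMeta) bool {
	if old.Name != cur.Name || old.Size != cur.Size ||
		old.MTime != cur.MTime || old.Type != cur.Type ||
		len(old.Parents) != len(cur.Parents) {
		return false
	}
	for i := range old.Parents {
		if old.Parents[i] != cur.Parents[i] {
			return false
		}
	}
	return true
}

func main() {
	a := FileMeta{Name: "a.txt", Size: 10, MTime: 100, Type: "file", Parents: []string{"root"}}
	b := a
	fmt.Println(unchanged(a, b)) // same metadata -> reuse old reference
	b.MTime = 101
	fmt.Println(unchanged(a, b)) // changed -> queue for upload
}
```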
3. Chunk and Upload
Concurrent worker pool (default: 10 workers) processes queued files.
Location: internal/engine/chunker.go:46-143
Chunk references are collected into a Content object:
Location: internal/core/models.go:20-27
A FileMeta object is created:
Location: internal/core/models.go:29-43
4. Update HAMT
The HAMT tree is updated with new filemeta references. The TransactionalStore buffers all intermediate nodes in memory.
Location: internal/hamt/cache.go
5. Flush HAMT
Only reachable nodes are flushed to persistent storage (BFS from root). Intermediate superseded nodes are discarded.
Location: internal/hamt/cache.go
6. Create Snapshot
A new Snapshot object is written:
Location: internal/core/models.go:78-89
7. Update Index
The mutable index/latest pointer is updated atomically. This is the commit point - until this completes, the previous snapshot remains authoritative.
Location: internal/engine/backup.go
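The commit ordering can be sketched with a toy in-memory store (key names from the object-key table; the real pointer swap happens in the storage backend):

```go
package main

import "fmt"

// commitSnapshot writes the immutable snapshot object first, then
// swings index/latest. If the process dies before the final write,
// the previous snapshot remains authoritative and nothing is corrupted.
func commitSnapshot(store map[string][]byte, snapID string, snap []byte) {
	store["snapshot/"+snapID] = snap       // append-only: new key, never overwritten
	store["index/latest"] = []byte(snapID) // the commit point
}

func main() {
	store := map[string][]byte{"index/latest": []byte("old-snap")}
	commitSnapshot(store, "6e3a9f", []byte(`{"files":123}`))
	fmt.Println(string(store["index/latest"]))
}
```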
Restore Data Flow
Restoration reconstructs files from a snapshot:
1. Resolve Snapshot
2. Walk HAMT
Traverse the HAMT to collect all filemeta entries.
Location: internal/hamt/hamt.go:61-66
3. Topological Sort
Sort files by dependency (parent directories before children) to ensure correct creation order.
Location: internal/engine/restore.go
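The parent-before-child ordering can be sketched as a recursive walk over parent links (a sketch; the real implementation in internal/engine/restore.go may differ):

```go
package main

import "fmt"

// orderByDepth returns entry IDs so that every parent directory
// precedes its children, by emitting each entry's parent chain first.
func orderByDepth(parent map[string]string) []string {
	var order []string
	done := map[string]bool{}
	var emit func(id string)
	emit = func(id string) {
		if id == "" || done[id] {
			return
		}
		emit(parent[id]) // ensure the parent is emitted first
		done[id] = true
		order = append(order, id)
	}
	for id := range parent {
		emit(id)
	}
	return order
}

func main() {
	// child -> parent ("" marks a root)
	parent := map[string]string{
		"root":        "",
		"root/docs":   "root",
		"root/docs/a": "root/docs",
	}
	for _, id := range orderByDepth(parent) {
		fmt.Println(id)
	}
}
```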
4. Path Reconstruction
For each filemeta, walk the parents chain to build the full relative path.
Location: internal/engine/restore.go
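Walking the parents chain can be sketched as follows (a hypothetical pared-down filemeta shape with a single primary parent; the real FileMeta stores more fields):

```go
package main

import (
	"fmt"
	"path"
)

// meta is a pared-down filemeta record: a name plus the ID of its
// primary parent (empty for a root entry).
type meta struct {
	Name   string
	Parent string
}

// fullPath rebuilds an entry's relative path by walking parent links
// up to the root and joining the names front-to-back.
func fullPath(id string, entries map[string]meta) string {
	m, ok := entries[id]
	if !ok || m.Parent == "" {
		return m.Name
	}
	return path.Join(fullPath(m.Parent, entries), m.Name)
}

func main() {
	entries := map[string]meta{
		"1": {Name: "root"},
		"2": {Name: "docs", Parent: "1"},
		"3": {Name: "notes.txt", Parent: "2"},
	}
	fmt.Println(fullPath("3", entries)) // root/docs/notes.txt
}
```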
5. Write ZIP Archive
For each entry:
- Folders: Write directory entry with stored mtime
- Files: Load content/<hash> → fetch and decompress each chunk → stream to ZIP
Location: internal/engine/restore.go
HybridStore Architecture
HybridStore is designed for multi-tenant SaaS deployments, routing metadata to PostgreSQL and chunks to B2. The backend splits storage by object type:
- PostgreSQL (with RLS): filemeta/, node/, snapshot/, index/, config
- Backblaze B2: chunk/, packs/
- Write-through: Metadata is also written to B2 for disaster recovery
Location: AGENTS.md:99
FastCDC Chunking
Cloudstic uses the fastcdc-go library with these parameters:
| Parameter | Value | Purpose |
|---|---|---|
| Min size | 512 KiB | Prevents excessive chunk fragmentation |
| Avg size | 1 MiB | Target chunk size for good dedup/performance balance |
| Max size | 8 MiB | Prevents unbounded memory usage |
- Inserting/deleting bytes in a file only affects local chunks
- Identical regions across files share chunks
- Final chunk of a file may be smaller than min size
Location: internal/engine/chunker.go:24-28
Concurrency Model
Cloudstic uses Go’s concurrency primitives for parallel operations:
- Backup upload: 10 concurrent workers (configurable)
- HAMT flush: 20 concurrent workers
- Chunk deduplication: singleflight ensures each chunk is uploaded at most once, even with concurrent backups
Location: internal/engine/chunker.go:79 and pkg/store/keycache.go
Repository Locking
Operations acquire locks to prevent conflicts:
- Shared lock: Backup (multiple concurrent backups allowed)
- Exclusive lock: Prune, forget (no concurrent operations)
Location: pkg/store/lock.go