Cloudstic CLI is built with a clean, layered architecture that separates concerns and enables flexible composition of storage backends, encryption, and compression.
Architectural Review (March 2026): A deep audit confirmed the system follows a robust Hexagonal/Layered pattern. Key strengths include the sophisticated use of the Decorator Pattern for storage orchestration and strong decoupling between the engine and infrastructure. The system has been recently updated to ensure rigorous context.Context propagation across all I/O-bound layers, improving reliability for long-running cloud operations.

High-Level Overview

Cloudstic is a content-addressable, encrypted backup tool that supports multiple data sources and storage backends. Every backup operation produces immutable, deduplicated snapshots that can be restored independently.

Package Structure

Core Packages

The codebase follows Go best practices with clear separation between public APIs and internal implementation details.

client.go (Root)

The public API for programmatic use. Re-exports types from internal packages using Go type aliases, providing a stable interface for library consumers. Location: client.go:44

cmd/cloudstic/

The CLI entry point. Each subcommand (init, backup, restore, list, ls, prune, forget, diff, break-lock) is implemented as a run*() function in main.go.
  • Uses Go’s built-in flag package (no external dependencies like cobra/viper)
  • All commands use the Client API for consistency
Location: cmd/cloudstic/main.go:46

internal/engine/

The business logic layer. Each operation has a dedicated *Manager struct:
  • BackupManager - Scans sources, chunks files, updates HAMT
  • RestoreManager - Reconstructs file trees from snapshots
  • PruneManager - Mark-and-sweep garbage collection
  • ForgetManager - Snapshot removal and retention policies
  • DiffManager - Snapshot comparison
Each manager has a Run(ctx) method that executes the operation. Location: internal/engine/

internal/core/

Domain types and models. Defines the core data structures:
  • Snapshot - Point-in-time backup checkpoint
  • FileMeta - File metadata (name, size, mtime, content hash)
  • Content - Chunk manifest for file contents
  • HAMTNode - Merkle tree node (internal or leaf)
  • RepoConfig - Repository configuration
  • SourceInfo - Backup source identity
Also contains ComputeJSONHash(), the canonical content-addressing function. Location: internal/core/models.go:1-120
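A minimal sketch of canonical JSON content addressing; the real ComputeJSONHash may canonicalize differently, and computeJSONHash below is an illustrative stand-in, not the internal/core function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// computeJSONHash is an illustrative stand-in for core.ComputeJSONHash:
// marshal the value to JSON and hash the bytes with SHA-256. Go's
// encoding/json sorts map keys, so map-based objects marshal
// deterministically; struct fields marshal in declaration order.
func computeJSONHash(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Identical logical content yields the same address regardless of
	// the order the map was built in.
	a, _ := computeJSONHash(map[string]any{"type": "content", "size": 42})
	b, _ := computeJSONHash(map[string]any{"size": 42, "type": "content"})
	fmt.Println(a == b)
}
```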

internal/hamt/

Persistent Hash Array Mapped Trie (HAMT) implementation. A Merkle tree structure backed by the object store, used to efficiently track file→metadata mappings across snapshots. Key features:
  • 5 bits per level → 32-way branching
  • Structural sharing between snapshots (only changed paths are rewritten)
  • TransactionalStore buffers writes and flushes only reachable nodes
  • Max 32 entries per leaf, max depth of 6 levels
Location: internal/hamt/hamt.go:1-602
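The 5-bits-per-level indexing can be sketched in a few lines. This shows the slot-selection scheme only, not the internal/hamt node layout; the uint32 hash input is a simplification.

```go
package main

import "fmt"

// childIndex extracts the 5-bit slot for a given tree level from a hash,
// giving 32-way branching; 6 levels consume 30 bits of the hash.
func childIndex(hash uint32, level int) int {
	return int(hash>>(uint(level)*5)) & 0x1f
}

func main() {
	// Six 5-bit groups, written here most-significant first.
	h := uint32(0b11111_00000_00001_00010_00011_00100)
	for level := 0; level < 6; level++ {
		fmt.Printf("level %d -> slot %d\n", level, childIndex(h, level))
	}
}
```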

pkg/store/

The storage abstraction layer. Defines:
  • ObjectStore interface - Get/Put/List/Delete/Exists operations
  • Backend implementations: LocalStore, S3Store, B2Store, SFTPStore
  • Source and IncrementalSource interfaces for data sources
  • Store decorators: CompressedStore, EncryptedStore, MeteredStore, PackStore
Location: pkg/store/
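The interface can be sketched as below; the exact method signatures in pkg/store may differ (this is an assumed shape), and MemStore is a toy backend standing in for LocalStore/S3Store/B2Store/SFTPStore to show that decorators and backends satisfy the same contract.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// ObjectStore mirrors the operations the document lists.
type ObjectStore interface {
	Get(key string) ([]byte, error)
	Put(key string, data []byte) error
	Delete(key string) error
	Exists(key string) (bool, error)
	List(prefix string) ([]string, error)
}

// MemStore is a toy in-memory backend; any decorator that wraps an
// ObjectStore can wrap this exactly as it would wrap a cloud backend.
type MemStore struct {
	mu sync.Mutex
	m  map[string][]byte
}

func NewMemStore() *MemStore { return &MemStore{m: map[string][]byte{}} }

func (s *MemStore) Get(key string) ([]byte, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	b, ok := s.m[key]
	if !ok {
		return nil, errors.New("not found: " + key)
	}
	return b, nil
}

func (s *MemStore) Put(key string, data []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.m[key] = append([]byte(nil), data...) // defensive copy
	return nil
}

func (s *MemStore) Delete(key string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, key)
	return nil
}

func (s *MemStore) Exists(key string) (bool, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.m[key]
	return ok, nil
}

func (s *MemStore) List(prefix string) ([]string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	var keys []string
	for k := range s.m {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	return keys, nil
}

func main() {
	var store ObjectStore = NewMemStore()
	_ = store.Put("chunk/a3f5", []byte("data"))
	ok, _ := store.Exists("chunk/a3f5")
	fmt.Println(ok)
}
```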

pkg/crypto/

Cryptographic primitives:
  • AES-256-GCM encryption/decryption
  • HKDF-SHA256 key derivation
  • HMAC-SHA256 for content-addressed deduplication
  • Argon2id password-based key derivation
  • BIP39 mnemonic recovery keys
Location: pkg/crypto/crypto.go:1-173

Store Decorator Pipeline

Cloudstic implements storage via a decorator pattern (store chaining). The order of decorators is critical for correct operation, as requests pass through a layered sequence of wrapper stores before reaching the persistent backing store.
Stores are layered in this order, from outermost to innermost:
CompressedStore → EncryptedStore → MeteredStore → PackStore → backend (LocalStore / S3Store / B2Store / SFTPStore)

Decorator Details

CompressedStore (pkg/store/compressed.go)
  • Compresses outgoing objects with zstd and decompresses incoming objects.
  • Auto-detects compression on read: zstd or raw.
  • Transparent to upper layers.
EncryptedStore (pkg/store/encrypted.go)
  • Encrypts all objects with AES-256-GCM and decrypts/authenticates ciphertext on read.
  • Exception: Objects under keys/ prefix pass through unencrypted (avoids chicken-and-egg problem).
  • Binary format: version(1) || nonce(12) || ciphertext || tag(16).
  • 29 bytes overhead per object.
MeteredStore (pkg/store/metered.go)
  • Tracks precise physical bytes written to and read from the underlying layer.
  • Required to report actual network bandwidth consumption.
PackStore (pkg/store/pack.go) - Optional
  • Buffers small objects (< 512KB) in memory.
  • Flushes as 8MB packfiles (packs/<hash>).
  • Maintains a catalog (index/packs) mapping logical keys to pack offsets.
  • Dramatically reduces API call count for metadata-heavy workloads on cloud backends.
  • Uses LRU cache for packfile reads (single fetch serves thousands of objects).
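The EncryptedStore framing can be sketched with Go's standard crypto/cipher GCM (12-byte nonce, 16-byte tag), which yields exactly the 29-byte overhead described above. The seal/open helpers are illustrative, not the pkg/store API; real key handling (HKDF, key slots) is omitted.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// seal frames an object as version(1) || nonce(12) || ciphertext || tag(16).
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block) // 12-byte nonce, 16-byte tag
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := []byte{1} // version byte
	out = append(out, nonce...)
	// Seal appends ciphertext+tag after the version and nonce.
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

// open parses the frame and decrypts/authenticates the ciphertext.
func open(key, blob []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce, ct := blob[1:1+gcm.NonceSize()], blob[1+gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // demo only: an all-zero key
	blob, _ := seal(key, []byte("hello"))
	fmt.Println(len(blob) - len("hello")) // 1 + 12 + 16 = 29 bytes overhead
	pt, _ := open(key, blob)
	fmt.Println(string(pt))
}
```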

Object Key Conventions

All objects follow a flat, content-addressed namespace:
Content-addressable storage ensures that identical data is stored only once, regardless of where it appears in the file tree or across snapshots.
Prefix           Description                                        Example key
chunk/           Raw file data chunks (FastCDC boundaries)          chunk/a3f5b8...
content/         Chunk manifests (list of chunk refs)               content/7d9e2c...
filemeta/        File metadata (name, type, parents, content hash)  filemeta/4b8f1a...
node/            HAMT internal/leaf nodes                           node/c9d2e5...
snapshot/        Point-in-time backup snapshots                     snapshot/6e3a9f...
index/latest     Mutable pointer to latest snapshot                 index/latest
index/snapshots  Snapshot catalog (summaries of all snapshots)      index/snapshots
index/packs     Pack catalog (when packfiles enabled)              index/packs
keys/<slot>      Encryption key slots (unencrypted)                 keys/platform-default
config           Repository marker (unencrypted)                    config

Hash Functions

  • Chunks: HMAC-SHA256(dedup_key, data) when encrypted, SHA-256(data) otherwise
  • All other objects: SHA-256(canonical_json)
Location: internal/engine/chunker.go:166-172
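The chunk-hash selection above can be written directly; chunkHash here is an illustrative helper, not the chunker.go function. Keying the hash with the dedup key prevents anyone who can read the repository from confirming guesses about plaintext chunk contents.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkHash picks the chunk address as described: keyed HMAC-SHA256 when a
// dedup key is present (encrypted repos), plain SHA-256 otherwise.
func chunkHash(dedupKey, data []byte) string {
	if len(dedupKey) > 0 {
		mac := hmac.New(sha256.New, dedupKey)
		mac.Write(data)
		return hex.EncodeToString(mac.Sum(nil))
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	data := []byte("chunk payload")
	fmt.Println("plain:", chunkHash(nil, data))
	fmt.Println("keyed:", chunkHash([]byte("demo-dedup-key"), data))
}
```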

Backup Data Flow

Backups are append-only and crash-safe. An interrupted backup cannot corrupt existing snapshots.
The backup process follows this sequence:

1. Load Previous State

index/latest → snapshot/<hash> → HAMT root node → file tree
Location: internal/engine/backup.go

2. Scan Source

  • Full scan: source.Walk() - enumerate all files
  • Incremental scan: source.WalkChanges() - only changed files (gdrive-changes, onedrive-changes)
For each file:
  • Look up fileId in the old HAMT
  • Compare metadata (name, size, mtime, type, parents)
  • If unchanged, reuse the old filemeta reference (zero upload)
  • If changed or new, queue for upload
Location: internal/engine/backup.go

3. Chunk and Upload

Concurrent worker pool (default: 10 workers) processes queued files:
File stream
  → FastCDC chunker (512 KiB min / 1 MiB avg / 8 MiB max boundaries)
  → HMAC-SHA256 (keyed by dedup key) or SHA-256
  → zstd compression
  → AES-256-GCM encryption
  → store.Put("chunk/<hash>", data)  [skipped if Exists()]
Location: internal/engine/chunker.go:46-143
Chunk references are collected into a Content object:
{
  "type": "content",
  "size": 10485760,
  "chunks": ["chunk/a3f5...", "chunk/7d9e..."]
}
Location: internal/core/models.go:20-27
A FileMeta object is created:
{
  "version": 1,
  "fileId": "abc123",
  "name": "invoice.pdf",
  "type": "file",
  "parents": ["filemeta/4b8f..."],
  "content_hash": "7d9e2c...",
  "content_ref": "b4e1d...",
  "size": 21733,
  "mtime": 1710000000
}
Location: internal/core/models.go:29-43
New snapshots usually omit the legacy paths field from persisted FileMeta JSON; restore, diff, and listing workflows reconstruct display paths from parents + name.

4. Update HAMT

The HAMT tree is updated with new filemeta references. The TransactionalStore buffers all intermediate nodes in memory. Location: internal/hamt/cache.go

5. Flush HAMT

Only reachable nodes are flushed to persistent storage (BFS from root). Intermediate superseded nodes are discarded. Location: internal/hamt/cache.go

6. Create Snapshot

A new Snapshot object is written:
{
  "version": 1,
  "created": "2025-12-01T12:00:00Z",
  "root": "node/c9d2e5...",
  "seq": 42,
  "source": {
    "type": "gdrive",
    "account": "user@gmail.com",
    "path": "my-drive://"
  }
}
Location: internal/core/models.go:78-89

7. Update Index

The mutable index/latest pointer is updated atomically. This is the commit point - until this completes, the previous snapshot remains authoritative. Location: internal/engine/backup.go

Restore Data Flow

Restoration reconstructs files from a snapshot:

1. Resolve Snapshot

user specifies: "latest" or snapshot hash

index/latest → snapshot/<hash> → HAMT root

2. Walk HAMT

Traverse the HAMT to collect all filemeta entries. Location: internal/hamt/hamt.go:61-66

3. Topological Sort

Sort files by dependency (parent directories before children) to ensure correct creation order. Location: internal/engine/restore.go

4. Path Reconstruction

For each filemeta, walk the parents chain to build the full relative path. Location: internal/engine/restore.go
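The parents walk can be sketched as below. The meta struct and buildPath helper are illustrative, not the restore.go implementation; the real FileMeta allows multiple parents, while this sketch follows a single chain.

```go
package main

import (
	"fmt"
	"strings"
)

// meta is a pared-down FileMeta: just the fields path reconstruction needs.
type meta struct {
	name   string
	parent string // key of the parent filemeta; "" at the root
}

// buildPath walks the parents chain up to the root and joins names, the way
// restore/diff/listing rebuild display paths now that persisted FileMeta
// usually omits the legacy paths field.
func buildPath(metas map[string]meta, key string) string {
	var parts []string
	for key != "" {
		m := metas[key]
		parts = append([]string{m.name}, parts...) // prepend: root ends up first
		key = m.parent
	}
	return strings.Join(parts, "/")
}

func main() {
	metas := map[string]meta{
		"filemeta/root": {name: "my-drive"},
		"filemeta/dir":  {name: "invoices", parent: "filemeta/root"},
		"filemeta/file": {name: "invoice.pdf", parent: "filemeta/dir"},
	}
	fmt.Println(buildPath(metas, "filemeta/file")) // my-drive/invoices/invoice.pdf
}
```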

5. Write ZIP Archive

For each entry:
  • Folders: Write directory entry with stored mtime
  • Files: Load content/<hash> → fetch and decompress each chunk → stream to ZIP
Output is always a ZIP archive (used by both CLI and web). Location: internal/engine/restore.go

FastCDC Chunking

FastCDC (Fast Content-Defined Chunking) enables efficient deduplication by splitting files at content-based boundaries, not fixed offsets.
Cloudstic uses the fastcdc-go library with these parameters:
Parameter  Value    Purpose
Min size   512 KiB  Prevents excessive chunk fragmentation
Avg size   1 MiB    Target chunk size for good dedup/performance balance
Max size   8 MiB    Prevents unbounded memory usage
Key benefits:
  • Inserting/deleting bytes in a file only affects local chunks
  • Identical regions across files share chunks
  • Final chunk of a file may be smaller than min size
Location: internal/engine/chunker.go:24-28
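The boundary-stability idea can be illustrated with a toy cutter. This is not the FastCDC algorithm or the fastcdc-go API: a running hash over the data triggers a cut when its low bits are zero, bounded by min/max sizes. FastCDC adds a gear table and normalized chunking, but the property that an edit only disturbs nearby boundaries is the same; the sizes here are shrunk for demonstration.

```go
package main

import "fmt"

const (
	minSize = 64   // toy minimum (the real value is 512 KiB)
	maxSize = 512  // toy maximum (the real value is 8 MiB)
	mask    = 0x3f // a cut fires when the low 6 bits of the hash are zero
)

// chunk splits data at content-derived boundaries: positions where a toy
// running hash masks to zero, subject to the min/max bounds.
func chunk(data []byte) [][]byte {
	var chunks [][]byte
	start, h := 0, uint32(0)
	for i, b := range data {
		h = h*31 + uint32(b) // toy running hash, reset at each boundary
		n := i - start + 1
		if (n >= minSize && h&mask == 0) || n == maxSize {
			chunks = append(chunks, data[start:i+1])
			start, h = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // final chunk may be short
	}
	return chunks
}

func main() {
	data := make([]byte, 4096)
	for i := range data {
		data[i] = byte(i * 7)
	}
	for _, c := range chunk(data)[:3] {
		fmt.Println(len(c))
	}
}
```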

Concurrency Model

Cloudstic uses Go’s concurrency primitives for parallel operations:
  • Backup upload: 10 concurrent workers (configurable)
  • HAMT flush: 20 concurrent workers
  • Chunk deduplication: Ensures each chunk is uploaded at most once, even when identical chunks are processed by different workers concurrently
Location: internal/engine/chunker.go:79

Repository Locking

Operations acquire distributed locks stored in the repository under index/ to prevent concurrent writes:
Lock type  Key                            Operations
Shared     index/lock.shared/<timestamp>  backup, restore; multiple can coexist
Exclusive  index/lock.exclusive           prune; blocks all shared and exclusive locks
forget and check acquire no lock. Locks have a 1-minute TTL, refreshed every 30 seconds while the process is alive, so a crashed holder’s lock expires automatically without manual intervention. Location: internal/engine/repolock.go. See Repository Locking for full details.