Cloudstic CLI is built with a clean, layered architecture that separates concerns and enables flexible composition of storage backends, encryption, and compression.

High-Level Overview

Cloudstic is a content-addressable, encrypted backup tool that supports multiple data sources and storage backends. Every backup operation produces immutable, deduplicated snapshots that can be restored independently.
┌─────────────────────────────────────────────────────────────┐
│                         CLI Layer                            │
│  (cmd/cloudstic/main.go - init, backup, restore, etc.)     │
└─────────────────────┬───────────────────────────────────────┘

┌─────────────────────▼───────────────────────────────────────┐
│                      Client API                              │
│        (client.go - Public programmatic interface)          │
└─────────────────────┬───────────────────────────────────────┘

┌─────────────────────▼───────────────────────────────────────┐
│                    Engine Layer                              │
│  (internal/engine/ - BackupManager, RestoreManager, etc.)   │
└───┬──────────────┬──────────────┬──────────────┬───────────┘
    │              │              │              │
    ▼              ▼              ▼              ▼
┌─────┐      ┌─────────┐    ┌────────┐    ┌──────────┐
│HAMT │      │ Chunker │    │ Crypto │    │  Store   │
│Tree │      │ FastCDC │    │AES-GCM │    │  Stack   │
└─────┘      └─────────┘    └────────┘    └──────────┘

Package Structure

Core Packages

The codebase follows Go best practices with clear separation between public APIs and internal implementation details.

client.go (Root)

The public API for programmatic use. Re-exports types from internal packages using Go type aliases, providing a stable interface for library consumers. Location: client.go:44

cmd/cloudstic/

The CLI entry point. Each subcommand (init, backup, restore, list, ls, prune, forget, diff, break-lock, add-recovery-key) is implemented as a run*() function in main.go.
  • Uses Go’s built-in flag package (no external dependencies like cobra/viper)
  • All commands use the Client API for consistency
Location: cmd/cloudstic/main.go:46

internal/engine/

The business logic layer. Each operation has a dedicated *Manager struct:
  • BackupManager - Scans sources, chunks files, updates HAMT
  • RestoreManager - Reconstructs file trees from snapshots
  • PruneManager - Mark-and-sweep garbage collection
  • ForgetManager - Snapshot removal and retention policies
  • DiffManager - Snapshot comparison
Each manager has a Run(ctx) method that executes the operation. Location: internal/engine/

internal/core/

Domain types and models. Defines the core data structures:
  • Snapshot - Point-in-time backup checkpoint
  • FileMeta - File metadata (name, size, mtime, content hash)
  • Content - Chunk manifest for file contents
  • HAMTNode - Merkle tree node (internal or leaf)
  • RepoConfig - Repository configuration
  • SourceInfo - Backup source identity
Also contains ComputeJSONHash(), the canonical content-addressing function. Location: internal/core/models.go:1-120
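The canonical content-addressing idea can be sketched as follows. This is a minimal illustration, not the actual ComputeJSONHash implementation; it assumes the canonical form is compact JSON with sorted keys, which Go's encoding/json produces for maps:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// computeJSONHash is an illustrative stand-in for core.ComputeJSONHash:
// marshal the object to canonical JSON (encoding/json emits map keys in
// sorted order) and hash the resulting bytes with SHA-256.
func computeJSONHash(v any) (string, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	// Key order in the source map does not matter: the canonical form,
	// and therefore the object's address, is identical.
	a, _ := computeJSONHash(map[string]any{"type": "content", "size": 42})
	b, _ := computeJSONHash(map[string]any{"size": 42, "type": "content"})
	fmt.Println(a == b) // true
}
```

Canonicalization is what makes the hash a stable identity: any two logically equal objects serialize to the same bytes and deduplicate to the same key.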

internal/hamt/

Persistent Hash Array Mapped Trie (HAMT) implementation. A Merkle tree structure backed by the object store, used to efficiently track file→metadata mappings across snapshots. Key features:
  • 5 bits per level → 32-way branching
  • Structural sharing between snapshots (only changed paths are rewritten)
  • TransactionalStore buffers writes and flushes only reachable nodes
  • Max 32 entries per leaf, max depth of 6 levels
Location: internal/hamt/hamt.go:1-602
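The 5-bits-per-level scheme means each tree level consumes five bits of the key's hash, giving 32 slots per node; six levels consume 30 bits of a 32-bit hash. A sketch of the index extraction (illustrative; the real bit layout may differ):

```go
package main

import "fmt"

// hamtIndex extracts the 5-bit slot index for a given tree level from a
// 32-bit hash, giving 32-way branching: level 0 uses bits 0-4, level 1
// uses bits 5-9, and so on.
func hamtIndex(hash uint32, level uint) uint32 {
	return (hash >> (5 * level)) & 0x1F
}

func main() {
	h := uint32(0b_01101_00011_11111) // levels 2, 1, 0 from high to low bits
	fmt.Println(hamtIndex(h, 0))      // 31
	fmt.Println(hamtIndex(h, 1))      // 3
	fmt.Println(hamtIndex(h, 2))      // 13
}
```

Because a key's path through the tree is fixed by its hash, updating one file rewrites only the nodes on that one path; all sibling subtrees are shared with the previous snapshot by reference.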

pkg/store/

The storage abstraction layer. Defines:
  • ObjectStore interface - Get/Put/List/Delete/Exists operations
  • Backend implementations: LocalStore, S3Store, B2Store, SFTPStore, HybridStore
  • Source and IncrementalSource interfaces for data sources
  • Store decorators: CompressedStore, EncryptedStore, MeteredStore, PackStore, KeyCacheStore
Location: pkg/store/

pkg/crypto/

Cryptographic primitives:
  • AES-256-GCM encryption/decryption
  • HKDF-SHA256 key derivation
  • HMAC-SHA256 for content-addressed deduplication
  • Argon2id password-based key derivation
  • BIP39 mnemonic recovery keys
Location: pkg/crypto/crypto.go:1-173

Store Decorator Stack

Cloudstic uses the decorator pattern to compose storage capabilities. The order of decorators is critical for correct operation: compression must happen before encryption, because encrypted data is indistinguishable from random and does not compress.
Stores are layered in this specific order (top is closest to the engine):
Backup Engine
      ↓
CompressedStore      ← zstd compression (auto-detects format on read)
      ↓
EncryptedStore       ← AES-256-GCM (passes through keys/* unencrypted)
      ↓
MeteredStore         ← Tracks bytes written for progress reporting
      ↓
[PackStore]          ← Optional: bundles small objects into 8MB packs
      ↓
KeyCacheStore        ← Caches Exists() checks in local bbolt DB
      ↓
<Backend>            ← LocalStore, S3Store, B2Store, SFTPStore, HybridStore
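The composition itself is plain interface wrapping: each decorator holds an inner ObjectStore and forwards calls after doing its own work. A minimal sketch (the `loggingStore` and `memStore` types are illustrative stand-ins, not Cloudstic's actual types):

```go
package main

import "fmt"

// ObjectStore is a minimal subset of the store interface for illustration.
type ObjectStore interface {
	Put(key string, data []byte) error
}

// loggingStore stands in for any decorator (CompressedStore, EncryptedStore,
// ...): it wraps an inner ObjectStore and adds behavior around each call.
type loggingStore struct {
	name  string
	inner ObjectStore
}

func (s *loggingStore) Put(key string, data []byte) error {
	fmt.Printf("%s: Put %s (%d bytes)\n", s.name, key, len(data))
	return s.inner.Put(key, data)
}

// memStore is a toy backend standing in for LocalStore/S3Store/etc.
type memStore struct{ objects map[string][]byte }

func (s *memStore) Put(key string, data []byte) error {
	s.objects[key] = data
	return nil
}

func main() {
	// Compose the stack inside-out: the first wrapper sits closest to the
	// backend, the last (CompressedStore) is what the engine talks to.
	backend := &memStore{objects: map[string][]byte{}}
	var store ObjectStore = backend
	for _, name := range []string{"KeyCacheStore", "MeteredStore", "EncryptedStore", "CompressedStore"} {
		store = &loggingStore{name: name, inner: store}
	}
	store.Put("chunk/a3f5", []byte("hello"))
	fmt.Println(len(backend.objects)) // 1
}
```

Because every layer satisfies the same interface, any subset of decorators (for example, skipping PackStore) composes without changes to the engine.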

Decorator Details

CompressedStore (pkg/store/compressed.go)
  • Compresses all writes with zstd (level 3)
  • Auto-detects compression on read: zstd, gzip, or raw
  • Transparent to upper layers
EncryptedStore (pkg/store/encrypted.go)
  • Encrypts all objects with AES-256-GCM
  • Exception: Objects under keys/ prefix pass through unencrypted (avoids chicken-and-egg problem)
  • Binary format: version(1) || nonce(12) || ciphertext || tag(16)
  • 29 bytes overhead per object
MeteredStore (pkg/store/metered.go)
  • Tracks bytes written for UI progress bars
  • No-op wrapper, just accounting
PackStore (pkg/store/pack.go) - Optional
  • Buffers small objects (< 512KB) in memory
  • Flushes as 8MB packfiles (packs/<hash>)
  • Maintains a catalog (index/packs) mapping logical keys to pack offsets
  • Dramatically reduces API call count for metadata-heavy workloads
  • Uses LRU cache for packfile reads (single fetch serves thousands of objects)
KeyCacheStore (pkg/store/keycache.go)
  • Caches Exists() results in a temporary bbolt database
  • Uses singleflight to deduplicate concurrent writes for the same key
  • Critical for performance with remote backends (avoids redundant S3/B2 HEAD requests)
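The EncryptedStore binary format described above can be exercised with the standard library. A sketch, not the actual pkg/store/encrypted.go code (the framing fields match the documented layout; helper names are hypothetical):

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// seal produces version(1) || nonce(12) || ciphertext || tag(16), matching
// the layout documented for EncryptedStore (GCM appends the 16-byte tag to
// the ciphertext).
func seal(key, plaintext []byte) ([]byte, error) {
	block, err := aes.NewCipher(key) // 32-byte key → AES-256
	if err != nil {
		return nil, err
	}
	gcm, err := cipher.NewGCM(block)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize()) // 12 bytes
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := []byte{1}            // version byte
	out = append(out, nonce...) // 12-byte nonce
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

// open reverses seal: skip the version byte, split off the nonce, decrypt.
func open(key, blob []byte) ([]byte, error) {
	block, _ := aes.NewCipher(key)
	gcm, _ := cipher.NewGCM(block)
	nonce, ct := blob[1:13], blob[13:]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32)
	blob, _ := seal(key, []byte("chunk data"))
	fmt.Println(len(blob) - len("chunk data")) // 29 bytes of overhead
	pt, _ := open(key, blob)
	fmt.Println(string(pt))
}
```

The 29-byte figure falls directly out of the framing: 1 (version) + 12 (nonce) + 16 (GCM tag).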

Object Key Conventions

All objects follow a flat, content-addressed namespace:
Content-addressable storage ensures that identical data is stored only once, regardless of where it appears in the file tree or across snapshots.
Prefix            Description                                         Example Key
chunk/            Raw file data chunks (FastCDC boundaries)           chunk/a3f5b8...
content/          Chunk manifests (list of chunk refs)                content/7d9e2c...
filemeta/         File metadata (name, type, parents, content hash)   filemeta/4b8f1a...
node/             HAMT internal/leaf nodes                            node/c9d2e5...
snapshot/         Point-in-time backup snapshots                      snapshot/6e3a9f...
index/latest      Mutable pointer to latest snapshot                  index/latest
index/snapshots   Snapshot catalog (summaries of all snapshots)       index/snapshots
index/packs       Pack catalog (when packfiles enabled)               index/packs
keys/<slot>       Encryption key slots (unencrypted)                  keys/platform-default
config            Repository marker (unencrypted)                     config

Hash Functions

  • Chunks: HMAC-SHA256(dedup_key, data) when encrypted, SHA-256(data) otherwise
  • All other objects: SHA-256(canonical_json)
Location: internal/engine/chunker.go:166-172
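This rule can be sketched in a few lines. Keying the chunk hash with the dedup key means chunk IDs are repository-specific, so an attacker who knows a plaintext cannot confirm its presence by probing for chunk/<hash>. Function names here are illustrative:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkHash mirrors the rule above: HMAC-SHA256 under the dedup key when the
// repository is encrypted, plain SHA-256 otherwise.
func chunkHash(dedupKey, data []byte) string {
	if len(dedupKey) > 0 {
		mac := hmac.New(sha256.New, dedupKey)
		mac.Write(data)
		return hex.EncodeToString(mac.Sum(nil))
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	data := []byte("chunk data")
	// Same data, different dedup keys → different chunk IDs.
	fmt.Println(chunkHash([]byte("repo-A-key"), data) != chunkHash([]byte("repo-B-key"), data)) // true
	// Unkeyed hashing is deterministic across repositories.
	fmt.Println(chunkHash(nil, data) == chunkHash(nil, data)) // true
}
```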

Backup Data Flow

Backups are append-only and crash-safe. An interrupted backup cannot corrupt existing snapshots.
The backup process follows this sequence:

1. Load Previous State

index/latest → snapshot/<hash> → HAMT root node → file tree
Location: internal/engine/backup.go

2. Scan Source

  • Full scan: source.Walk() - enumerate all files
  • Incremental scan: source.WalkChanges() - only changed files (gdrive-changes, onedrive-changes)
For each file:
  • Look up fileId in the old HAMT
  • Compare metadata (name, size, mtime, type, parents)
  • If unchanged, reuse the old filemeta reference (zero upload)
  • If changed or new, queue for upload
Location: internal/engine/backup.go

3. Chunk and Upload

Concurrent worker pool (default: 10 workers) processes queued files:
File stream
      ↓
FastCDC chunker (512KB-1MB-8MB boundaries)
      ↓
HMAC-SHA256 (keyed by dedup key) or SHA-256
      ↓
zstd compression
      ↓
AES-256-GCM encryption
      ↓
store.Put("chunk/<hash>", data)  [skipped if Exists()]
Location: internal/engine/chunker.go:46-143

Chunk references are collected into a Content object:
{
  "type": "content",
  "size": 10485760,
  "chunks": ["chunk/a3f5...", "chunk/7d9e..."]
}
Location: internal/core/models.go:20-27

A FileMeta object is created:
{
  "version": 1,
  "fileId": "abc123",
  "name": "invoice.pdf",
  "type": "file",
  "parents": ["filemeta/4b8f..."],
  "content_hash": "7d9e2c...",
  "size": 21733,
  "mtime": 1710000000
}
Location: internal/core/models.go:29-43

4. Update HAMT

The HAMT tree is updated with new filemeta references. The TransactionalStore buffers all intermediate nodes in memory. Location: internal/hamt/cache.go

5. Flush HAMT

Only reachable nodes are flushed to persistent storage (BFS from root). Intermediate superseded nodes are discarded. Location: internal/hamt/cache.go

6. Create Snapshot

A new Snapshot object is written:
{
  "version": 1,
  "created": "2025-12-01T12:00:00Z",
  "root": "node/c9d2e5...",
  "seq": 42,
  "source": {
    "type": "gdrive",
    "account": "user@gmail.com",
    "path": "my-drive://"
  }
}
Location: internal/core/models.go:78-89

7. Update Index

The mutable index/latest pointer is updated atomically. This is the commit point - until this completes, the previous snapshot remains authoritative. Location: internal/engine/backup.go

Restore Data Flow

Restoration reconstructs files from a snapshot:

1. Resolve Snapshot

user specifies: "latest" or snapshot hash
      ↓
index/latest → snapshot/<hash> → HAMT root

2. Walk HAMT

Traverse the HAMT to collect all filemeta entries. Location: internal/hamt/hamt.go:61-66

3. Topological Sort

Sort files by dependency (parent directories before children) to ensure correct creation order. Location: internal/engine/restore.go

4. Path Reconstruction

For each filemeta, walk the parents chain to build the full relative path. Location: internal/engine/restore.go

5. Write ZIP Archive

For each entry:
  • Folders: Write directory entry with stored mtime
  • Files: Load content/<hash> → fetch and decompress each chunk → stream to ZIP
Output is always a ZIP archive (used by both CLI and web). Location: internal/engine/restore.go

HybridStore Architecture

HybridStore is designed for multi-tenant SaaS deployments, routing metadata to PostgreSQL and bulk chunk data to Backblaze B2. It splits storage by object type:
  • PostgreSQL (with RLS): filemeta/, node/, snapshot/, index/, config
  • Backblaze B2: chunk/, packs/
  • Write-through: Metadata is also written to B2 for disaster recovery
Tenant isolation uses PostgreSQL Row-Level Security:
SET LOCAL cloudstic.tenant_id = '<uuid>';
Location: AGENTS.md:99
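The split described above amounts to routing by key prefix. A sketch (illustrative; the real routing table lives inside the HybridStore implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// routeKey picks a backend by key prefix: bulk data (chunks, packfiles) goes
// to B2, everything else (metadata, indexes, config) goes to PostgreSQL.
func routeKey(key string) string {
	for _, p := range []string{"chunk/", "packs/"} {
		if strings.HasPrefix(key, p) {
			return "b2"
		}
	}
	return "postgres" // filemeta/, node/, snapshot/, index/, config
}

func main() {
	fmt.Println(routeKey("chunk/a3f5b8"))    // b2
	fmt.Println(routeKey("snapshot/6e3a9f")) // postgres
	fmt.Println(routeKey("config"))          // postgres
}
```

The split plays to each backend's strengths: small, frequently-listed metadata objects benefit from SQL queries and row-level security, while large immutable chunks are cheap to keep in object storage.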

FastCDC Chunking

FastCDC (Fast Content-Defined Chunking) enables efficient deduplication by splitting files at content-based boundaries, not fixed offsets.
Cloudstic uses the fastcdc-go library with these parameters:
Parameter   Value     Purpose
Min size    512 KiB   Prevents excessive chunk fragmentation
Avg size    1 MiB     Target chunk size for good dedup/performance balance
Max size    8 MiB     Prevents unbounded memory usage
Key benefits:
  • Inserting/deleting bytes in a file only affects local chunks
  • Identical regions across files share chunks
  • Final chunk of a file may be smaller than min size
Location: internal/engine/chunker.go:24-28
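The core idea can be demonstrated with a toy content-defined chunker. This is a simplified gear-hash sketch in the spirit of FastCDC, not the fastcdc-go library Cloudstic actually uses, and the sizes are tiny for demonstration:

```go
package main

import (
	"fmt"
	"math/rand"
)

// cdcSplit rolls a "gear" hash over each byte and declares a chunk boundary
// when the hash's low bits are zero, subject to min/max size limits. Because
// boundaries depend on content rather than offsets, inserting bytes only
// shifts boundaries locally; later chunks realign and still deduplicate.
func cdcSplit(data []byte, min, mask, max int) [][]byte {
	// Random per-byte gear table (seeded for reproducibility).
	rng := rand.New(rand.NewSource(1))
	var gear [256]uint64
	for i := range gear {
		gear[i] = rng.Uint64()
	}
	var chunks [][]byte
	start := 0
	var h uint64
	for i, b := range data {
		h = (h << 1) + gear[b]
		size := i - start + 1
		if (size >= min && h&uint64(mask) == 0) || size >= max {
			chunks = append(chunks, data[start:i+1])
			start, h = i+1, 0
		}
	}
	if start < len(data) {
		chunks = append(chunks, data[start:]) // final chunk may be < min
	}
	return chunks
}

func main() {
	data := make([]byte, 10000)
	rand.New(rand.NewSource(2)).Read(data)
	// min 64, boundary mask 0xFF (avg ~256 + min), max 1024
	chunks := cdcSplit(data, 64, 0xFF, 1024)
	fmt.Println(len(chunks) > 1)
}
```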

Concurrency Model

Cloudstic uses Go’s concurrency primitives for parallel operations:
  • Backup upload: 10 concurrent workers (configurable)
  • HAMT flush: 20 concurrent workers
  • Chunk deduplication: singleflight ensures each chunk is uploaded at most once, even with concurrent backups
Location: internal/engine/chunker.go:79 and pkg/store/keycache.go

Repository Locking

Operations acquire locks to prevent conflicts:
  • Shared lock: Backup (multiple concurrent backups allowed)
  • Exclusive lock: Prune, forget (no concurrent operations)
Locks are file-based (local) or object-based (remote), with stale lock detection. Location: pkg/store/lock.go