Overview
This page documents the exact structure of each object type in the Cloudstic storage model. All objects are stored as JSON (except chunks, which are raw binary) and keyed by their hash.
Chunk
Object key: chunk/<hmac_sha256> or chunk/<sha256>
Format: Raw binary (zstd-compressed bytes)
Chunks are the only non-JSON objects in the system. They contain raw file data compressed with zstd.
Content-Defined Chunking
Chunks are produced by FastCDC (Fast Content-Defined Chunking):
| Parameter | Value |
|---|---|
| Min size | 512 KiB |
| Avg size | 1 MiB |
| Max size | 8 MiB |
Hash Function
- When encrypted: HMAC-SHA256(dedup_key, uncompressed_data)
  - The dedup key is derived from the encryption key via HKDF
  - Prevents the storage provider from confirming file contents by hashing known plaintext
- When unencrypted: SHA-256(uncompressed_data)
The hash is computed on the uncompressed data, not the stored zstd-compressed bytes. This ensures consistent deduplication regardless of compression settings.
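The two hashing modes above can be sketched in Go as follows. This is a minimal illustration, not the Cloudstic implementation: the function name `chunkKey`, the lowercase hex encoding of the digest, and the convention that an empty dedup key means "unencrypted" are all assumptions, and the HKDF derivation of the dedup key is left out.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkKey returns the object key for a chunk. The hash is always taken over
// the UNCOMPRESSED data, never over the zstd bytes that end up in storage.
// An empty dedupKey is treated as "repository is unencrypted" (assumption).
func chunkKey(dedupKey, uncompressed []byte) string {
	if len(dedupKey) == 0 {
		sum := sha256.Sum256(uncompressed)
		return "chunk/" + hex.EncodeToString(sum[:])
	}
	mac := hmac.New(sha256.New, dedupKey)
	mac.Write(uncompressed) // hash.Hash.Write never returns an error
	return "chunk/" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	data := []byte("abc")
	fmt.Println(chunkKey(nil, data))                              // plain SHA-256 key
	fmt.Println(chunkKey([]byte("hypothetical-dedup-key"), data)) // keyed HMAC key
}
```

Because the key depends only on the uncompressed bytes (and, when encrypted, the dedup key), changing the compression level does not change the key, so identical data still deduplicates.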
Storage Format
To read a chunk:
- Fetch the object bytes
- Decompress with zstd
- Return the raw file data
Content
Object key: content/<sha256-of-raw-file-content>
Format: JSON object
Content objects list the ordered chunks that make up a file’s content.
Go Struct Definition
From internal/core/models.go:
Inline Data Optimization
Very small files (< 512 KiB) may use data_inline_b64 instead of chunks to avoid creating a separate chunk object:
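A minimal Go sketch of that decision, under stated assumptions: the `content` struct is a hypothetical, reduced view (only the `chunks` and `data_inline_b64` field names come from this page), the threshold is taken to be a strict "less than 512 KiB", and standard padded base64 is assumed for the inline payload.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// inlineThreshold mirrors the 512 KiB limit mentioned above.
const inlineThreshold = 512 * 1024

// content is a hypothetical subset of the Content object.
type content struct {
	Chunks        []string `json:"chunks,omitempty"`
	DataInlineB64 string   `json:"data_inline_b64,omitempty"`
}

// buildContent inlines small payloads and references chunk keys otherwise.
func buildContent(data []byte, chunkRefs []string) content {
	if len(data) < inlineThreshold {
		return content{DataInlineB64: base64.StdEncoding.EncodeToString(data)}
	}
	return content{Chunks: chunkRefs}
}

func main() {
	out, err := json.Marshal(buildContent([]byte("hello"), nil))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Since inlining is described as optional ("may use"), a real writer could apply a lower threshold or skip inlining entirely; readers should handle both forms.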
FileMeta
Object key: filemeta/<sha256-of-serialized-json>
Format: JSON object
FileMeta objects contain immutable metadata about a file or folder.
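Because the key is the SHA-256 of the serialized JSON, two identical metadata objects map to the same key and any field change yields a new object. The sketch below illustrates this, assuming `encoding/json`'s default output stands in for whatever serialization Cloudstic actually uses; the `fileMeta` struct is a hypothetical subset of the fields listed on this page, and `objectKey` is an illustrative helper name.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// fileMeta is a hypothetical subset of the FileMeta fields.
type fileMeta struct {
	FileID      string   `json:"fileId"`
	Type        string   `json:"type"`
	Parents     []string `json:"parents"`
	ContentHash string   `json:"content_hash"`
}

// objectKey serializes v to JSON and returns "<prefix>/<sha256 hex>".
func objectKey(prefix string, v any) (string, error) {
	body, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(body)
	return prefix + "/" + hex.EncodeToString(sum[:]), nil
}

func main() {
	m := fileMeta{FileID: "docs/a.txt", Type: "file", Parents: []string{}, ContentHash: "example"}
	key, err := objectKey("filemeta", m)
	if err != nil {
		panic(err)
	}
	fmt.Println(key)
}
```

The same pattern would apply to the other hash-keyed JSON objects (node/ and snapshot/), which this page also keys by the SHA-256 of their serialized JSON.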
Go Struct Definition
From internal/core/models.go:
Field Descriptions
| Field | Description |
|---|---|
| fileId | Source-specific unique identifier (Google Drive ID, relative path) |
| type | "file" or "folder" |
| parents | List of filemeta/<sha256> refs pointing to parent metadata objects |
| content_hash | SHA-256 of the raw file content (used to key the Content object) |
| paths | Reserved for future use (multi-path support) |
| extra | Source-specific metadata (e.g. MIME type, trashed status) |
Important: fileId is the HAMT key used to look up this file’s metadata. It must be unique within a snapshot.
Folder Representation
Folders are represented with:
- type: "folder"
- content_hash: "" (empty string)
- size: 0
- chunks: [] in the content object (if created)
Parent References
The parents field contains refs to parent FileMeta objects, not raw file IDs. This allows reconstructing the full directory path by walking the parent chain.
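Walking the parent chain might look like the sketch below. It makes several assumptions that this page does not confirm: each metadata object carries a display name (the `Name` field is hypothetical), only the first parent is followed, and `load` is a stand-in for fetching and decoding a filemeta object by ref. The refs in `main` are shortened placeholders, not real hashes.

```go
package main

import (
	"fmt"
	"strings"
)

// meta is a hypothetical, reduced FileMeta; the Name field is assumed.
type meta struct {
	Name    string
	Parents []string // refs such as "filemeta/<sha256>"
}

// resolvePath follows the first parent ref upward until it reaches an object
// with no parents, then joins the collected names from root to leaf.
func resolvePath(ref string, load func(string) (meta, bool)) (string, error) {
	var parts []string
	seen := map[string]bool{} // defensive guard against malformed data
	for ref != "" {
		if seen[ref] {
			return "", fmt.Errorf("cycle at %s", ref)
		}
		seen[ref] = true
		m, ok := load(ref)
		if !ok {
			return "", fmt.Errorf("missing object %s", ref)
		}
		parts = append([]string{m.Name}, parts...)
		ref = ""
		if len(m.Parents) > 0 {
			ref = m.Parents[0]
		}
	}
	return strings.Join(parts, "/"), nil
}

func main() {
	store := map[string]meta{
		"filemeta/a": {Name: "docs"},
		"filemeta/b": {Name: "q1.pdf", Parents: []string{"filemeta/a"}},
	}
	load := func(ref string) (meta, bool) { m, ok := store[ref]; return m, ok }
	fmt.Println(resolvePath("filemeta/b", load))
}
```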
HAMT Node
Object key: node/<sha256-of-serialized-json>
Format: JSON object
See HAMT Structure for detailed documentation.
Internal Node
Leaf Node
Go Struct Definition
From internal/core/models.go:
Snapshot
Object key: snapshot/<sha256-of-serialized-json>
Format: JSON object
Snapshots are point-in-time backup checkpoints referencing a HAMT root.
Go Struct Definition
From internal/core/models.go:
Field Descriptions
| Field | Description |
|---|---|
| seq | Monotonically increasing sequence number |
| source | Origin of the backup (type, account, path); used for retention grouping |
| meta | Free-form key-value metadata (generator, hostname, etc.) |
| tags | User-defined labels for retention policies |
| change_token | Opaque token for incremental sources (omitted when not applicable) |
| exclude_hash | Hash of the exclude patterns used for this snapshot |
Every snapshot is a complete checkpoint — no delta replay needed. Structural sharing via the HAMT minimizes the number of new nodes.
Change Tokens
Incremental sources (gdrive-changes, onedrive-changes) record an opaque change_token in each snapshot. On the next backup:
- Read the token from the previous snapshot
- Pass it to the source to get only changed files since that token
- Save the new token in the new snapshot
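The three steps above can be sketched in Go as follows. Everything here is illustrative: `incrementalSource`, `ChangesSince`, and `nextSnapshot` are hypothetical names, the `snapshot` struct is a reduced subset of the real fields, incrementing `Seq` by one is an assumption, and `fakeSource` encodes its token as a position in an in-memory log purely for the demo (real tokens are opaque).

```go
package main

import (
	"fmt"
	"strconv"
)

// snapshot is a hypothetical subset of the Snapshot fields.
type snapshot struct {
	Seq         int
	ChangeToken string
}

// incrementalSource is a hypothetical interface for sources such as gdrive-changes.
type incrementalSource interface {
	// ChangesSince returns the files changed after token and the token to store next.
	ChangesSince(token string) (changed []string, next string, err error)
}

// nextSnapshot reads the previous token, asks the source for changes since
// that token, and records the new token in the new snapshot.
func nextSnapshot(prev snapshot, src incrementalSource) (snapshot, []string, error) {
	changed, next, err := src.ChangesSince(prev.ChangeToken)
	if err != nil {
		return snapshot{}, nil, err
	}
	return snapshot{Seq: prev.Seq + 1, ChangeToken: next}, changed, nil
}

// fakeSource is an in-memory stand-in; its token is an index into log.
type fakeSource struct{ log []string }

func (f fakeSource) ChangesSince(token string) ([]string, string, error) {
	start := 0
	if token != "" {
		n, err := strconv.Atoi(token)
		if err != nil || n < 0 || n > len(f.log) {
			return nil, "", fmt.Errorf("invalid token %q", token)
		}
		start = n
	}
	return f.log[start:], strconv.Itoa(len(f.log)), nil
}

func main() {
	src := fakeSource{log: []string{"a.txt", "b.txt"}}
	s1, changed, _ := nextSnapshot(snapshot{}, src)
	fmt.Println(s1.Seq, changed)
	src.log = append(src.log, "c.txt")
	s2, changed, _ := nextSnapshot(s1, src)
	fmt.Println(s2.Seq, changed)
}
```

The caller treats the token as an opaque string; only the source knows how to interpret it, which is why the snapshot stores it verbatim.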
Index Objects
index/latest
Object key: index/latest
Format: JSON object
A mutable pointer to the most recent snapshot.
index/snapshots
Object key: index/snapshots
Format: JSON array
A catalog of lightweight snapshot summaries, used to avoid fetching each full snapshot object:
The catalog self-heals via reconciliation with LIST snapshot/ on load. If the catalog is missing or stale, it’s rebuilt automatically.
index/packs
Object key: index/packs
Format: bbolt database
When packfiles are enabled, the pack catalog is a bbolt key-value database mapping logical object keys to their location within packfiles:
Encryption Key Slots
Object key: keys/<slot_name>
Format: JSON object (stored unencrypted)
Key slots wrap the repository’s master encryption key using various methods:
- password: Scrypt-derived key
- platform: OS keychain-derived key
- kms-platform: AWS KMS-wrapped platform key
- recovery: BIP39 mnemonic-derived key
Key slots are stored unencrypted under the keys/ prefix, which the EncryptedStore passes through. Only the wrapped master key is encrypted.
Repository Config
Object key: config
Format: JSON object (stored unencrypted)
The repository marker written by init: