Overview
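This page documents the exact structure of each object type in the Cloudstic storage model. Most objects are stored as JSON; the exceptions are chunks (raw binary) and the pack catalog (a bbolt database). Content-addressed objects are keyed by their hash, while the index, key slot, and config objects use fixed keys.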
Chunk
Object key: chunk/<hmac_sha256> or chunk/<sha256>
Format: Raw binary (zstd-compressed bytes)
Chunks contain raw file data compressed with zstd. Apart from the pack catalog (index/packs), they are the only non-JSON objects in the system.
Content-Defined Chunking
Chunks are produced by FastCDC (Fast Content-Defined Chunking):
| Parameter | Value |
|---|---|
| Min size | 512 KiB |
| Avg size | 1 MiB |
| Max size | 8 MiB |
The final chunk of a file may be smaller than the minimum.
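The table maps onto a FastCDC-style loop. A rolling gear hash is updated one byte at a time, and no cut is considered before the minimum size. A stricter mask is used before the average size and a looser one after it (normalized chunking), and a cut is forced at the maximum. This is a simplified illustration, not Cloudstic's chunker. The gear table (generated here with splitmix64) and the 22-bit/18-bit high-bit masks are assumptions chosen to target chunks of roughly 1 MiB.

```go
package main

import "fmt"

// Size bounds from the FastCDC parameter table.
const (
	minSize = 512 << 10 // 512 KiB
	avgSize = 1 << 20   // 1 MiB
	maxSize = 8 << 20   // 8 MiB
)

// ASSUMPTION: normalization level 2 around a 20-bit (1 MiB) mask, tested
// against the high bits of the gear hash. maskS (22 bits) is harder to hit
// and is used before avgSize; maskL (18 bits) is easier and is used after.
const (
	maskS uint64 = (1<<22 - 1) << (64 - 22)
	maskL uint64 = (1<<18 - 1) << (64 - 18)
)

// gear maps each byte value to a pseudo-random 64-bit constant. Real
// FastCDC libraries ship a fixed table; this one comes from splitmix64.
var gear [256]uint64

func init() {
	x := uint64(0)
	for i := range gear {
		x += 0x9E3779B97F4A7C15
		z := x
		z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9
		z = (z ^ (z >> 27)) * 0x94D049BB133111EB
		gear[i] = z ^ (z >> 31)
	}
}

// cut returns the length of the next chunk at the start of data.
func cut(data []byte) int {
	n := len(data)
	if n <= minSize {
		return n // final chunk, possibly smaller than minSize
	}
	if n > maxSize {
		n = maxSize
	}
	normal := avgSize
	if normal > n {
		normal = n
	}
	var fp uint64
	i := minSize // the first minSize bytes are never hashed
	for ; i < normal; i++ {
		fp = (fp << 1) + gear[data[i]]
		if fp&maskS == 0 {
			return i + 1
		}
	}
	for ; i < n; i++ {
		fp = (fp << 1) + gear[data[i]]
		if fp&maskL == 0 {
			return i + 1
		}
	}
	return n // reached maxSize (or end of data) without a content-defined cut
}

// split divides data into content-defined chunks.
func split(data []byte) [][]byte {
	var chunks [][]byte
	for len(data) > 0 {
		c := cut(data)
		chunks = append(chunks, data[:c])
		data = data[c:]
	}
	return chunks
}

func main() {
	// 20 MiB of deterministic pseudo-random input (xorshift64).
	data := make([]byte, 20<<20)
	s := uint64(88172645463325252)
	for i := range data {
		s ^= s << 13
		s ^= s >> 7
		s ^= s << 17
		data[i] = byte(s)
	}
	for i, c := range split(data) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

Because a cut point depends only on the bytes just before it, inserting data early in a file typically moves only the first boundary or two; later chunks keep the same bytes and therefore the same keys.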
Hash Function
- When encrypted: HMAC-SHA256(dedup_key, uncompressed_data)
  - The dedup key is derived from the encryption key via HKDF
  - Prevents the storage provider from confirming file contents by hashing known plaintext
- When unencrypted: SHA-256(uncompressed_data)
The hash is computed on the uncompressed data, not the stored zstd-compressed bytes. This ensures consistent deduplication regardless of compression settings.
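Here is a minimal sketch of the key derivation described above, using only Go's standard library. It assumes the digest is hex-encoded in the object key and that a nil dedup key signals an unencrypted repository. The HKDF step that derives dedup_key from the encryption key is deliberately left out.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkKey derives the object key for one chunk. The hash is always taken
// over the UNCOMPRESSED bytes, before zstd, so the key does not depend on
// compression settings.
//
// ASSUMPTIONS: a nil dedupKey means an unencrypted repository, and the
// digest is hex-encoded in the key.
func chunkKey(dedupKey, uncompressed []byte) string {
	if dedupKey == nil {
		sum := sha256.Sum256(uncompressed)
		return "chunk/" + hex.EncodeToString(sum[:])
	}
	mac := hmac.New(sha256.New, dedupKey)
	mac.Write(uncompressed) // hash.Hash writes never return an error
	return "chunk/" + hex.EncodeToString(mac.Sum(nil))
}

func main() {
	data := []byte("abc")
	fmt.Println(chunkKey(nil, data))                         // plain SHA-256
	fmt.Println(chunkKey([]byte("example-dedup-key"), data)) // keyed HMAC
}
```

Without the dedup key, an observer cannot recompute the HMAC for a known file, so matching a chunk key no longer reveals that the repository contains that file.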
Chunk object:
┌──────────────────────┐
│ zstd-compressed data │ ← Raw bytes, no JSON wrapper
└──────────────────────┘
Reading a chunk:
- Fetch the object bytes
- Decompress with zstd
- Return the raw file data
Content
Object key: content/<sha256-of-raw-file-content>
Format: JSON object
Content objects list the ordered chunks that make up a file’s content.
```json
{
  "type": "content",
  "size": 10485760,
  "chunks": [
    "chunk/a7f3c92e...",
    "chunk/b4e1d83f...",
    "chunk/c9a2e75d..."
  ]
}
```
Go Struct Definition
From internal/core/models.go:
```go
type Content struct {
	Type          ObjectType `json:"type"` // "content"
	Size          int64      `json:"size"`
	Chunks        []string   `json:"chunks,omitempty"`          // List of "chunk/<sha256>"
	DataInlineB64 []byte     `json:"data_inline_b64,omitempty"` // For small files
}
```
Inline Data Optimization
Very small files (< 512 KiB) may use data_inline_b64 instead of chunks to avoid creating a separate chunk object:
```json
{
  "type": "content",
  "size": 13,
  "data_inline_b64": "SGVsbG8sIHdvcmxkIQ=="
}
```
This reduces object count and API calls for small files.
FileMeta
Object key: filemeta/<sha256-of-serialized-json>
Format: JSON object
FileMeta objects contain immutable metadata about a file or folder.
```json
{
  "version": 1,
  "fileId": "1a2b3c4d5e6f",
  "name": "invoice.pdf",
  "type": "file",
  "parents": ["filemeta/e7f8a9b0..."],
  "content_hash": "b4e1d83f9a2c...",
  "content_ref": "c5f2e94a0b3d...",
  "size": 21733,
  "mtime": 1710000000,
  "owner": "user@example.com",
  "extra": {
    "mimeType": "application/pdf",
    "trashed": false
  },
  "mode": 33188,
  "uid": 501,
  "gid": 20,
  "btime": 1710000000,
  "flags": 0,
  "xattrs": {
    "user.tag": "cHJvamVjdA=="
  }
}
```
Go Struct Definition
From internal/core/models.go:
```go
type FileMeta struct {
	Version     int                    `json:"version"`
	FileID      string                 `json:"fileId"` // HAMT key
	Name        string                 `json:"name"`
	Type        FileType               `json:"type"`    // "file" or "folder"
	Parents     []string               `json:"parents"` // List of "filemeta/<sha256>" refs
	Paths       []string               `json:"paths,omitempty"`
	ContentHash string                 `json:"content_hash"`          // SHA256 of raw content
	ContentRef  string                 `json:"content_ref,omitempty"` // HMAC(dedupKey, ContentHash) for secure backend lookup
	Size        int64                  `json:"size"`
	Mtime       int64                  `json:"mtime"` // Unix timestamp
	Owner       string                 `json:"owner"`
	Extra       map[string]interface{} `json:"extra,omitempty"`
	Mode        uint32                 `json:"mode,omitempty"`   // POSIX permission bits
	Uid         uint32                 `json:"uid,omitempty"`    // POSIX user ID
	Gid         uint32                 `json:"gid,omitempty"`    // POSIX group ID
	Btime       int64                  `json:"btime,omitempty"`  // birth/creation time, Unix seconds
	Flags       uint32                 `json:"flags,omitempty"`  // per-file flags (chflags / FS_IOC_GETFLAGS)
	Xattrs      map[string][]byte      `json:"xattrs,omitempty"` // extended attributes: name → raw bytes
}

func (f *FileMeta) Ref() (string, []byte, error) {
	hash, data, err := ComputeJSONHash(f)
	if err != nil {
		return "", data, err
	}
	return "filemeta/" + hash, data, nil
}
```
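ComputeJSONHash is not shown on this page. A plausible minimal version, assumed here for illustration, hashes the encoding/json output with SHA-256. That output is deterministic for a given value because encoding/json emits struct fields in declaration order and sorts map keys, which matters for Extra and Xattrs. The real function may use a different canonical form, and the trimmed meta type is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// computeJSONHash is a HYPOTHETICAL stand-in for ComputeJSONHash: it
// serializes v with encoding/json and returns the hex SHA-256 of those
// bytes together with the bytes themselves.
func computeJSONHash(v interface{}) (string, []byte, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", nil, err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), data, nil
}

// meta is a trimmed-down FileMeta used only for this example.
type meta struct {
	Version int               `json:"version"`
	FileID  string            `json:"fileId"`
	Name    string            `json:"name"`
	Xattrs  map[string][]byte `json:"xattrs,omitempty"`
}

// ref builds a "filemeta/<sha256>" key the same way FileMeta.Ref does.
func ref(m meta) string {
	hash, _, err := computeJSONHash(m)
	if err != nil {
		panic(err)
	}
	return "filemeta/" + hash
}

func main() {
	a := meta{Version: 1, FileID: "1a2b3c4d5e6f", Name: "invoice.pdf",
		Xattrs: map[string][]byte{"user.tag": []byte("project")}}
	fmt.Println(ref(a))

	// Renaming changes the serialized JSON, so the ref changes too.
	b := a
	b.Name = "invoice-final.pdf"
	fmt.Println(ref(b))
}
```

Since any field change produces a new ref, a FileMeta object is immutable by construction: an edited file gets a new filemeta/ object rather than an in-place update.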
Field Descriptions
| Field | Description |
|---|---|
| fileId | Source-specific unique identifier (e.g. a Google Drive file ID or a relative path) |
| type | "file" or "folder" |
| parents | List of filemeta/<sha256> refs pointing to parent metadata objects |
| content_hash | SHA-256 of the raw file content |
| content_ref | HMAC of the content hash (used to key the Content object securely) |
| paths | Optional legacy compatibility field. New snapshots usually omit it and derive display paths from parents + name. |
| extra | Source-specific metadata (e.g. MIME type, trashed status) |
| mode | POSIX file mode bits, stored as a decimal integer (octal 0644 is stored as 420). The example value 33188 is octal 0100644: a regular file with 0644 permissions. Omitted if zero. |
| uid | Numeric owner user ID. Omitted if zero. |
| gid | Numeric owner group ID. Omitted if zero. |
| btime | File creation (birth) time as Unix epoch seconds. Omitted if zero. |
| flags | OS-specific file flags (macOS UF_*/SF_*, Linux FS_*_FL). Omitted if zero. |
| xattrs | Extended attributes as a name → base64(value) map. Omitted if empty. |
Important: fileId is the HAMT key used to look up this file’s metadata. It must be unique within a snapshot.
Folder Representation
Folders are represented with:
- type: "folder"
- content_hash: "" (empty string)
- content_ref: "" (empty string)
- size: 0
- chunks: [] in the content object (if created)
Parent References
The parents field contains refs to parent FileMeta objects, not raw file IDs. This allows reconstructing the full directory path by walking the parent chain, which is now the primary restore/listing model for new snapshots.
HAMT Node
Object key: node/<sha256-of-serialized-json>
Format: JSON object
See HAMT Structure for detailed documentation.
Internal Node
```json
{
  "type": "internal",
  "bitmap": 2348810305,
  "children": [
    "node/a7f3c92e...",
    "node/b4e1d83f..."
  ]
}
```
Leaf Node
```json
{
  "type": "leaf",
  "entries": [
    {
      "key": "1a2b3c4d5e6f",
      "filemeta": "filemeta/e7f8a9b0..."
    },
    {
      "key": "2b3c4d5e6f7g",
      "filemeta": "filemeta/f8a9b0c1..."
    }
  ]
}
```
Go Struct Definition
From internal/core/models.go:
```go
type HAMTNode struct {
	Type     ObjectType  `json:"type"` // "internal" or "leaf"
	Bitmap   uint32      `json:"bitmap,omitempty"`
	Children []string    `json:"children,omitempty"` // ["node/<sha256>", ...]
	Entries  []LeafEntry `json:"entries,omitempty"`
}

type LeafEntry struct {
	Key      string `json:"key"`      // FileID
	FileMeta string `json:"filemeta"` // "filemeta/<sha256>"
}
```
Snapshot
Object key: snapshot/<sha256-of-serialized-json>
Format: JSON object
Snapshots are point-in-time backup checkpoints referencing a HAMT root.
```json
{
  "version": 1,
  "created": "2025-12-01T12:00:00Z",
  "root": "node/a7f3c92e...",
  "seq": 42,
  "source": {
    "type": "gdrive",
    "account": "user@gmail.com",
    "path": "my-drive://"
  },
  "meta": {
    "generator": "cloudstic-cli",
    "hostname": "workstation-01"
  },
  "tags": ["daily", "important"],
  "change_token": "12345",
  "exclude_hash": "d4c3b2a1..."
}
```
Go Struct Definition
From internal/core/models.go:
```go
type Snapshot struct {
	Version     int               `json:"version"`
	Created     string            `json:"created"` // ISO8601
	Root        string            `json:"root"`    // "node/<sha256>"
	Seq         int               `json:"seq"`
	Source      *SourceInfo       `json:"source,omitempty"`
	Meta        map[string]string `json:"meta,omitempty"`
	Tags        []string          `json:"tags,omitempty"`
	ChangeToken string            `json:"change_token,omitempty"`
	ExcludeHash string            `json:"exclude_hash,omitempty"`
}

type SourceInfo struct {
	Type      string `json:"type"`                 // e.g. "gdrive", "local"
	Account   string `json:"account,omitempty"`    // friendly display account
	Path      string `json:"path,omitempty"`       // friendly display path
	Identity  string `json:"identity,omitempty"`   // stable container identity
	PathID    string `json:"path_id,omitempty"`    // stable selected-root identity
	DriveName string `json:"drive_name,omitempty"` // friendly container label
	// Legacy compatibility fields (older snapshots)
	VolumeUUID  string `json:"volume_uuid,omitempty"`
	VolumeLabel string `json:"volume_label,omitempty"`
}
```
Field Descriptions
| Field | Description |
|---|---|
| seq | Monotonically increasing sequence number |
| source | Origin and lineage identity of the backup (type, identity, path_id, display labels) |
| meta | Free-form key-value metadata (generator, hostname, etc.) |
| tags | User-defined labels for retention policies |
| change_token | Opaque token for incremental sources (omitted when not applicable) |
| exclude_hash | Hash of the exclude patterns used for this snapshot |
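Every snapshot is a complete checkpoint, so restoring it requires no delta replay. Structural sharing via the HAMT keeps each new snapshot cheap: only nodes on the paths to changed entries are rewritten, and unchanged subtrees are referenced by their existing node/<sha256> keys.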
Change Tokens
Incremental sources (gdrive-changes, onedrive-changes) record an opaque change_token in each snapshot. On the next backup:
- Read the token from the previous snapshot
- Pass it to the source to retrieve only the files changed since that token
- Save the new token in the new snapshot
If no previous token exists (first backup or after switching from a full-scan source), the source performs a full scan and saves the initial token.
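The steps above reduce to a small decision. The IncrementalSource interface, its method names, and fakeSource are hypothetical stand-ins for a source such as gdrive-changes; only the token handling mirrors the description.

```go
package main

import "fmt"

// IncrementalSource is a HYPOTHETICAL interface for change-feed sources.
type IncrementalSource interface {
	FullScan() (files []string, token string, err error)
	ChangesSince(token string) (changed []string, newToken string, err error)
}

// nextBackup applies the token rules. prevToken is the change_token of the
// previous snapshot, or "" when there is none.
func nextBackup(src IncrementalSource, prevToken string) ([]string, string, error) {
	if prevToken == "" {
		// First backup, or the previous snapshot came from a full-scan source.
		return src.FullScan()
	}
	return src.ChangesSince(prevToken)
}

// fakeSource is a toy implementation whose "revision" acts as the token.
type fakeSource struct{ rev int }

func (f *fakeSource) FullScan() ([]string, string, error) {
	return []string{"a.txt", "b.txt"}, fmt.Sprint(f.rev), nil
}

func (f *fakeSource) ChangesSince(token string) ([]string, string, error) {
	if token == fmt.Sprint(f.rev) {
		return nil, token, nil // nothing changed
	}
	return []string{"b.txt"}, fmt.Sprint(f.rev), nil
}

func main() {
	src := &fakeSource{rev: 1}
	files, tok, _ := nextBackup(src, "")
	fmt.Println(files, tok) // [a.txt b.txt] 1

	src.rev = 2 // simulate a remote change
	files, tok, _ = nextBackup(src, tok)
	fmt.Println(files, tok) // [b.txt] 2
}
```

The returned token is what gets written to the new snapshot's change_token, so each backup seeds the next one.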
Index Objects
index/latest
Object key: index/latest
Format: JSON object
A mutable pointer to the most recent snapshot.
```json
{
  "latest_snapshot": "snapshot/a7f3c92e...",
  "seq": 42
}
```

```go
// From internal/core/models.go
type Index struct {
	LatestSnapshot string `json:"latest_snapshot"` // "snapshot/<sha256>"
	Seq            int    `json:"seq"`
}
```
index/snapshots
Object key: index/snapshots
Format: JSON array
A catalog of lightweight snapshot summaries, used to avoid fetching each full snapshot object:
```json
[
  {
    "ref": "snapshot/a7f3c92e...",
    "seq": 42,
    "created": "2025-12-01T12:00:00Z",
    "root": "node/e7f8a9b0...",
    "source": {
      "type": "gdrive",
      "account": "user@gmail.com",
      "path": "my-drive://"
    },
    "tags": ["daily"],
    "change_token": "12345"
  }
]
```

```go
// From internal/core/models.go
type SnapshotSummary struct {
	Ref         string      `json:"ref"` // "snapshot/<hash>"
	Seq         int         `json:"seq"`
	Created     string      `json:"created"` // ISO8601
	Root        string      `json:"root"`    // "node/<hash>"
	Source      *SourceInfo `json:"source,omitempty"`
	Tags        []string    `json:"tags,omitempty"`
	ChangeToken string      `json:"change_token,omitempty"`
	ExcludeHash string      `json:"exclude_hash,omitempty"`
}
```
The catalog self-heals: on load it is reconciled against a LIST of the snapshot/ prefix, and if it is missing or stale, it is rebuilt automatically.
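One way to picture the reconciliation is as a set comparison between the cached catalog and the keys returned by LIST snapshot/. The reconcile helper below is hypothetical. It drops entries whose snapshot object no longer exists and reports listed snapshots that are absent from the catalog, so their summaries could be rebuilt by fetching each snapshot object.

```go
package main

import (
	"fmt"
	"sort"
)

// SnapshotSummary is trimmed to the fields this sketch needs.
type SnapshotSummary struct {
	Ref string
	Seq int
}

// reconcile is a HYPOTHETICAL helper: it keeps catalog entries that still
// exist in the listing, drops stale ones, and returns listed refs that the
// catalog is missing.
func reconcile(catalog []SnapshotSummary, listed []string) (kept []SnapshotSummary, missing []string) {
	exists := make(map[string]bool, len(listed))
	for _, ref := range listed {
		exists[ref] = true
	}
	inCatalog := make(map[string]bool, len(catalog))
	for _, s := range catalog {
		if exists[s.Ref] {
			kept = append(kept, s)
			inCatalog[s.Ref] = true
		}
	}
	for _, ref := range listed {
		if !inCatalog[ref] {
			missing = append(missing, ref)
		}
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].Seq < kept[j].Seq })
	sort.Strings(missing)
	return kept, missing
}

func main() {
	catalog := []SnapshotSummary{{"snapshot/aaa", 41}, {"snapshot/old", 40}}
	listed := []string{"snapshot/aaa", "snapshot/bbb"}
	kept, missing := reconcile(catalog, listed)
	fmt.Println(kept, missing) // [{snapshot/aaa 41}] [snapshot/bbb]
}
```

Treating the LIST result as the source of truth means a crash between writing a snapshot and updating the catalog only costs one extra fetch on the next load.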
index/packs
Object key: index/packs
Format: bbolt database
When packfiles are enabled, the pack catalog is a bbolt key-value database mapping logical object keys to their location within packfiles:
Logical key: "filemeta/a7f3c92e..."
↓
Pack entry: { PackID: "pack/b4e1d83f...", Offset: 1024, Length: 256 }
The catalog is stored as a single object in the store and loaded into memory on startup.
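Once the catalog is in memory, resolving a packed object is a map lookup followed by a byte-range read. In this sketch a Go map stands in for the bbolt database, and PackEntry, readPacked, and getPack are hypothetical names. Only the PackID, Offset, and Length fields come from the entry shown above.

```go
package main

import (
	"bytes"
	"fmt"
)

// PackEntry mirrors the catalog value shown above (HYPOTHETICAL Go type).
type PackEntry struct {
	PackID string
	Offset int64
	Length int64
}

// readPacked resolves a logical key through the in-memory catalog and
// slices the object's bytes out of its packfile. getPack stands in for a
// store read of the whole pack.
func readPacked(catalog map[string]PackEntry, getPack func(id string) ([]byte, error), key string) ([]byte, error) {
	e, ok := catalog[key]
	if !ok {
		return nil, fmt.Errorf("%s: not in pack catalog", key)
	}
	pack, err := getPack(e.PackID)
	if err != nil {
		return nil, err
	}
	end := e.Offset + e.Length
	if e.Offset < 0 || e.Length < 0 || end > int64(len(pack)) {
		return nil, fmt.Errorf("%s: entry out of range for %s", key, e.PackID)
	}
	return pack[e.Offset:end], nil
}

func main() {
	// 1024 bytes of other objects, then a 13-byte JSON object at offset 1024.
	pack := append(bytes.Repeat([]byte{0}, 1024), []byte(`{"version":1}`)...)
	catalog := map[string]PackEntry{
		"filemeta/a7f3c92e...": {PackID: "pack/b4e1d83f...", Offset: 1024, Length: 13},
	}
	get := func(string) ([]byte, error) { return pack, nil }
	obj, err := readPacked(catalog, get, "filemeta/a7f3c92e...")
	fmt.Println(string(obj), err) // {"version":1} <nil>
}
```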
Encryption Key Slots
Object key: keys/<slot_type>-<label> (e.g. keys/password-default)
Format: JSON object (stored unencrypted)
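Key slots wrap the repository’s master encryption key. Each slot type obtains its wrapping key differently: from a password, a platform key, a recovery mnemonic, or a KMS-wrapped platform key.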
Go Struct Definition
```go
type KeySlot struct {
	SlotType   string     `json:"slot_type"`
	WrappedKey string     `json:"wrapped_key"`          // base64-encoded AES-GCM-wrapped master key
	Label      string     `json:"label"`                // e.g. "default"
	KDFParams  *KDFParams `json:"kdf_params,omitempty"` // only present for password slots
}

type KDFParams struct {
	Algorithm string `json:"algorithm"` // "argon2id"
	Salt      string `json:"salt"`      // base64-encoded, 32 random bytes
	Time      uint32 `json:"time"`
	Memory    uint32 `json:"memory"`
	Threads   uint8  `json:"threads"`
}
```
Password Slot
The master key is wrapped with a key derived from the password via Argon2id:
```json
{
  "slot_type": "password",
  "wrapped_key": "base64-encoded-wrapped-master-key",
  "label": "default",
  "kdf_params": {
    "algorithm": "argon2id",
    "salt": "base64-encoded-salt",
    "time": 3,
    "memory": 65536,
    "threads": 4
  }
}
```
Platform Slot
The master key is wrapped directly with a raw 32-byte platform key (no KDF):
```json
{
  "slot_type": "platform",
  "wrapped_key": "base64-encoded-wrapped-master-key",
  "label": "default"
}
```
Recovery Slot
The master key is wrapped with a key derived from a BIP39 24-word mnemonic:
```json
{
  "slot_type": "recovery",
  "wrapped_key": "base64-encoded-wrapped-master-key",
  "label": "default"
}
```
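KMS Platform Slot
A platform key is wrapped by AWS KMS and stored in wrapped_key. To unlock the repository, KMS unwraps the platform key, and the recovered platform key is then used to unwrap the master key:
```json
{
  "slot_type": "kms-platform",
  "wrapped_key": "base64-encoded-kms-wrapped-platform-key",
  "label": "default"
}
```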
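Key slots are stored unencrypted under the keys/ prefix, which the EncryptedStore passes through without encrypting. Within each slot, only the wrapped master key is encrypted (AES-256-GCM); the slot type, label, and KDF parameters are readable in plaintext.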
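The platform slot is the simplest case to sketch, since its wrapping key is used directly with no KDF. The example wraps and unwraps a 32-byte master key with AES-256-GCM from Go's standard library. The base64(nonce || ciphertext) layout is an assumption, not the documented wrapped_key format, and the Argon2id, BIP39, and KMS steps of the other slot types are out of scope.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/base64"
	"errors"
	"fmt"
)

// newGCM builds an AES-256-GCM AEAD from a 32-byte key-encryption key.
func newGCM(kek []byte) (cipher.AEAD, error) {
	if len(kek) != 32 {
		return nil, errors.New("key-encryption key must be 32 bytes")
	}
	block, err := aes.NewCipher(kek)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// wrapKey encrypts masterKey under kek and returns base64(nonce || ciphertext).
// ASSUMPTION: this nonce-prefix layout is illustrative only.
func wrapKey(kek, masterKey []byte) (string, error) {
	gcm, err := newGCM(kek)
	if err != nil {
		return "", err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return "", err
	}
	sealed := gcm.Seal(nonce, nonce, masterKey, nil) // nonce, then ciphertext+tag
	return base64.StdEncoding.EncodeToString(sealed), nil
}

// unwrapKey reverses wrapKey; GCM authentication fails for a wrong key.
func unwrapKey(kek []byte, wrapped string) ([]byte, error) {
	sealed, err := base64.StdEncoding.DecodeString(wrapped)
	if err != nil {
		return nil, err
	}
	gcm, err := newGCM(kek)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("wrapped key too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	platformKey := make([]byte, 32)
	masterKey := make([]byte, 32)
	rand.Read(platformKey)
	rand.Read(masterKey)

	wrapped, err := wrapKey(platformKey, masterKey)
	if err != nil {
		panic(err)
	}
	got, err := unwrapKey(platformKey, wrapped)
	fmt.Println(err == nil && bytes.Equal(got, masterKey)) // true
}
```

Because GCM authenticates the ciphertext, a wrong password-derived or platform key shows up as an Open error rather than as a silently wrong master key.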
Repository Config
Object key: config
Format: JSON object (stored unencrypted)
The repository marker written by init:
```json
{
  "version": 1,
  "created": "2025-12-01T12:00:00Z",
  "encrypted": true
}
```

```go
// From internal/core/models.go
type RepoConfig struct {
	Version   int    `json:"version"`
	Created   string `json:"created"` // ISO8601
	Encrypted bool   `json:"encrypted"`
}
```
This object is stored unencrypted so that clients can determine whether the repository requires decryption before attempting to read key slots.