Repository Locking - Cloudstic CLI

Cloudstic uses a distributed lock protocol stored directly inside the repository (under index/) to prevent concurrent writes from corrupting data. This page explains the lock types, which operations hold them, the TTL and refresh mechanism, and how to recover from a stuck lock.

Lock Types

There are two lock types, implementing a standard reader-writer protocol:

Type	Storage key	Rules
Shared	`index/lock.shared/<timestamp>`	Multiple shared locks can coexist. Acquired by `backup` and `restore`.
Exclusive	`index/lock.exclusive`	Only one at a time. Blocks all shared locks. Acquired by `prune`.

Which Operation Holds Which Lock

Command	Lock type	Acquired	Released
`backup`	Shared	Start of run (skipped for `-dry-run`)	On process exit
`restore`	Shared	Start of run (always, including `-dry-run`)	On process exit
`prune`	Exclusive	Start of run (skipped for `-dry-run`)	On process exit
`forget`	None	N/A	N/A
`check`	None	N/A	N/A

Lock Payload

Each lock is a JSON object written to the repository store:

{
  "operation":   "backup",
  "holder":      "my-hostname (pid 12345)",
  "acquired_at": "2026-03-07T09:00:00.000000000Z",
  "expires_at":  "2026-03-07T09:01:00.000000000Z",
  "is_shared":   true
}

holder is "<hostname> (pid <pid>)" of the process that acquired the lock.

TTL and Automatic Refresh

Locks are designed to be short-lived so a crashed process never blocks access for long:

TTL: 1 minute from acquisition.
Refresh: While the process is alive, a background goroutine rewrites the lock every 30 seconds, extending expires_at by another minute.
Crash recovery: If the process is killed, the refresh goroutine stops with it and the lock expires within 1 minute. The next operation sees that expires_at is in the past and proceeds automatically, with no manual intervention required.
Refresh failure: If the backing store becomes unreachable, the goroutine gives up after 3 consecutive failures and lets the TTL expire naturally.

Conflict Rules

Trying to acquire →	Shared (backup/restore)	Exclusive (prune)
Shared lock active	✅ Allowed	❌ Blocked
Exclusive lock active	❌ Blocked	❌ Blocked

When blocked, the CLI exits immediately with an error. It does not wait for the lock to be released:

repository is exclusively locked by my-hostname (pid 12345) (operation: prune, acquired: ..., expires: ...)

TOCTOU Mitigation

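Atomic conditional writes are not available on every object store (S3 gained them only in 2024, and other backends such as B2 may lack them), so the lock protocol does not rely on them. To reduce the risk of two processes claiming a lock simultaneously: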

Exclusive lock: After writing index/lock.exclusive, the engine immediately re-reads it and verifies holder + acquired_at still match. If another process won the race, the acquire fails.
Shared lock: After writing index/lock.shared/<timestamp>, the engine re-checks index/lock.exclusive. If an exclusive lock appeared concurrently, the shared lock entry is deleted and the acquire fails.

This mitigation reduces but does not eliminate races on eventually-consistent stores. In practice, the acquire-then-verify pattern makes collisions vanishingly rare.

Stale Lock Recovery

A lock is stale when its expires_at is in the past. Stale locks are ignored automatically. No manual action is needed. If you cannot wait for the 1-minute TTL (e.g. you need to unblock a deployment immediately), use break-lock:

cloudstic break-lock

This unconditionally deletes index/lock.exclusive and all index/lock.shared/* entries, regardless of TTL or holder. See the break-lock reference for full usage.

Only run break-lock when you are certain no backup, restore, or prune process is actively running. Removing a lock held by an active process can corrupt the repository.

Concurrency Semantics

Because backup and restore use shared locks, they can run concurrently against the same repository without conflict. Each backup run writes its own snapshot independently. prune requires an exclusive lock and never waits for one: it fails fast if any backup or restore currently holds a lock, and while prune is running, any new backup or restore fails fast in turn. This means:

Two simultaneous backups: ✅ Both succeed; each creates its own snapshot.
Backup + restore simultaneously: ✅ Both succeed.
Backup while prune is running: ❌ Backup fails immediately with a lock error.
Prune while backup is running: ❌ Prune fails immediately with a lock error.
Two simultaneous prunes: ❌ Second prune fails immediately.
