Cloudstic uses a distributed lock protocol stored directly inside the repository (under index/) to prevent concurrent writes from corrupting data. This page explains the lock types, which operations hold them, the TTL and refresh mechanism, and how to recover from a stuck lock.
Lock Types
There are two lock types, implementing a standard reader-writer protocol:

| Type | Storage key | Rules |
|---|---|---|
| Shared | index/lock.shared/<timestamp> | Multiple shared locks can coexist. Acquired by backup and restore. |
| Exclusive | index/lock.exclusive | Only one at a time. Blocks all shared locks. Acquired by prune. |
Which Operation Holds Which Lock
| Command | Lock type | Acquired | Released |
|---|---|---|---|
| backup | Shared | Start of run (skipped for -dry-run) | On process exit |
| restore | Shared | Start of run (always, including -dry-run) | On process exit |
| prune | Exclusive | Start of run (skipped for -dry-run) | On process exit |
| forget | None | N/A | N/A |
| check | None | N/A | N/A |
Lock Payload
Each lock is a JSON object written to the repository store. Its holder field is "<hostname> (pid <pid>)", identifying the process that acquired the lock.
TTL and Automatic Refresh
Locks are designed to be short-lived so a crashed process never blocks access for long:

- TTL: 1 minute from acquisition.
- Refresh: While the process is alive, a background goroutine rewrites the lock every 30 seconds, extending expires_at by another minute.
- Crash recovery: If the process is killed, the refresh goroutine stops. The lock expires after at most 1 minute. The next operation sees the stale expires_at and proceeds automatically; no manual intervention is required.
- Refresh failure: If the backing store becomes unreachable, the goroutine gives up after 3 consecutive failures and lets the TTL expire naturally.
Conflict Rules
| Trying to acquire → | Shared (backup/restore) | Exclusive (prune) |
|---|---|---|
| Shared lock active | ✅ Allowed | ❌ Blocked |
| Exclusive lock active | ❌ Blocked | ❌ Blocked |
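
The table reduces to a small predicate. The sketch below is one way to express it; the LockType type and the canAcquire function are illustrative names, not part of Cloudstic's API, and the inputs are assumed to count only non-expired locks.

```go
package main

import "fmt"

// LockType is a hypothetical enum for the two lock types on this page.
type LockType int

const (
	Shared LockType = iota
	Exclusive
)

// canAcquire reports whether a lock of type want may be taken, given how many
// shared locks and whether an exclusive lock are currently active.
func canAcquire(want LockType, sharedActive int, exclusiveActive bool) bool {
	if exclusiveActive {
		return false // an exclusive lock blocks everything
	}
	if want == Exclusive {
		return sharedActive == 0 // exclusive needs the repository to itself
	}
	return true // shared locks coexist with other shared locks
}

func main() {
	names := map[LockType]string{Shared: "shared", Exclusive: "exclusive"}
	for _, want := range []LockType{Shared, Exclusive} {
		fmt.Printf("acquire %-9s | shared active: %-5v | exclusive active: %v\n",
			names[want], canAcquire(want, 1, false), canAcquire(want, 0, true))
	}
}
```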
TOCTOU Mitigation
Object stores like S3 and B2 have historically lacked atomic conditional writes. To reduce the risk of two processes claiming a lock simultaneously:

- Exclusive lock: After writing index/lock.exclusive, the engine immediately re-reads it and verifies that holder + acquired_at still match. If another process won the race, the acquire fails.
- Shared lock: After writing index/lock.shared/<timestamp>, the engine re-checks index/lock.exclusive. If an exclusive lock appeared concurrently, the shared lock entry is deleted and the acquire fails.

This mitigation reduces but does not eliminate races on eventually-consistent stores. In practice, the acquire-then-verify pattern makes collisions vanishingly rare.
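
The write-then-verify steps can be sketched against an in-memory stand-in for the object store. Everything named here (memStore, acquireExclusive, acquireShared, the identity string, and the timestamp format in the shared key) is hypothetical, and the sketch covers only the post-write verification: conflict checks before the write and expiry handling are left out.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

const exclusiveKey = "index/lock.exclusive"

var errLockHeld = errors.New("lock held by another process")

// memStore is a stand-in for an object store offering plain Put/Get/Delete
// with no conditional writes.
type memStore struct {
	mu   sync.Mutex
	data map[string]string
}

func newMemStore() *memStore { return &memStore{data: map[string]string{}} }

func (s *memStore) Put(key, val string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = val
}

func (s *memStore) Get(key string) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}

func (s *memStore) Delete(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, key)
}

// acquireExclusive writes the lock, re-reads it, and checks that the stored
// identity (standing in for holder + acquired_at) is still ours.
func acquireExclusive(s *memStore, identity string) error {
	s.Put(exclusiveKey, identity)
	got, ok := s.Get(exclusiveKey)
	if !ok || got != identity {
		return errLockHeld // another writer overwrote our entry
	}
	return nil
}

// acquireShared writes a shared entry, then re-checks for an exclusive lock
// and rolls the entry back if one is present.
func acquireShared(s *memStore, timestamp, identity string) error {
	key := "index/lock.shared/" + timestamp
	s.Put(key, identity)
	if _, ok := s.Get(exclusiveKey); ok {
		s.Delete(key)
		return errLockHeld
	}
	return nil
}

func main() {
	s := newMemStore()
	fmt.Println("prune: ", acquireExclusive(s, "host-a (pid 1)|t0"))
	fmt.Println("backup:", acquireShared(s, "t1", "host-b (pid 2)"))
}
```

Because the read happens after the write, a process that lost the race sees the winner's entry and backs off; two writers can still both pass verification if the store serves stale reads, which is why the text calls this a mitigation rather than a guarantee.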
Stale Lock Recovery
A lock is stale when its expires_at is in the past. Stale locks are ignored automatically. No manual action is needed.
If you cannot wait for the 1-minute TTL (e.g. you need to unblock a deployment immediately), use break-lock. It removes index/lock.exclusive and all index/lock.shared/* entries, regardless of TTL or holder. See the break-lock reference for full usage.
Concurrency Semantics
Because backup and restore use shared locks, they can run concurrently against the same repository without conflict. Each backup run writes its own snapshot independently. prune requires an exclusive lock, so it cannot run until all active backups and restores have finished; if any are still running, it fails fast with a lock error.
This means:
- Two simultaneous backups: ✅ Both succeed; each creates its own snapshot.
- Backup + restore simultaneously: ✅ Both succeed.
- Backup while prune is running: ❌ Backup fails immediately with a lock error.
- Prune while backup is running: ❌ Prune fails immediately with a lock error.
- Two simultaneous prunes: ❌ Second prune fails immediately.
See Also
- break-lock command: Force-remove stale locks
- prune command: Exclusive lock holder
- backup command: Shared lock holder