Cloudstic uses a distributed lock protocol stored directly inside the repository (under index/) to prevent concurrent writes from corrupting data. This page explains the lock types, which operations hold them, the TTL and refresh mechanism, and how to recover from a stuck lock.
Lock Types
There are two lock types, implementing a standard reader-writer protocol:
| Type | Storage key | Rules |
|---|---|---|
| Shared | index/lock.shared/<timestamp> | Multiple shared locks can coexist. Acquired by backup and restore. |
| Exclusive | index/lock.exclusive | Only one at a time. Blocks all shared locks. Acquired by prune. |
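The key layout in the table can be sketched as follows. This is a minimal illustration, not Cloudstic's code. The names LockType and lockKey are hypothetical, and encoding `<timestamp>` as Unix nanoseconds is an assumption; the page only says "timestamp".

```go
package main

import (
	"fmt"
	"strconv"
)

// LockType distinguishes the two lock kinds (hypothetical name).
type LockType int

const (
	Shared LockType = iota
	Exclusive
)

// Key layout from the table. Encoding the shared-lock timestamp as
// Unix nanoseconds is an assumption made for this sketch.
const (
	exclusiveKey    = "index/lock.exclusive"
	sharedKeyPrefix = "index/lock.shared/"
)

// lockKey returns the key a lock of type t acquired at acquiredUnixNano would use.
// Each shared holder gets its own key, so concurrent readers never overwrite
// each other; the exclusive lock always lives at one fixed key.
func lockKey(t LockType, acquiredUnixNano int64) string {
	if t == Exclusive {
		return exclusiveKey
	}
	return sharedKeyPrefix + strconv.FormatInt(acquiredUnixNano, 10)
}

func main() {
	fmt.Println(lockKey(Exclusive, 1772874000000000000))
	fmt.Println(lockKey(Shared, 1772874000000000000))
}
```

Per-holder keys are what let shared locks coexist without coordination. The single fixed exclusive key means there is only ever one exclusive lock object to inspect. Two holders that chose the same timestamp would collide, and this page does not say how Cloudstic rules that out.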
Which Operation Holds Which Lock
| Command | Lock type | Acquired | Released |
|---|---|---|---|
| backup | Shared | Start of run (skipped for -dry-run) | On process exit |
| restore | Shared | Start of run (always, including -dry-run) | On process exit |
| prune | Exclusive | Start of run (skipped for -dry-run) | On process exit |
| forget | None | N/A | N/A |
| check | None | N/A | N/A |
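The table amounts to a small decision function. The sketch below is hypothetical: lockFor and the NoLock/Shared/Exclusive constants are illustrative names, not Cloudstic's API. It encodes only the rows listed, and treating any unlisted command as lock-free is an assumption of the sketch.

```go
package main

import "fmt"

// LockType is the lock a command takes (hypothetical names).
type LockType int

const (
	NoLock LockType = iota
	Shared
	Exclusive
)

func (t LockType) String() string {
	switch t {
	case Shared:
		return "shared"
	case Exclusive:
		return "exclusive"
	default:
		return "none"
	}
}

// lockFor encodes the table: which lock a command acquires,
// depending on whether it runs with -dry-run.
func lockFor(command string, dryRun bool) LockType {
	switch command {
	case "backup":
		if dryRun {
			return NoLock
		}
		return Shared
	case "restore":
		// restore takes a shared lock even for -dry-run.
		return Shared
	case "prune":
		if dryRun {
			return NoLock
		}
		return Exclusive
	default:
		// forget and check take no lock; in this sketch, so does
		// any command not listed in the table (an assumption).
		return NoLock
	}
}

func main() {
	for _, cmd := range []string{"backup", "restore", "prune", "forget", "check"} {
		fmt.Printf("%-8s normal=%-9s dry-run=%s\n", cmd, lockFor(cmd, false), lockFor(cmd, true))
	}
}
```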
Lock Payload
Each lock is a JSON object written to the repository store:
```json
{
  "operation": "backup",
  "holder": "my-hostname (pid 12345)",
  "acquired_at": "2026-03-07T09:00:00.000000000Z",
  "expires_at": "2026-03-07T09:01:00.000000000Z",
  "is_shared": true
}
```
holder identifies the process that acquired the lock, in the form "<hostname> (pid <pid>)".
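A Go struct with matching JSON tags is enough to read and write this payload. The type and function names below are hypothetical; only the JSON keys and the holder format come from this page. Go's time.Time marshals as RFC 3339 with trailing fractional zeros trimmed. The timestamps this sketch writes may therefore look slightly different from the example above, although it parses that example as-is.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// lockPayload mirrors the JSON lock object. Go names are hypothetical;
// only the JSON keys come from the documented payload.
type lockPayload struct {
	Operation  string    `json:"operation"`
	Holder     string    `json:"holder"`
	AcquiredAt time.Time `json:"acquired_at"`
	ExpiresAt  time.Time `json:"expires_at"`
	IsShared   bool      `json:"is_shared"`
}

// holderString formats the "<hostname> (pid <pid>)" holder value.
func holderString(host string, pid int) string {
	return fmt.Sprintf("%s (pid %d)", host, pid)
}

// newPayload builds a payload for the current process, expiring ttl after now.
func newPayload(op string, shared bool, now time.Time, ttl time.Duration) (lockPayload, error) {
	host, err := os.Hostname()
	if err != nil {
		return lockPayload{}, err
	}
	now = now.UTC()
	return lockPayload{
		Operation:  op,
		Holder:     holderString(host, os.Getpid()),
		AcquiredAt: now,
		ExpiresAt:  now.Add(ttl),
		IsShared:   shared,
	}, nil
}

// decodePayload parses a stored lock object.
func decodePayload(b []byte) (lockPayload, error) {
	var p lockPayload
	err := json.Unmarshal(b, &p)
	return p, err
}

func main() {
	p, err := newPayload("backup", true, time.Now(), time.Minute)
	if err != nil {
		panic(err)
	}
	out, err := json.MarshalIndent(p, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```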
TTL and Automatic Refresh
Locks are designed to be short-lived so a crashed process never blocks access for long:
- TTL: 1 minute from acquisition.
- Refresh: While the process is alive, a background goroutine rewrites the lock every 30 seconds, extending expires_at by another minute.
- Crash recovery: If the process is killed, the refresh goroutine stops. The lock expires after at most 1 minute. The next operation sees the stale expires_at and proceeds automatically: no manual intervention required.
- Refresh failure: If the backing store becomes unreachable, the goroutine gives up after 3 consecutive failures and lets the TTL expire naturally.
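The refresh behaviour can be sketched as a ticker loop that counts consecutive failures. This is an illustration under stated assumptions, not Cloudstic's implementation. refreshLoop, runFake, and the write callback are hypothetical, and the new expiry is assumed to be the tick time plus the TTL. The demo shrinks the 30-second interval and 1-minute TTL to milliseconds so it finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// refreshLoop rewrites the lock every interval with expires_at = tick time + ttl.
// It returns when ctx is cancelled (normal release) or after maxFailures
// consecutive write errors, leaving the last written expires_at to lapse.
func refreshLoop(ctx context.Context, interval, ttl time.Duration, maxFailures int,
	write func(expiresAt time.Time) error) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	failures := 0
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case now := <-ticker.C:
			if err := write(now.Add(ttl)); err != nil {
				failures++
				if failures >= maxFailures {
					return fmt.Errorf("lock refresh abandoned after %d consecutive failures: %w", failures, err)
				}
				continue
			}
			failures = 0 // any successful refresh resets the streak
		}
	}
}

// runFake drives refreshLoop with millisecond timings (instead of 30s / 1m)
// and a fake writer whose n-th call (1-based) fails when fail(n) is true.
func runFake(fail func(n int) bool) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	calls := 0
	err := refreshLoop(ctx, 5*time.Millisecond, 10*time.Millisecond, 3, func(time.Time) error {
		calls++
		if fail(calls) {
			return errors.New("store unreachable")
		}
		return nil
	})
	return calls, err
}

func main() {
	n, err := runFake(func(int) bool { return true })
	fmt.Println(n, err)
}
```

In a real run, the loop would be started in its own goroutine with a hypothetical writer, for example `go refreshLoop(ctx, 30*time.Second, time.Minute, 3, writeLock)`, and ctx would be cancelled when the operation finishes. If the process is killed, the goroutine dies with it and nothing renews the lock. That is what lets the TTL double as crash recovery.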
Conflict Rules
| Active lock ↓ / Trying to acquire → | Shared (backup/restore) | Exclusive (prune) |
|---|---|---|
| Shared lock active | ✅ Allowed | ❌ Blocked |
| Exclusive lock active | ❌ Blocked | ❌ Blocked |
When blocked, the CLI exits immediately with an error. It does not wait for the lock to be released:
```text
repository is exclusively locked by my-hostname (pid 12345) (operation: prune, acquired: ..., expires: ...)
```
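The conflict table reduces to one rule: a request succeeds only if it and every active lock are shared. A minimal sketch follows. It uses hypothetical names (Mode, compatible, tryAcquire) and a placeholder error instead of the CLI's full message. It returns on the first conflict rather than waiting, matching the fail-fast behaviour described above.

```go
package main

import (
	"errors"
	"fmt"
)

// Mode is the kind of lock held or requested (hypothetical name).
type Mode int

const (
	Shared Mode = iota
	Exclusive
)

// errLocked is a placeholder; the real CLI message includes holder and times.
var errLocked = errors.New("repository is locked")

// compatible encodes the conflict table: only shared + shared may coexist.
func compatible(held, want Mode) bool {
	return held == Shared && want == Shared
}

// tryAcquire checks a request against every active (unexpired) lock and
// fails immediately on the first conflict instead of waiting.
func tryAcquire(active []Mode, want Mode) error {
	for _, held := range active {
		if !compatible(held, want) {
			return errLocked
		}
	}
	return nil
}

func main() {
	fmt.Println(tryAcquire([]Mode{Shared, Shared}, Shared)) // another backup or restore
	fmt.Println(tryAcquire([]Mode{Shared}, Exclusive))      // prune during a backup
	fmt.Println(tryAcquire([]Mode{Exclusive}, Shared))      // backup during a prune
	fmt.Println(tryAcquire(nil, Exclusive))                 // prune on an idle repository
}
```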
TOCTOU Mitigation
Object stores such as S3 and B2 have not traditionally supported atomic conditional writes, so Cloudstic cannot claim a lock with a single compare-and-swap. To reduce the risk of two processes claiming a lock simultaneously:
- Exclusive lock: After writing index/lock.exclusive, the engine immediately re-reads it and verifies that holder and acquired_at still match what it wrote. If another process overwrote the lock in the meantime, the acquire fails.
- Shared lock: After writing index/lock.shared/<timestamp>, the engine re-checks index/lock.exclusive. If an exclusive lock appeared concurrently, the engine deletes its own shared lock entry and the acquire fails.
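This mitigation narrows the race window but cannot close it. Suppose two processes both find no lock, and one finishes writing and verifying before the other writes. Both then believe they hold the lock. Eventually-consistent stores can widen this window. In practice, the write-then-verify pattern makes collisions very rare, but it is a best-effort safeguard, not a guarantee.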
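The write-then-verify steps can be simulated against a toy in-memory map standing in for the object store. Everything here is hypothetical: the store type, the function names, and the single token that stands in for the holder plus acquired_at pair. The sketch also omits the conflict check that precedes the write and ignores lock expiry. It plays out two interleavings of a pair of exclusive writers, plus one shared acquire that collides with an exclusive lock.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a toy, single-process stand-in for the repository's object store.
type store map[string]string

const exclusiveKey = "index/lock.exclusive"

var errLostRace = errors.New("lock acquire failed: lost race")

// writeExclusive is step 1: an unconditional write of our token
// (the token stands in for holder + acquired_at).
func writeExclusive(s store, token string) {
	s[exclusiveKey] = token
}

// verifyExclusive is step 2: re-read the key and confirm our token survived.
func verifyExclusive(s store, token string) error {
	if s[exclusiveKey] != token {
		return errLostRace
	}
	return nil
}

// acquireShared writes a per-holder shared entry, then re-checks for an
// exclusive lock; if one appeared, it deletes its own entry and fails.
func acquireShared(s store, key, token string) error {
	s[key] = token
	if _, held := s[exclusiveKey]; held {
		delete(s, key)
		return errLostRace
	}
	return nil
}

func main() {
	// Interleaving 1: both write before either re-reads. The later write
	// wins and the earlier writer detects that it lost.
	s := store{}
	writeExclusive(s, "A")
	writeExclusive(s, "B")
	fmt.Println("A:", verifyExclusive(s, "A"), "B:", verifyExclusive(s, "B"))

	// Interleaving 2: A writes and verifies before B writes. Both verify
	// successfully; this is the race the mitigation cannot rule out.
	s = store{}
	writeExclusive(s, "A")
	errA := verifyExclusive(s, "A")
	writeExclusive(s, "B")
	errB := verifyExclusive(s, "B")
	fmt.Println("A:", errA, "B:", errB)

	// A shared acquire that finds an exclusive lock backs out its own entry.
	s = store{exclusiveKey: "prune-holder"}
	err := acquireShared(s, "index/lock.shared/1", "backup-holder")
	fmt.Println(err, len(s))
}
```

The second interleaving assumes both writers passed their pre-write conflict check before either wrote. Verification only catches an overwrite that lands before the re-read, which is why the protocol stays best-effort.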
Stale Lock Recovery
A lock is stale when its expires_at is in the past. Stale locks are ignored automatically. No manual action is needed.
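Ignoring stale locks amounts to filtering on expires_at before applying the conflict rules. Below is a minimal sketch. The lockEntry type, the helper names, the fixed demoNow instant, and the host names in the example are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// lockEntry holds the fields this sketch needs from a lock payload.
type lockEntry struct {
	Holder    string
	ExpiresAt time.Time
}

// isStale reports whether the lock's expires_at is strictly in the past.
func isStale(l lockEntry, now time.Time) bool {
	return l.ExpiresAt.Before(now)
}

// activeOnly drops stale entries, so later conflict checks never see them.
func activeOnly(locks []lockEntry, now time.Time) []lockEntry {
	var active []lockEntry
	for _, l := range locks {
		if !isStale(l, now) {
			active = append(active, l)
		}
	}
	return active
}

// demoNow is an arbitrary fixed instant used by the example.
var demoNow = time.Date(2026, 3, 7, 9, 2, 0, 0, time.UTC)

func main() {
	locks := []lockEntry{
		{Holder: "crashed-host (pid 1)", ExpiresAt: demoNow.Add(-30 * time.Second)},
		{Holder: "live-host (pid 2)", ExpiresAt: demoNow.Add(45 * time.Second)},
	}
	for _, l := range activeOnly(locks, demoNow) {
		fmt.Println("active:", l.Holder)
	}
}
```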
If you cannot wait for the 1-minute TTL (e.g. you need to unblock a deployment immediately), use the break-lock command.
This unconditionally deletes index/lock.exclusive and all index/lock.shared/* entries, regardless of TTL or holder. See the break-lock reference for full usage.
Only run break-lock when you are certain no backup, restore, or prune process is actively running. Removing a lock held by an active process can corrupt the repository.
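Conceptually, break-lock deletes every key at the two lock locations without inspecting holder or expiry. The sketch below shows that against a toy map. It is not how the break-lock command is implemented, and the store type, breakLocks, and the unrelated key in the example are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a toy stand-in for the repository's key-value backend.
type store map[string][]byte

const (
	exclusiveKey    = "index/lock.exclusive"
	sharedKeyPrefix = "index/lock.shared/"
)

// breakLocks deletes the exclusive lock and every shared lock entry,
// with no holder or TTL check, and returns the sorted keys it removed.
func breakLocks(s store) []string {
	var removed []string
	for k := range s {
		if k == exclusiveKey || strings.HasPrefix(k, sharedKeyPrefix) {
			removed = append(removed, k)
		}
	}
	for _, k := range removed {
		delete(s, k)
	}
	sort.Strings(removed)
	return removed
}

func main() {
	s := store{
		exclusiveKey:          []byte(`{"operation":"prune"}`),
		sharedKeyPrefix + "1": []byte(`{"operation":"backup"}`),
		"unrelated/key":       []byte("kept"),
	}
	fmt.Println(breakLocks(s), len(s))
}
```

Because nothing in this logic checks whether a holder is still alive, the warning above applies in full. Any process that was relying on a removed lock keeps running unprotected.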
Concurrency Semantics
Because backup and restore use shared locks, they can run concurrently against the same repository without conflict, and each backup run writes its own snapshot independently. prune requires an exclusive lock, so it cannot start while any backup or restore is running. Instead of waiting for them to finish, it fails immediately with a lock error.
This means:
- Two simultaneous backups: ✅ Both succeed; each creates its own snapshot.
- Backup + restore simultaneously: ✅ Both succeed.
- Backup while prune is running: ❌ Backup fails immediately with a lock error.
- Prune while backup is running: ❌ Prune fails immediately with a lock error.
- Two simultaneous prunes: ❌ Second prune fails immediately.
See Also