Document resolver and progress modes

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)
Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
2026-06-04 15:30:31 +01:00
parent 4dafcac9dc
commit ab14a9d891
1 changed files with 34 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -61,16 +61,50 @@ Verify possible duplicates with a full-file hash pass:
 disk-checker ~/Downloads --verify-full
 ```
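
As a rough illustration of what a full-file hash pass involves, the sketch below reads each candidate file in chunks and groups paths by size and whole-file digest. It is not disk-checker's implementation: `full_hash` and `confirm_duplicates` are hypothetical names, and `hashlib.blake2b` from the Python standard library stands in for BLAKE3, which the standard library does not provide.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

CHUNK = 1024 * 1024  # read 1 MiB at a time so large files never sit in memory


def full_hash(path):
    """Return a hex digest of the whole file (BLAKE2b stand-in, not BLAKE3)."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            h.update(chunk)
    return h.hexdigest()


def confirm_duplicates(paths):
    """Group candidate paths by (size, full digest); keep groups of two or more."""
    groups = defaultdict(list)
    for p in paths:
        p = Path(p)
        groups[(p.stat().st_size, full_hash(p))].append(p)
    return [sorted(g) for g in groups.values() if len(g) > 1]
```

Every byte of every candidate is read here, which is why this kind of pass is bounded by storage throughput rather than by CPU.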
 Review duplicate groups one by one and choose which path to keep:
 ```bash
 disk-checker ~/Downloads --verify-full --interactive
 ```
 Interactive mode requires `--verify-full` and is non-destructive: it writes a reviewed shell deletion plan instead of deleting files immediately.
 ```bash
 disk-checker ~/Downloads --verify-full --interactive --delete-plan review-delete.sh
 ```
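
The plan format that disk-checker writes is not shown in this README, so the following is only a hypothetical sketch of how a non-destructive deletion plan could be rendered. `render_delete_plan` and the "first path in each group is kept" convention are assumptions made for illustration.

```python
import shlex


def render_delete_plan(groups):
    """Render a shell script that keeps the first path of each group and removes the rest.

    Nothing is deleted here; the script only takes effect if the user runs it.
    """
    lines = ["#!/bin/sh", "# Review before running. This plan has not been executed."]
    for keep, *remove in groups:
        # repr() keeps the comment on one line even if the path contains a newline.
        lines.append(f"# keep: {keep!r}")
        for path in remove:
            # shlex.quote protects spaces and shell metacharacters in paths;
            # "--" stops rm from treating a leading "-" as an option.
            lines.append(f"rm -- {shlex.quote(path)}")
    return "\n".join(lines) + "\n"
```

Writing commands to a file instead of executing them leaves a reviewable, editable record, so a mistaken choice can be caught before any data is lost.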
 Use the fastest triage mode for huge datasets by grouping same-size files without hashing:
 ```bash
 disk-checker /mnt/storage --size-only --min-size 100MiB --threads 32
 ```
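
Grouping by size needs only metadata, which is what makes this mode cheap. A minimal sketch of the idea, assuming a hypothetical `group_by_size` helper whose `min_size` parameter is given in bytes (it is not the tool's actual code):

```python
import os
import stat
from collections import defaultdict


def group_by_size(root, min_size=0):
    """Map file size -> paths, for sizes shared by at least two regular files."""
    by_size = defaultdict(list)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect the entry itself, not a symlink target
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode) or st.st_size < min_size:
                continue
            by_size[st.st_size].append(path)
    return {size: sorted(paths) for size, paths in by_size.items() if len(paths) > 1}
```

No file content is opened, so two unrelated files of equal length land in the same group. The result narrows the search; it does not prove duplication.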
 Limit traversal depth:
 ```bash
 disk-checker /mnt/storage --max-depth 3
 ```
 Limit the number of scanning and hashing worker threads:
 ```bash
 disk-checker ~/Downloads --threads 4
 ```
 Disable progress output:
 ```bash
 disk-checker ~/Downloads --no-progress
 ```
 ## Notes
 - By default, duplicate results are **possible duplicates**: same file size plus same first `1MiB` BLAKE3 hash.
 - This is intentionally fast because it avoids reading whole files unless you pass `--verify-full`.
 - `--size-only` is even faster for triage, but it only means files have the same size; use it to narrow the search, not as proof.
 - Symlinks are not followed by default to avoid surprises and cycles.
 - Hard link groups are reported separately because they are multiple paths to the same inode, not extra disk copies.
 - Hidden files and gitignored files are included; this is a disk scanner, not a source-code search tool.
 - Fast mode does **not** read all file content. On a 30TB dataset, for example, it reads metadata plus at most the hash window for each same-size candidate file: 30,000 candidate files at `1MiB` each is about 30GiB of content reads.
 - Fully verifying all 30TB in 10 minutes would require roughly 50GB/s of sustained reads. `--verify-full` fully reads only the candidate groups, but storage throughput is still the hard limit for exact verification.
 - Progress output reflects actual work and is written to stderr: traversal shows live discovery counts because the total amount of traversal work is not known in advance, while hashing shows determinate byte progress based on actual reads. Progress is disabled automatically for `--json` and can also be disabled with `--no-progress`.
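
The read-volume and throughput figures in the notes above can be checked with a few lines of arithmetic. This sketch assumes decimal units for TB and GB and binary units for MiB and GiB; the function names are illustrative only.

```python
MIB = 1024**2
GIB = 1024**3
GB = 10**9
TB = 10**12


def prefix_read_bytes(candidates, window=MIB):
    """Upper bound on content read in fast mode: one hash window per candidate."""
    return candidates * window


def required_throughput(total_bytes, seconds):
    """Sustained read rate in bytes/s needed to read total_bytes in the given time."""
    return total_bytes / seconds


print(prefix_read_bytes(30_000) / GIB)             # 29.296875, i.e. about 30 GiB
print(required_throughput(30 * TB, 10 * 60) / GB)  # 50.0, i.e. 50 GB/s
```

The prefix figure is an upper bound because files smaller than the hash window contribute less than `1MiB` each.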