disk-checker

Fast Ubuntu-friendly CLI for scanning folders, checking file sizes, hashing the first chunk of same-size files, and reporting possible duplicates plus symlinks, hard links, special files, and scan errors.

Install Rust on Ubuntu

sudo apt update
sudo apt install -y build-essential curl
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

Build

cargo build --release

The binary will be at:

target/release/disk-checker

Usage

Scan the current directory:

disk-checker

Scan one or more paths:

disk-checker ~/Downloads /mnt/shared

Use JSON for scripts:

disk-checker ~/Downloads --json

Hash a larger first chunk before grouping possible duplicates:

disk-checker ~/Downloads --hash-bytes 8MiB
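To make the size syntax concrete, here is a minimal sketch of how a value such as 8MiB or 100MiB could be turned into a byte count. The helper name `parse_size`, the accepted suffixes, and the 1024-based multipliers are assumptions for illustration; disk-checker's actual parser may accept other forms.

```rust
/// Parse a human-readable size such as "8MiB" or "4096" into a byte count.
/// Hypothetical helper: the suffix set and 1024-based units are assumptions,
/// not disk-checker's documented behaviour.
fn parse_size(input: &str) -> Option<u64> {
    let s = input.trim();
    // Split at the first character that is not an ASCII digit.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, suffix) = s.split_at(split);
    // An empty digit string fails to parse, so "MiB" alone is rejected.
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match suffix.trim() {
        "" | "B" => 1,
        "KiB" => 1 << 10,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        "TiB" => 1 << 40,
        _ => return None,
    };
    // Reject values that would overflow u64 instead of wrapping.
    value.checked_mul(multiplier)
}
```

Under these assumptions, `parse_size("8MiB")` yields `Some(8_388_608)`, and an unknown suffix yields `None`.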

Follow symlinks while still reporting them separately:

disk-checker ~/Downloads --follow-links
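Reporting symlinks separately depends on inspecting an entry without following it. The sketch below shows one way to do that with the standard library's `std::fs::symlink_metadata`; the `EntryKind` enum and `classify` function are hypothetical names, not taken from disk-checker's source.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical classification of a directory entry.
#[derive(Debug, PartialEq, Eq)]
enum EntryKind {
    Symlink,
    Directory,
    RegularFile,
    /// Anything else: sockets, FIFOs, device nodes, and so on.
    Special,
}

/// Classify `path` without following a symlink at the final component.
fn classify(path: &Path) -> io::Result<EntryKind> {
    // symlink_metadata describes the link itself, not its target.
    let file_type = fs::symlink_metadata(path)?.file_type();
    Ok(if file_type.is_symlink() {
        EntryKind::Symlink
    } else if file_type.is_dir() {
        EntryKind::Directory
    } else if file_type.is_file() {
        EntryKind::RegularFile
    } else {
        EntryKind::Special
    })
}
```

A scanner that follows links could still call `classify` first to record the symlink, then use `std::fs::metadata` to inspect the target.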

Verify possible duplicates with a full-file hash pass:

disk-checker ~/Downloads --verify-full
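The difference between a possible duplicate and a verified one can be sketched in two stages. This is an illustrative model only: it works on in-memory byte slices rather than files, uses the standard library's `DefaultHasher` as a stand-in for BLAKE3, and all function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::Hasher;

/// Hash at most `window` leading bytes. DefaultHasher is a std stand-in
/// for BLAKE3 in this sketch; it is not a cryptographic hash.
fn prefix_hash(data: &[u8], window: usize) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(&data[..data.len().min(window)]);
    hasher.finish()
}

/// Stage 1: group by (size, prefix hash) and keep groups with 2+ members.
fn possible_duplicates<'a>(
    files: &[(&'a str, &'a [u8])],
    window: usize,
) -> Vec<Vec<(&'a str, &'a [u8])>> {
    let mut groups: HashMap<(usize, u64), Vec<(&'a str, &'a [u8])>> = HashMap::new();
    for &(name, data) in files {
        let key = (data.len(), prefix_hash(data, window));
        groups.entry(key).or_default().push((name, data));
    }
    let mut out: Vec<_> = groups.into_values().filter(|g| g.len() > 1).collect();
    // Sort by the first name in each group for a deterministic order.
    out.sort_by_key(|g| g[0].0);
    out
}

/// Stage 2: split one candidate group by full content.
fn verify_full<'a>(group: &[(&'a str, &'a [u8])]) -> Vec<Vec<&'a str>> {
    let mut by_content: HashMap<&[u8], Vec<&'a str>> = HashMap::new();
    for &(name, data) in group {
        by_content.entry(data).or_default().push(name);
    }
    let mut out: Vec<Vec<&str>> = by_content.into_values().filter(|g| g.len() > 1).collect();
    out.sort();
    out
}
```

Two files that share a size and a first chunk but differ near the end land in the same stage 1 group and are separated only by stage 2, which is why the first pass reports possible duplicates rather than confirmed ones.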

Review duplicate groups one by one and choose which path to keep:

disk-checker ~/Downloads --verify-full --interactive

Interactive mode requires --verify-full and is non-destructive: it writes a reviewed shell deletion plan instead of deleting files immediately. Pass --delete-plan to choose where the plan is written:

disk-checker ~/Downloads --verify-full --interactive --delete-plan review-delete.sh

Use the fastest triage mode for huge datasets by grouping same-size files without hashing:

disk-checker /mnt/storage --size-only --min-size 100MiB --threads 32

Limit traversal depth:

disk-checker /mnt/storage --max-depth 3

Limit scanning and hashing workers:

disk-checker ~/Downloads --threads 4

Disable progress output:

disk-checker ~/Downloads --no-progress

Notes

By default, duplicate results are possible duplicates: files with the same size and the same BLAKE3 hash of their first 1MiB.
This is intentionally fast because it avoids reading whole files unless you pass --verify-full.
--size-only is even faster for triage, but it only means files have the same size; use it to narrow the search, not as proof.
Symlinks are not followed by default to avoid surprises and cycles.
Hard link groups are reported separately because they are multiple paths to the same inode, not extra disk copies.
Hidden files and gitignored files are included; this is a disk scanner, not a source-code search tool.
Fast mode does not read every byte of a large dataset. On a 30TB disk, for example, it reads metadata plus at most the hash window of each same-size candidate file: 30,000 candidate files at 1MiB is about 30GiB of content reads.
Fully verifying all 30TB in 10 minutes would require roughly 50GB/s of sustained reads (30TB / 600s). --verify-full fully reads only the candidate groups, but storage throughput is still the hard limit for exact verification.
Progress output reflects actual work and is written to stderr: traversal shows live discovered counts because the total traversal work is unknown, while hashing shows determinate byte progress based on actual reads. Progress is disabled automatically for --json and can be disabled with --no-progress.
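The read-volume and throughput figures in the notes above can be checked with a few lines of arithmetic. The function names below are hypothetical, and the sketch assumes 1MiB = 1,048,576 bytes and 30TB = 30 × 10^12 bytes.

```rust
/// Upper bound on content bytes read by a first-chunk pass:
/// each candidate file contributes at most `window` bytes.
fn max_prefix_read_bytes(candidates: u64, window: u64) -> u64 {
    candidates * window
}

/// Sustained throughput in bytes per second needed to read
/// `total_bytes` within `seconds`.
fn required_throughput(total_bytes: u64, seconds: u64) -> f64 {
    total_bytes as f64 / seconds as f64
}
```

With 30,000 candidates and a 1MiB window, `max_prefix_read_bytes` gives 31,457,280,000 bytes, about 29.3GiB. Reading 30TB in 600 seconds gives `required_throughput` of 5 × 10^10 bytes per second, or 50GB/s.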