
disk-checker

Fast Ubuntu-friendly CLI for scanning folders, checking file sizes, hashing the first chunk of same-size files, and reporting possible duplicates plus symlinks, hard links, special files, and scan errors.

Install Rust on Ubuntu

sudo apt update
sudo apt install -y build-essential curl
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

Build

cargo build --release

The binary will be at:

target/release/disk-checker

Usage

Scan the current directory:

disk-checker

Scan one or more paths:

disk-checker ~/Downloads /mnt/shared

Use JSON for scripts:

disk-checker ~/Downloads --json

Hash a larger first chunk before grouping possible duplicates:

disk-checker ~/Downloads --hash-bytes 8MiB

Follow symlinks while still reporting them separately:

disk-checker ~/Downloads --follow-links

Verify possible duplicates with a full-file hash pass:

disk-checker ~/Downloads --verify-full

Review duplicate groups one by one and choose which path to keep:

disk-checker ~/Downloads --verify-full --interactive

Interactive mode requires --verify-full and is non-destructive: instead of deleting files immediately, it writes a shell deletion plan based on your choices. Set the plan's output path with --delete-plan:

disk-checker ~/Downloads --verify-full --interactive --delete-plan review-delete.sh

Use the fastest triage mode for huge datasets by grouping same-size files without hashing:

disk-checker /mnt/storage --size-only --min-size 100MiB --threads 32

Limit traversal depth:

disk-checker /mnt/storage --max-depth 3

Limit scanning and hashing workers:

disk-checker ~/Downloads --threads 4

Disable progress output:

disk-checker ~/Downloads --no-progress

Notes

  • By default, results are possible duplicates, not confirmed ones: files with the same size and the same BLAKE3 hash of their first 1MiB.
  • This is intentionally fast because it avoids reading whole files unless you pass --verify-full.
  • --size-only is even faster for triage, but it only means files have the same size; use it to narrow the search, not as proof.
  • Symlinks are not followed by default to avoid surprises and cycles.
  • Hard link groups are reported separately because they are multiple paths to the same inode, not extra disk copies.
  • Hidden files and gitignored files are included; this is a disk scanner, not a source-code search tool.
  • Fast mode does not read 30TB of file content. It reads metadata plus up to the hash window for same-size candidate files: for example, 30,000 candidate files at 1MiB is about 30GiB of content reads.
  • Fully verifying all 30TB in 10 minutes would require roughly 50GB/s sustained reads. --verify-full only fully reads candidate groups, but storage throughput is still the hard limit for exact verification.
  • Progress output reflects actual work and is written to stderr: traversal shows live counts of discovered entries because the total amount of traversal work is unknown, while hashing shows determinate byte progress based on actual reads. Progress is disabled automatically for --json and can also be disabled with --no-progress.
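
The default fast path described in the notes (group by size, then hash only the first chunk of same-size files) can be sketched roughly as follows. This is an illustration, not the tool's actual source: the function names prefix_hash and possible_duplicates are hypothetical, and the standard library's DefaultHasher stands in for BLAKE3 so the sketch builds without external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs::File;
use std::hash::Hasher;
use std::io::{self, Read};
use std::path::{Path, PathBuf};

/// Hash at most `window` bytes from the start of `path`.
/// ASSUMPTION: DefaultHasher is a stand-in; the README says the tool uses BLAKE3.
fn prefix_hash(path: &Path, window: u64) -> io::Result<u64> {
    let mut hasher = DefaultHasher::new();
    // `take` caps how many bytes can ever be read from the file.
    let mut reader = File::open(path)?.take(window);
    let mut buf = [0u8; 64 * 1024];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.write(&buf[..n]);
    }
    Ok(hasher.finish())
}

/// Group paths that share both a size and a prefix hash.
/// Files whose size is unique are never opened.
fn possible_duplicates(paths: &[PathBuf], window: u64) -> io::Result<Vec<Vec<PathBuf>>> {
    // Pass 1: metadata only.
    let mut by_size: HashMap<u64, Vec<PathBuf>> = HashMap::new();
    for p in paths {
        let len = std::fs::metadata(p)?.len();
        by_size.entry(len).or_default().push(p.clone());
    }

    // Pass 2: read the first `window` bytes of same-size candidates only.
    let mut groups = Vec::new();
    for (_, same_size) in by_size {
        if same_size.len() < 2 {
            continue;
        }
        let mut by_hash: HashMap<u64, Vec<PathBuf>> = HashMap::new();
        for p in same_size {
            let h = prefix_hash(&p, window)?;
            by_hash.entry(h).or_default().push(p);
        }
        groups.extend(by_hash.into_values().filter(|g| g.len() > 1));
    }
    Ok(groups)
}

fn main() -> io::Result<()> {
    // Treat every command-line argument as a file path; 1 MiB matches the documented default window.
    let paths: Vec<PathBuf> = std::env::args().skip(1).map(PathBuf::from).collect();
    for group in possible_duplicates(&paths, 1024 * 1024)? {
        println!("{:?}", group);
    }
    Ok(())
}
```

Two files that agree on size and on the first window but differ later still land in the same group here, which is why such results are only possible duplicates until a full-file pass confirms them.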
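
The note on hard links relies on the fact that several paths can name the same inode. A minimal, Unix-only sketch of detecting that with the standard library is shown below. The function name hard_link_groups is hypothetical, and how the tool itself detects hard links is an assumption, not something the README states.

```rust
use std::collections::HashMap;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;

/// Group paths that refer to the same inode on the same device.
/// Only regular files with a link count above 1 are considered.
fn hard_link_groups(paths: &[PathBuf]) -> io::Result<Vec<Vec<PathBuf>>> {
    let mut by_inode: HashMap<(u64, u64), Vec<PathBuf>> = HashMap::new();
    for p in paths {
        // symlink_metadata does not follow symlinks, so a symlink is never
        // mistaken for an extra hard link to its target.
        let meta = std::fs::symlink_metadata(p)?;
        if meta.is_file() && meta.nlink() > 1 {
            by_inode
                .entry((meta.dev(), meta.ino()))
                .or_default()
                .push(p.clone());
        }
    }
    // Keep only inodes seen under at least two of the given paths.
    Ok(by_inode.into_values().filter(|g| g.len() > 1).collect())
}

fn main() -> io::Result<()> {
    let paths: Vec<PathBuf> = std::env::args().skip(1).map(PathBuf::from).collect();
    for group in hard_link_groups(&paths)? {
        println!("same inode: {:?}", group);
    }
    Ok(())
}
```

Keying on the (device, inode) pair rather than the inode alone matters when a scan spans several mounted filesystems, since inode numbers are only unique within one device.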
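
The two throughput figures in the notes can be checked with a few lines of arithmetic. The helper names below are hypothetical and exist only for this check. The sketch assumes binary units (MiB, GiB) for the hash window and decimal units (TB, GB/s) for the full-verification estimate, matching how the notes write them.

```rust
/// Upper bound on content read in fast mode: each candidate file
/// contributes at most one hash window.
fn max_fast_mode_bytes(candidates: u64, window_bytes: u64) -> u64 {
    candidates * window_bytes
}

/// Sustained read rate, in bytes per second, needed to read
/// `total_bytes` within `seconds`.
fn required_bytes_per_sec(total_bytes: f64, seconds: f64) -> f64 {
    total_bytes / seconds
}

fn main() {
    const MIB: u64 = 1024 * 1024;
    const GIB: u64 = 1024 * MIB;

    // 30,000 candidates at a 1 MiB window.
    let fast = max_fast_mode_bytes(30_000, MIB);
    println!("fast mode reads at most {:.1} GiB", fast as f64 / GIB as f64);

    // 30 TB (decimal) in 10 minutes.
    let rate = required_bytes_per_sec(30e12, 600.0);
    println!("full verification needs {:.0} GB/s", rate / 1e9);
}
```

Under those unit assumptions the first figure comes out at about 29.3 GiB, consistent with the note's "about 30GiB", and the second at 50 GB/s.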