This script is designed for scenarios where you want to deduplicate incoming data against previously-seen data, but where that previously-seen data is not available to be hashed against (or you just don't want to recalculate all those hashes again). If the source data is available to you, or you want fine-grained control of the deduplication process, tools like jdupes or DupeGuru would better serve you.
This is a prototype script. Do not run it in production, or against any data (incoming or archival) that you're afraid to lose.
Always back up your data. I assume no responsibility for failure to plan on your part.
By design, any file which this script has "seen" twice will always be considered a duplicate, because the script makes no distinction between "old" and "new" files, or between their locations.
When run without the --delete flag, the script ingests and hashes all files, whether new or old. Since --delete wasn't passed, your files won't be touched - just hashed. This is always a safe operation.
When run with the --delete flag, the script will also delete any files it has previously seen - even if those are your "old" files. So be careful! This is an inherently risky operation.
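To make the two modes concrete, here is a minimal, hypothetical sketch of the behaviour described above. It is not the actual script: it assumes the seen-hash set is persisted to a JSON file named seen_hashes.json and that files are compared purely by SHA-256 digest; the real implementation may differ in both respects.

```python
#!/usr/bin/env python3
"""Toy sketch of the ingest / --delete behaviour (illustration only)."""
import argparse
import hashlib
import json
import os

HASH_STORE = "seen_hashes.json"  # hypothetical location of the persistent hash set


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_seen():
    """Load the set of previously seen hashes, if any."""
    try:
        with open(HASH_STORE) as handle:
            return set(json.load(handle))
    except FileNotFoundError:
        return set()


def main():
    parser = argparse.ArgumentParser(description="Toy hash-and-dedupe sketch")
    parser.add_argument("root", help="directory to ingest")
    parser.add_argument("--delete", action="store_true",
                        help="delete files whose hashes have been seen before")
    args = parser.parse_args()

    seen = load_seen()
    for dirpath, _dirnames, filenames in os.walk(args.root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            digest = file_sha256(path)
            if digest in seen:
                # The script cannot tell "old" from "new": anything seen
                # twice is a duplicate, wherever it lives.
                if args.delete:
                    print(f"deleting duplicate: {path}")
                    os.remove(path)  # destructive - there is no undo
                else:
                    print(f"duplicate (kept, no --delete): {path}")
            else:
                seen.add(digest)

    with open(HASH_STORE, "w") as handle:
        json.dump(sorted(seen), handle)


if __name__ == "__main__":
    main()
```

Note how the hash store only ever grows: that is exactly why any file "seen" twice is always treated as a duplicate, regardless of which run first recorded it.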
By design, the --delete flag is intended to be used only on incoming data which you want to deduplicate against a larger dataset. You should never run --delete against pre-existing data which has any importance to you, or which you don't want to lose.
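In terms of the hypothetical sketch above (using the name dedupe.py purely for illustration), the intended workflow is: first run "python dedupe.py /path/to/archive" without --delete so the archive's hashes are recorded but nothing is touched, then run "python dedupe.py --delete /path/to/incoming" so that incoming files whose hashes were already recorded are removed. Pointing --delete back at the archive itself would delete any archive files the script has already seen.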
You have been warned!