Jefferderp/bduper

Why?

This script is designed for scenarios where you want to deduplicate incoming data against previously-seen data, but that previously-seen data is no longer available to hash (or you simply don't want to recalculate all those hashes). If the source data is still available to you, or you want fine-grained control over the deduplication process, tools like jdupes or DupeGuru will serve you better.

Warnings

Not tested in production

This is a prototype script. Do not run it in production, or against any data (incoming or archival) that you're afraid to lose.

Always back up your data. I assume no responsibility for a failure to plan on your part.

Don't delete your data!

By design, any file which this script has "seen" twice will always be considered a duplicate, because the script makes no distinction between "old" and "new" files, or between their locations.

When run without the --delete flag, the script ingests and hashes all files, whether new or old. Because you didn't pass --delete, your files won't be touched - just hashed. This is always a safe operation.

When run with the --delete flag, the script will also delete any files it has previously seen - even if those are your "old" files. So be careful! This is an inherently risky operation.

By design, the --delete flag is only intended to be used on incoming data that you want to deduplicate against a larger dataset. You should never run --delete against pre-existing data that has any importance to you, or that you don't want to lose.
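
To make that behavior concrete, here is a minimal sketch of the seen-hash approach described above. This is not bduper's actual implementation: the hash store path (seen_hashes.txt), the choice of SHA-256, and the flag handling are all assumptions for illustration.

```python
#!/usr/bin/env python3
"""Minimal sketch of seen-hash deduplication. NOT bduper's real code:
the store path, hash algorithm, and CLI are assumptions."""
import argparse
import hashlib
import os
import sys


def file_hash(path: str) -> str:
    """Hash a file's contents in chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def main() -> None:
    parser = argparse.ArgumentParser(
        description="Dedupe files against a persistent hash store.")
    parser.add_argument("root", help="Directory of incoming files to scan.")
    parser.add_argument("--delete", action="store_true",
                        help="Delete files whose hash has been seen before. RISKY.")
    parser.add_argument("--store", default="seen_hashes.txt",
                        help="Persistent hash store (hypothetical default).")
    args = parser.parse_args()

    # Load every hash recorded by previous runs. There is deliberately no
    # notion of "old" vs "new": a hash in the store simply means "seen before".
    seen: set[str] = set()
    if os.path.exists(args.store):
        with open(args.store) as f:
            seen = {line.strip() for line in f if line.strip()}

    with open(args.store, "a") as store:
        for dirpath, _dirnames, filenames in os.walk(args.root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                digest = file_hash(path)
                if digest in seen:
                    if args.delete:
                        # Previously seen, so a duplicate by definition. Only
                        # safe on incoming data you can afford to lose.
                        os.remove(path)
                        print(f"deleted duplicate: {path}", file=sys.stderr)
                else:
                    # First sighting: record the hash. Without --delete this
                    # is all the script ever does, so files are never touched.
                    seen.add(digest)
                    store.write(digest + "\n")


if __name__ == "__main__":
    main()
```

Under those assumptions, the intended workflow would be to run once without --delete to seed the store from your existing dataset, then run with --delete only on incoming directories.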

You have been warned!
