How To Remove Redundant Data On Linux With Fdupes

Nobody likes duplicate files. They take up unnecessary space on a system and get in the way. Thankfully, on Linux, there’s a way to remove redundant data and clean up duplicate files, using Fdupes.

Install Fdupes

The Fdupes tool is one of the best command-line de-duplication tools on the Linux platform. When run, it can scan any directory for duplicate files, report their exact locations, and optionally delete the extra copies. Because it compares file contents rather than file names, it finds duplicates even when a copy has been renamed.

Fdupes is easy to install and is packaged by the majority of Linux distributions out there. In fact, even FreeBSD has the software available. To get the software working, launch a terminal and enter the command that corresponds to your Linux operating system.

Ubuntu

sudo apt install fdupes

Debian

sudo apt-get install fdupes

Arch Linux

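Fdupes is on Arch Linux, via the “Community” repository. On some installs the “Community” repo is commented out in the Pacman configuration. (On current Arch releases “Community” has been merged into “Extra”, so the editing step may be unnecessary.) If the repo is disabled, you’ll first need to edit the pacman.conf file.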

To edit the configuration file, open it in the Nano text editor.

sudo nano /etc/pacman.conf

In the configuration file, remove “#” from in front of everything “Community” related. Keep in mind that every “#” must be gone, or the repo will not work. When the edits are done, save it with Ctrl + O and exit with Ctrl + X.

Sync the new community repo with Pacman.

sudo pacman -Syy

Now that the “Community” software source has synced, Arch Linux has full access to it. Finish up the process by installing the Fdupes application through the package manager.

sudo pacman -S fdupes

Fedora

sudo dnf install fdupes

OpenSUSE

sudo zypper install fdupes
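
Whichever package manager you used, it is worth confirming that the program is actually on your PATH before moving on. The snippet below is a minimal sketch of such a check; have_cmd is a hypothetical helper name introduced here for illustration, not part of Fdupes.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command on this system.
# (have_cmd is an illustrative helper, not something Fdupes provides.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd fdupes; then
    # Print the installed version so you know which manual applies.
    fdupes --version
else
    echo "fdupes not found; install it with your package manager" >&2
fi
```

The version number matters because switches can differ between releases and between distribution packages.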

Scan For Duplicates

Before Fdupes can remove redundant files, it needs to know where they are. To find the files, you’ll need to make use of the -r switch. With the -r switch, Fdupes searches the folder you give it along with every subfolder, so a single command covers an entire directory tree.
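
Before pointing the tool at real data, it can help to try the -r switch on a throwaway directory. The sketch below builds one under a temporary path; the file names a.txt, b.txt and c.txt are made up for the example.

```shell
#!/bin/sh
# Build a scratch directory with two identical files and one unique file.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/sub"
printf 'same content\n' > "$sandbox/a.txt"
printf 'same content\n' > "$sandbox/sub/b.txt"
printf 'different\n'    > "$sandbox/c.txt"

# With -r, the scan descends into sub/ as well, so a.txt and sub/b.txt
# can be matched even though they sit in different folders.
if command -v fdupes >/dev/null 2>&1; then
    fdupes -r "$sandbox"
fi
echo "scratch directory: $sandbox"
```

When you are done experimenting, delete the scratch directory with rm -r followed by the path the script prints.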

Follow the instructions below to learn how to find and remove duplicate files in several locations on your Linux PC.

Duplicates In Home Folder

One of the main places users store files is /home/. This folder holds so many files because nearly everything a user creates or downloads on Linux ends up here. As a result, files build up over time, and duplicates often build up with them. To find these duplicates, open up a terminal and point fdupes at your home folder.

fdupes -r ~/

or, to scan the home folder of another user on your PC (you may need root access to read it), do:

fdupes -r /home/username/

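After running the scan, the tool will print a list of every set of duplicates it finds in the home directory. To save this information, redirect the output to a file in the Documents folder. Note that >> appends, so running the command a second time adds to the existing file rather than replacing it.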

fdupes -r ~/ >> ~/Documents/fdupes-scan-home.txt

or

fdupes -r /home/username/ >> ~/Documents/fdupes-scan-home-user.txt
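
A saved scan is plain text: one path per line, with a blank line between each set of duplicates. That makes it easy to summarize with standard tools. The following is a small sketch, assuming the report path used above; count_sets is a hypothetical helper name.

```shell
#!/bin/sh
# count_sets FILE: print how many blank-line-separated groups FILE contains.
# An empty RS puts awk in paragraph mode, so each duplicate set is one record.
count_sets() {
    awk 'BEGIN { RS = "" } END { print NR }' "$1"
}

# Assumed report location, taken from the redirect example.
report="$HOME/Documents/fdupes-scan-home.txt"
if [ -f "$report" ]; then
    count_sets "$report"
fi
```

If the report was appended to more than once, the same sets will be counted repeatedly, so start from a fresh file when the number matters.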

Duplicates In Root File System

Fdupes has the ability to scan any location, and not just the home folder. If you’re trying to find duplicate files on the root file system of your Linux PC, here’s what to do.

In a terminal, switch from your normal user to the root account. Running as root allows the Fdupes app to scan locations that are off limits to a normal user.

sudo -s

or

su -

As root, scan the root file system using Fdupes.

fdupes -r /

Alternatively, scan a specific location rather than the entire root file system with:

fdupes -r /location/on/your/pc

Need to export the scanning results to a file for later? Run this command.

fdupes -r / >> /home/username/Documents/fdupes-scan.txt

or

fdupes -r /location/on/your/pc >> /home/username/Documents/fdupes-scan.txt
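
A large scan is easier to judge once you know how much space the extra copies occupy. The sketch below reads a saved listing and adds up the size of every file after the first in each set. wasted_bytes is a hypothetical helper written for this article, and it assumes file names contain no newlines.

```shell
#!/bin/sh
# wasted_bytes: read an fdupes-style listing on stdin (one path per line,
# sets separated by blank lines) and print the total size in bytes of
# every copy after the first in each set.
wasted_bytes() {
    total=0
    first=1
    while IFS= read -r path; do
        if [ -z "$path" ]; then
            first=1                 # blank line: the next path starts a new set
            continue
        fi
        if [ "$first" -eq 1 ]; then
            first=0                 # the first file of a set is not counted
            continue
        fi
        if [ -f "$path" ]; then
            size=$(wc -c < "$path")
            total=$((total + size))
        fi
    done
    echo "$total"
}
```

To use it, feed it the report created earlier, for example: wasted_bytes < /home/username/Documents/fdupes-scan.txt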

Remove Redundant Data

Scanning for duplicate files is the first half of removing redundant data. The next step is to act on the results. Be aware of what the switches actually do before running anything: in the upstream Fdupes manual, the H switch (--hardlinks) and the S switch (--size) do not replace files with links. The H switch tells the scanner to report files that are already hard-linked to each other as duplicates, and the S switch prints the size of the files in each duplicate set. Actual removal is handled by the d switch (--delete).

With the d switch, Fdupes walks through each duplicate set and asks which copies to preserve; every other file in the set is deleted. Packaged builds can differ from upstream, so check man fdupes on your system before relying on any switch. To review and then delete duplicates, do the following in a terminal.

Note: Do not remove duplicate data system-wide unless you understand the risks that can occur! System directories often contain files that are identical on purpose, and deleting them can break installed software.

Review Duplicate Sets

fdupes -rS /home/username/

or, to also report files that are already hard-linked together:

fdupes -rH /home/username/

Delete Duplicates

fdupes -rd /home/username/

or, for system-wide duplicates:

sudo -s
fdupes -rd /root/file/location
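
Duplicates can also be turned into hard links by hand from a scan listing, which keeps every path in place and makes each step visible. The sketch below is one way to do that with standard tools, not a built-in Fdupes feature. link_sets is a hypothetical helper name, and it assumes file names without newlines and that all files in a set live on the same file system.

```shell
#!/bin/sh
# link_sets: read an fdupes-style listing on stdin (one path per line,
# sets separated by blank lines) and replace every file after the first
# in each set with a hard link to that first file.
link_sets() {
    keep=""
    while IFS= read -r path; do
        if [ -z "$path" ]; then
            keep=""                     # blank line: the next set starts
        elif [ -z "$keep" ]; then
            keep=$path                  # first file of the set is kept as-is
        elif cmp -s "$keep" "$path"; then
            ln -f "$keep" "$path"       # contents re-verified, so link
        fi
    done
}
```

A typical use would be to pipe a scan straight in, for example: fdupes -r /home/username/ | link_sets. Try it on a scratch directory first, since hard-linked files share their contents and editing one path changes them all.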
