How To Remove Redundant Data On Linux With Fdupes
Nobody likes duplicate files. They take up unnecessary space on a system and get in the way. Thankfully, on Linux, there’s a way to remove redundant data and clean up duplicate files, using Fdupes.
Install Fdupes
The Fdupes tool is one of the best command-line de-duplication tools on the Linux platform. When run, it can scan any directory for duplicate files and report their exact locations. Depending on the build, it may also be able to replace duplicates with links, which saves space while keeping every file reachable at its original path. The available options vary between versions, so check man fdupes before relying on any of them.
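To make the idea of duplicate detection concrete, here is a rough sketch built from standard tools. It is not how Fdupes is implemented: according to its manual, Fdupes compares file sizes and MD5 signatures and then does a byte-by-byte comparison, while this sketch only groups files by SHA-256 checksum. The helper name list_same_checksum is hypothetical, and the -w and -D options assume GNU uniq.

```shell
# Sketch only: list files under a directory whose SHA-256 checksums match.
# list_same_checksum is a hypothetical helper name, not part of Fdupes.
list_same_checksum() {
    dir="$1"
    # Hash every regular file, sort so equal hashes sit next to each other,
    # then keep only the lines whose first 64 characters (the hash) repeat.
    find "$dir" -type f -exec sha256sum {} + | sort | uniq -w64 -D
}
```

Calling list_same_checksum ~/Documents prints one line per file that shares a checksum with at least one other file, grouped by hash.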
Fdupes is easy to install and is packaged by the majority of Linux distributions out there. In fact, even FreeBSD has the software available. To install it, launch a terminal and enter the commands that correspond to your Linux operating system.
Ubuntu
sudo apt install fdupes
Debian
sudo apt-get install fdupes
Arch Linux
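Fdupes is in the official Arch Linux repositories. It was long shipped in “Community”, which Arch has since merged into “Extra”, so on a current system the install command at the end of this section should work as-is. The steps in between only apply to an older setup where the “Community” repo is commented out in the pacman.conf file.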
To edit the configuration file, open it in the Nano text editor.
sudo nano /etc/pacman.conf
In the configuration file, remove the “#” from the start of every line in the “Community” section, meaning the [community] header and the Include line beneath it. Keep in mind that every “#” in that section must be gone, or the repo will not work. When the edits are done, save with Ctrl + O and exit with Ctrl + X.
Sync the new community repo with Pacman.
sudo pacman -Syy
Now that the “Community” software source has synced, Arch Linux has full access to it. Finish up the process by installing the Fdupes application through the package manager.
sudo pacman -S fdupes
Fedora
sudo dnf install fdupes
OpenSUSE
sudo zypper install fdupes
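Whichever distribution you use, it is worth confirming that the program is on your PATH before moving on. A minimal sketch, where check_fdupes is a hypothetical helper name rather than anything shipped with Fdupes:

```shell
# Sketch: confirm that fdupes can be found after installing it.
# check_fdupes is a hypothetical helper name.
check_fdupes() {
    if command -v fdupes >/dev/null 2>&1; then
        echo "fdupes found at: $(command -v fdupes)"
    else
        echo "fdupes not found; re-run the install command for your distribution" >&2
        return 1
    fi
}
```

Run check_fdupes after installing; a non-zero exit status means the shell could not locate the command.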
Scan For Duplicates
Before Fdupes can remove redundant files, it needs to know where they are. To find them, use the -r switch, which makes Fdupes recurse into every subfolder of the directory you point it at. Without it, only the files directly inside that directory are compared.
Follow the instructions below to learn how to find and remove duplicate files in several locations on your Linux PC.
Duplicates In Home Folder
One of the main places users store files is /home/. Documents, downloads, and application data all end up in this folder, so files pile up over time and duplicates pile up with them. To find these duplicates, open up a terminal and point fdupes to your home folder.
fdupes -r ~/
or, to scan the home folder of another user on your PC (you will need permission to read it), do:
fdupes -r /home/username/
After running the scan, the tool will print every set of duplicates it finds in the home directory. To save this information, redirect the output to a file in the Documents folder. Note that >> appends, so running the scan twice adds a second copy of the results to the same file.
fdupes -r ~/ >> ~/Documents/fdupes-scan-home.txt
or
fdupes -r /home/username/ >> ~/Documents/fdupes-scan-home-user.txt
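A saved scan can be long, so a quick count of how many duplicate sets it contains is a useful first look. The sketch below assumes the default Fdupes output format, where each set is a block of paths followed by a blank line. The helper name count_dupe_sets is hypothetical.

```shell
# Sketch: count the duplicate sets recorded in a saved fdupes scan.
# count_dupe_sets is a hypothetical helper. It assumes the default output,
# where each set of duplicates is a block of paths ended by a blank line.
count_dupe_sets() {
    # RS="" puts awk in paragraph mode: one record per blank-line-separated block.
    awk 'BEGIN { RS = "" } END { print NR }' "$1"
}
```

For example, count_dupe_sets ~/Documents/fdupes-scan-home.txt prints the number of sets in the report. If the file was appended to by several scans, the count covers all of them.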
Duplicates In Root File System
Fdupes has the ability to scan any location, and not just the home folder. If you’re trying to find duplicate files on the root file system of your Linux PC, here’s what to do.
In a terminal, switch from your normal user to the root account. Running as root allows Fdupes to scan locations that are off limits to a normal user.
sudo -s
or
su -
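As root, scan the root file system using Fdupes. Expect this to take a long time, since the scan walks everything mounted under /, including virtual file systems such as /proc and /sys.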
fdupes -r /
Alternatively, scan a specific location, rather than the entire Root system with:
fdupes -r /location/on/your/pc
Need to export the scanning results to a file for later? Run this command.
fdupes -r / >> /home/username/Documents/fdupes-scan.txt
or
fdupes -r /location/on/your/pc >> /home/username/Documents/fdupes-scan.txt
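If you would rather not stay in a root shell, one option is to elevate only the scan itself. This is a minimal sketch that assumes sudo is configured for your account. The helper name scan_as_root is hypothetical, and the paths you pass to it are your own.

```shell
# Sketch: scan one directory as root and save the report as the normal user.
# scan_as_root is a hypothetical helper name.
scan_as_root() {
    target="$1"
    report="$2"
    # sudo applies only to fdupes. The redirection is performed by the calling
    # shell, so the report file is created by the normal user.
    # A single > overwrites any earlier report instead of appending to it.
    sudo fdupes -r "$target" > "$report"
}
```

A call such as scan_as_root /location/on/your/pc ~/Documents/fdupes-scan.txt then leaves you in your normal shell once the scan finishes.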
Remove Redundant Data
Scanning for duplicate files is only the first half of the job. The next step is to act on the results. How you do that depends on which options your copy of Fdupes supports, so read man fdupes before running anything in this section.
In the upstream Fdupes manual, the H and S switches do not create links. H (--hardlinks) tells the scan to report files that are already hard-linked to each other as duplicates, and S (--size) prints the size of each duplicate set. The switch that actually removes data is d (--delete), which asks which file to keep in each set and deletes the rest. Some distribution builds have carried an extra option for replacing duplicates with hard links; if yours has one, the manual page will list it.
Note: Do not remove duplicate data system-wide unless you understand the risks that can occur! System directories often contain identical files on purpose.
Include Hard-Linked Files
fdupes -rH /home/username/
or, for a system location:
sudo fdupes -rH /root/file/location
Show Duplicate Sizes
fdupes -rS /home/username/
or, for a system location:
sudo fdupes -rS /root/file/location
Delete Duplicates
fdupes -rd /home/username/
or, for a system location:
sudo fdupes -rd /root/file/location
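Replacing a duplicate with a hard link can also be done by hand for a single pair of files. The following is a minimal sketch that uses the standard cmp and ln tools rather than Fdupes itself. The helper name link_duplicate is hypothetical. Keep in mind that hard-linked paths share one copy of the data, so editing the file through either path changes both.

```shell
# Sketch: replace one duplicate with a hard link to the copy you keep.
# link_duplicate is a hypothetical helper, not an fdupes feature.
link_duplicate() {
    keep="$1"
    dup="$2"
    # Both paths already refer to the same file: nothing to do.
    if [ "$keep" -ef "$dup" ]; then
        return 0
    fi
    # Only link when the contents are byte-for-byte identical.
    if ! cmp -s "$keep" "$dup"; then
        echo "files differ, leaving both in place: $dup" >&2
        return 1
    fi
    # ln -f replaces the duplicate's directory entry with a hard link to the
    # kept file. Hard links cannot span file systems, so both paths must be
    # on the same one.
    ln -f "$keep" "$dup"
}
```

For example, link_duplicate ~/Pictures/original.jpg ~/Downloads/copy.jpg (both paths are placeholders) leaves two names pointing at one file on disk.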