Labby for LabEx

Mastering Linux Duplicate Filtering

Introduction

Welcome to the Linux Duplicate Filtering Colosseum! In this grand arena, where the order of data is as revered as the order of battle, you are an aspiring hero's aide, tasked with mastering the ancient scrolls of uniq. The hero stands ready to face daunting foes, and your wizardry in file manipulation and duplicate filtering will prove critical for their victory.

The goal is to equip our champion with the knowledge to wade through volumes of repetitive information and emerge with only the most valuable and unique records. As forge and anvil are to the swordsmith, so are your command-line skills to the task of transforming a cluttered log into a clear record of events. Let your fingers dance upon the keyboard and invoke the uniq spell, for only with this power can our hero claim triumph in the Data Distillation Duel!

Understanding the uniq Fundamentals

In this step, you will get familiar with the uniq command, which collapses runs of adjacent repeated lines in its input. Start by creating a text file to practice on, named duel_log.txt, containing several lines, some of which are duplicates.

First, create the file in the ~/project directory and fill it with some content using the following command:

echo -e "sword\nsword\nshield\npotion\npotion\nshield" > ~/project/duel_log.txt
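The echo -e form depends on the shell: bash's built-in echo honors -e, but some /bin/sh implementations (dash, for example) print the -e literally. A minimal, more portable sketch uses printf instead. The scratch directory from mktemp and the variable names workdir and line_count are assumptions for illustration, not part of the lab.

```shell
# Hypothetical scratch directory, used here instead of ~/project.
workdir=$(mktemp -d)

# printf interprets \n in its format string in every POSIX shell.
printf 'sword\nsword\nshield\npotion\npotion\nshield\n' > "$workdir/duel_log.txt"

# Count the lines to confirm there is one item per line.
# tr strips the padding that some wc implementations add.
line_count=$(wc -l < "$workdir/duel_log.txt" | tr -d ' ')
echo "$line_count"
```

If the count is not 6, the escape sequences were probably written literally instead of being turned into newlines.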

Now, use the uniq command to filter out the adjacent duplicates:

uniq ~/project/duel_log.txt

This command prints the file with each run of adjacent duplicate lines collapsed to a single line. The expected output is:

sword
shield
potion
shield

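Notice that 'shield' appears twice in the output. In the input, the two 'shield' lines are separated by the 'potion' lines, and uniq only removes a duplicate when it directly follows the line it repeats. The repeated 'sword' and 'potion' lines sit next to each other, so each of those pairs collapses to one line.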
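To see the adjacency rule at work, the standard -c and -d options of uniq can help. The sketch below rebuilds the same sample data in a scratch directory; the directory and the variable names workdir, counted, and repeated are assumptions for illustration.

```shell
# Hypothetical scratch directory, used here instead of ~/project.
workdir=$(mktemp -d)
printf 'sword\nsword\nshield\npotion\npotion\nshield\n' > "$workdir/duel_log.txt"

# -c prefixes each output line with the length of the run it collapsed.
# 'shield' shows up twice, each time with a count of 1.
counted=$(uniq -c "$workdir/duel_log.txt")
echo "$counted"

# -d prints one copy of each line that was repeated in an adjacent run.
# 'shield' is absent here because its two copies are not adjacent.
repeated=$(uniq -d "$workdir/duel_log.txt")
echo "$repeated"
```

The exact padding in front of the counts varies between uniq implementations, so scripts that parse this output should split on whitespace instead of relying on fixed columns.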

Sorting and Unique Filtering

After mastering the basic use of uniq, your next task is to ensure that all duplicates are filtered out, not just the adjacent ones. For that, you need to sort the list before passing it to uniq: sorting places identical lines next to each other, where uniq can collapse them.

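Create a new file called sorted_duel_log.txt to store the results. This step is optional, because the > redirection in the following command creates the file if it does not exist: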

touch ~/project/sorted_duel_log.txt

Now, sort duel_log.txt and pipe the output to uniq, redirecting the result to sorted_duel_log.txt:

sort ~/project/duel_log.txt | uniq > ~/project/sorted_duel_log.txt

The sorted and unique content of sorted_duel_log.txt should be:

potion
shield
sword

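Here, you have eliminated all duplicates, giving our hero a concise log in which every item appears exactly once. Note that sorting changes the line order: the result is alphabetical, not in the order the items first appeared.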
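Two common variations on the sort-then-uniq pipeline are sketched below: sort -u, which sorts and removes duplicates in a single command, and sort piped into uniq -c, which reports how many times each item occurs. The scratch directory and the variable names workdir, deduped, and totals are assumptions for illustration.

```shell
# Hypothetical scratch directory, used here instead of ~/project.
workdir=$(mktemp -d)
printf 'sword\nsword\nshield\npotion\npotion\nshield\n' > "$workdir/duel_log.txt"

# sort -u sorts and drops duplicate lines in one step.
deduped=$(sort -u "$workdir/duel_log.txt")
echo "$deduped"

# Sorting first makes equal lines adjacent, so uniq -c reports
# the total number of occurrences of each item.
totals=$(sort "$workdir/duel_log.txt" | uniq -c)
echo "$totals"
```

For this sample, sort -u gives the same three lines as sort piped into uniq. The two-command form is still useful when uniq options such as -c or -d are needed.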

Summary

In this lab, you embarked on a quest to master the uniq and sort commands, thereby contributing to our hero's victory in the Linux Duplicate Filtering Colosseum. You started by creating and manipulating a sample log file, learning how uniq removes adjacent duplicate lines. You then combined sort with uniq to ensure all duplicates were removed, regardless of their position.

Through these steps, you have honed your command-line skills and become an invaluable wizard who can transform overwhelming data into a crystal-clear record of unique events, a truly vital asset in any data-heavy battlefield.

May your new-found powers aid our hero in many more adventures, and may your journey in the Linux realm be long and prosperous!