Linux command of the day [uniq]


Manoj Srivastava

Jul 24, 2008, 3:38:06 PM
to nlug...@googlegroups.com
Hi,

Now that we have touched on sort and looked at its -u
option, here is the somewhat more featureful uniq utility.

manoj

uniq -- report or omit repeated lines

Summary:

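It collapses each run of adjacent identical lines into a single line.
uniq only compares neighbouring lines, so the input must be sorted for
every duplicate to be removed. If the input is NOT sorted, only
adjacent duplicate lines are discarded.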
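
To see the "adjacent only" behaviour in practice, here is a small sketch. The file name 'unsorted' and its contents are made up for illustration, and it works in a scratch directory so nothing existing is touched.

```shell
# Work in a throwaway directory (assumption: mktemp -d is available).
cd "$(mktemp -d)"

# Hypothetical input: 'a' appears three times, but not all copies are adjacent.
printf 'a\nb\na\na\n' > unsorted

# uniq alone only merges the two neighbouring 'a' lines at the end.
plain=$(uniq unsorted)
printf '%s\n' "$plain"
# a
# b
# a

# Sorting first makes all copies adjacent, so each line appears once.
sorted=$(sort unsorted | uniq)
printf '%s\n' "$sorted"
# a
# b
```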

Example:

Create a file named 'myfile' containing two columns of digits for testing.

$ uniq myfile -- Collapse each run of adjacent duplicate lines into one line.

$ uniq -i myfile -- Ignore case when comparing lines.

$ uniq -u myfile -- Print only the lines that are not repeated.

$ uniq -c myfile -- Prefix each output line with its number of occurrences.

$ uniq -d myfile -- Print only the duplicated lines. However many times a
line is repeated, only one copy is printed.

$ uniq -D myfile -- Print every copy of each duplicated line.

$ uniq -w5 myfile -- Compare only the first 5 characters of each line.

$ uniq -f2 myfile -- Skip the first 2 fields when comparing.

$ uniq -s2 myfile -- Skip the first 2 characters when comparing.

$ sort myfile | uniq > output -- Sort myfile and store the de-duplicated
result in 'output'.
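
A short worked run of a few of these options may help. The contents of 'myfile' below are an assumption (the post does not show the file), chosen so that it is already sorted and each option gives a different result.

```shell
# Work in a throwaway directory (assumption: mktemp -d is available).
cd "$(mktemp -d)"

# Hypothetical test file: two columns of digits, already in sorted order.
printf '1 10\n1 10\n2 10\n3 30\n3 30\n4 40\n' > myfile

# One copy of each duplicated line.
uniq -d myfile
# 1 10
# 3 30

# Only the lines that occur exactly once.
uniq -u myfile
# 2 10
# 4 40

# Skip the first field, so only the second column is compared;
# '2 10' now counts as a repeat of '1 10'.
uniq -f1 myfile
# 1 10
# 3 30
# 4 40

# Prefix each distinct line with its count (padding width varies by system).
uniq -c myfile
```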

Read: man uniq

--
His heart was yours from the first moment that you met.
Manoj Srivastava <sriv...@debian.org> <http://www.debian.org/~srivasta/>
1024D/BF24424C print 4966 F272 D093 B493 410B 924B 21BA DABB BF24 424C

Steven S. Critchfield

Jul 24, 2008, 4:01:36 PM
to nlug...@googlegroups.com

----- "Manoj Srivastava" <sriv...@debian.org> wrote:
> $ sort myfile | uniq > output -- Sort myfile & store the uniq output.

Or, for fun, and something I use regularly:

sort myfile | uniq -c | sort -n > output

This sorts the file so that uniq -c can count each distinct entry, then re-sorts the result by the numeric count that uniq produced.

When profiling or tackling similar problems, the above example helps focus effort on the highest or lowest counts, whichever matter to you.
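
As a rough illustration of that pipeline, here is a sketch with made-up data. The file name 'requests' and its contents are hypothetical, not something from the thread.

```shell
# Work in a throwaway directory (assumption: mktemp -d is available).
cd "$(mktemp -d)"

# Hypothetical sample data: one entry per line, in no particular order.
printf 'GET\nPOST\nGET\nGET\nPUT\nPOST\n' > requests

# Group identical lines, count each group, then order by the count.
sort requests | uniq -c | sort -n > output

# Lowest counts come first: PUT (1), POST (2), GET (3).
cat output

# For the highest counts first, reverse the numeric sort instead:
sort requests | uniq -c | sort -rn | head -n 1
```

Appending head or tail to the pipeline is an easy way to keep only the few extreme entries.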


--
Steven Critchfield cri...@basesys.com

Brandon Valentine

Jul 29, 2008, 2:56:59 PM
to nlug...@googlegroups.com
On Thu, Jul 24, 2008 at 2:38 PM, Manoj Srivastava <sriv...@debian.org> wrote:
> It prints the unique lines in a sorted file. If input is NOT in
> sorted order then, only adjacent duplicate lines are discarded.

Today I found myself working with sort and uniq several times and
thought I'd do my part to post a followup to Manoj's summary of the
commands. sort and uniq, perhaps most interestingly, can be used as
tools to do basic set theory operations. I often find myself with
multiple text files from which I need to find the union, intersection,
symmetric difference, or relative complements. Ever wondered how you
can compare those two CSV files someone exported and sent to you?
Excel just not cutting it here? You need sort and uniq. Here are some
quick examples of how to use these tools to get at that data. There
is another command line utility called comm(1) that can be used to
accomplish some of these operations as well, but learning and
remembering the sort and uniq method will take you a lot further.

To get the union of two sets (members in either A or B):
% sort -u setA setB

To get the intersection (members in both A and B):
% sort setA setB | uniq -d

To get the symmetric difference (members in A or B but not both):
% sort setA setB | uniq -u

To get the relative complement (members in A but not B):
% sort setA setB setB | uniq -u
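
A small worked run of the four operations, using made-up files. One caveat worth stating: these tricks count how many times a line appears across the inputs, so they assume neither file contains a duplicate line within itself. If that is not guaranteed, de-duplicating each file with sort -u first (as done below) is one way to make the assumption hold.

```shell
# Work in a throwaway directory (assumption: mktemp -d is available).
cd "$(mktemp -d)"

# Hypothetical sets, one member per line; sort -u removes internal duplicates.
printf 'cherry\napple\nbanana\napple\n' | sort -u > setA
printf 'banana\ndate\ncherry\n' | sort -u > setB

# Union: members in either A or B.
sort -u setA setB
# apple
# banana
# cherry
# date

# Intersection: lines seen twice, i.e. present in both files.
sort setA setB | uniq -d
# banana
# cherry

# Symmetric difference: lines seen exactly once.
sort setA setB | uniq -u
# apple
# date

# Relative complement A \ B: listing setB twice guarantees every member
# of B is seen at least twice, so only members unique to A survive.
sort setA setB setB | uniq -u
# apple
```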

If you're wondering what this is all about perhaps someone can jump in
with a link to a good basic introduction to set theory. A quick
Google didn't turn up much and I've gotta run.

Cheers,

--
Brandon D. Valentine
http://www.brandonvalentine.com
