Count unique grep results

I’m working through a big refactor of our codebase and wanted a quick count of each unique match for a search expression. Here’s an example of how to do this on a Unix command line with the help of grep, cut, sort, and uniq.

grep -r "import javax.xml.bind.annotation" . | \
cut -f2 -d \: | \
sort | uniq -c | sort -nr

Let’s break that down. The first part recursively searches the current directory and its subdirectories for lines containing “import javax.xml.bind.annotation”, and prints each matching line prefixed with the name of the file it was found in. On its own, this part produces something like:

File1.java: javax.xml.bind.annotation.XmlElement
File1.java: javax.xml.bind.annotation.XmlTransient
File2.java: javax.xml.bind.annotation.XmlElement
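
One detail about the search itself: grep treats the pattern as a regular expression, so each “.” matches any character, not just a literal dot. A minimal sketch of a stricter search follows, assuming a grep that supports -F and --include (GNU grep does, and so do current BSD greps as far as I know). The helper name search_literal is made up for illustration.

```shell
# search_literal PATTERN DIR
# Hypothetical helper: recursive, literal-string search limited to .java files.
search_literal() {
    # -r         recurse into subdirectories
    # -F         treat the pattern as a fixed string, so "." only matches a dot
    # --include  only search files whose names match the glob (assumed to be
    #            available in your grep; it is not a POSIX option)
    grep -rF --include='*.java' -- "$1" "$2"
}
```

Calling search_literal "import javax.xml.bind.annotation" . prints the same “file:line” pairs as before, minus any hits in non-Java files.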

Next we want to get rid of everything before the “:”, so we use cut with “:” as the delimiter (-d) and select the second field (-f2) of each line. Now we have something like this:

javax.xml.bind.annotation.XmlElement
javax.xml.bind.annotation.XmlTransient
javax.xml.bind.annotation.XmlElement
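
One caveat with the cut step: -f2 keeps only the text between the first and second colon, so a matched line that itself contains a “:” gets truncated. Here is a rough sketch of two ways around that, assuming a grep that supports -h (GNU and BSD grep do). The function names are hypothetical, chosen only for this example.

```shell
# Variant 1: ask grep not to print the "filename:" prefix at all (-h),
# which makes the cut step unnecessary.
count_without_prefix() {
    grep -rh -- "$1" "$2" | sort | uniq -c | sort -nr
}

# Variant 2: keep the prefix, but have cut keep field 2 through the end
# of the line (-f2-), so later colons in the line survive.
# This still assumes the file paths themselves contain no colon.
count_keep_colons() {
    grep -r -- "$1" "$2" | cut -d: -f2- | sort | uniq -c | sort -nr
}
```

Either one can be run as, for example, count_without_prefix "import javax.xml.bind.annotation" . from the top of the source tree.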

Now we want the count of each unique match. We first sort the lines so that identical ones end up next to each other. This matters because uniq only collapses adjacent duplicates; with -c it also prefixes each remaining line with the number of times it occurred. Finally, we pass the aggregated results to sort once more, this time with -n, which tells sort to do a numeric (not lexical) sort, and -r, which reverses the order so the largest counts come first. So our final output looks like:

2  javax.xml.bind.annotation.XmlElement
1  javax.xml.bind.annotation.XmlTransient
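
To try the whole pipeline without touching a real code base, here is a small self-contained sketch. It builds a throwaway directory with mktemp, fills it with two made-up Java files mirroring the example above, and runs the same commands against it. The file contents and variable names are assumptions for the demo. Note that real output keeps the whole matched line, including “import” and the trailing “;”, and that the padding uniq puts before each count varies between implementations.

```shell
# Build a scratch tree that mirrors the example (file names are made up).
demo_dir=$(mktemp -d)
printf '%s\n' \
    'import javax.xml.bind.annotation.XmlElement;' \
    'import javax.xml.bind.annotation.XmlTransient;' > "$demo_dir/File1.java"
printf '%s\n' \
    'import javax.xml.bind.annotation.XmlElement;' > "$demo_dir/File2.java"

# Same pipeline as in the post, pointed at the scratch directory.
result=$(grep -r "import javax.xml.bind.annotation" "$demo_dir" |
    cut -f2 -d: |
    sort | uniq -c | sort -nr)

# Show the counts, most frequent import first.
printf '%s\n' "$result"

# Clean up the scratch tree.
rm -rf "$demo_dir"
```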
