I’m working through a big refactor in our codebase and wanted a quick count of every unique match for a string search expression. Here’s an example of how to do this on a Unix command line with the help of grep, cut, sort, and uniq.
grep -r "import javax.xml.bind.annotation" . | \
  cut -f2 -d \: | \
  sort | uniq -c | sort -nr
Let’s break that down. The first part says: recursively search the current directory and any subdirectories for lines containing “import javax.xml.bind.annotation”. This part alone will result in something like:
File1.java: javax.xml.bind.annotation.XmlElement
File1.java: javax.xml.bind.annotation.XmlTransient
File2.java: javax.xml.bind.annotation.XmlElement
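A side note on the search expression: by default grep treats it as a regular expression, so each “.” matches any character, not just a literal dot. For a package name that rarely matters, but if you want strictly literal matching, the -F flag does that. Here is a minimal sketch; the second sample line is a made-up look-alike used only to show the difference.

```shell
# Two sample lines: one real import and one hypothetical look-alike
# where the first "." has been replaced by "X".
sample=$(printf 'import javax.xml.bind.annotation.XmlElement;\nimport javaxXxml.bind.annotation.Fake;\n')

# Default (regex) matching: "." matches any character, so both lines count.
regex_matches=$(printf '%s\n' "$sample" | grep -c "import javax.xml.bind.annotation")

# Fixed-string matching with -F: only the literal text counts.
fixed_matches=$(printf '%s\n' "$sample" | grep -cF "import javax.xml.bind.annotation")

echo "regex: $regex_matches, fixed: $fixed_matches"
```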
Now we want to get rid of everything before the “:”, so we use cut, specifying “:” as the delimiter (-d) and selecting the second field (-f2) of each line. Now we have something like this:
javax.xml.bind.annotation.XmlElement
javax.xml.bind.annotation.XmlTransient
javax.xml.bind.annotation.XmlElement
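One caveat with the cut step: -f2 keeps only the text between the first and second “:”, so a matched line that itself contains a colon gets truncated. That is not a problem for Java import lines, but it can be for other searches. A small sketch of the difference, using a made-up line of grep output; -f2- means “the second field through the end of the line”.

```shell
# Hypothetical grep output line whose matched text contains a colon.
line='File1.java:String url = "http://example.com";'

# -f2 stops at the next colon and loses the rest of the line.
second_only=$(printf '%s\n' "$line" | cut -d: -f2)

# -f2- keeps everything after the first colon.
rest_of_line=$(printf '%s\n' "$line" | cut -d: -f2-)

printf '%s\n' "$second_only"
printf '%s\n' "$rest_of_line"
```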
Now we want the count of each unique match. First we sort, which groups identical lines together. Then we pass the result to uniq -c, which collapses each run of duplicates into a single line prefixed with its count. Finally, to order the aggregated results, we pass them to sort once more, this time with -n, which tells sort to compare numerically (not lexically), and -r, which reverses the order so the largest counts come first. So our final output looks like:
2 javax.xml.bind.annotation.XmlElement
1 javax.xml.bind.annotation.XmlTransient
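If you want to try the whole thing end to end, here is a rough sketch that builds a throwaway directory with two hypothetical Java files and runs the pipeline against it. The file names and contents are invented for the demo. Note that in real output each line keeps the leading “import” and the trailing “;”, and uniq -c pads the count with leading spaces. The second pipeline is an alternative, not what the command above does: grep’s -h flag suppresses the file name prefix, which makes the cut step unnecessary.

```shell
# Build a temporary directory with two made-up Java files.
demo_dir=$(mktemp -d)
printf 'import javax.xml.bind.annotation.XmlElement;\nimport javax.xml.bind.annotation.XmlTransient;\n' > "$demo_dir/File1.java"
printf 'import javax.xml.bind.annotation.XmlElement;\n' > "$demo_dir/File2.java"

# The pipeline from this post, pointed at the demo directory.
counts=$(grep -r "import javax.xml.bind.annotation" "$demo_dir" |
  cut -f2 -d: |
  sort | uniq -c | sort -nr)

# Alternative: -h drops the "file:" prefix, so no cut is needed.
counts_no_cut=$(grep -rh "import javax.xml.bind.annotation" "$demo_dir" |
  sort | uniq -c | sort -nr)

printf '%s\n' "$counts"

# Clean up the temporary files.
rm -rf "$demo_dir"
```

Both variables should hold the same two lines: XmlElement with a count of 2, followed by XmlTransient with a count of 1.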