number_of_matches_per_file

Differences

This shows you the differences between two versions of the page.

number_of_matches_per_file [2021/10/31 17:49] – [tags] admin
number_of_matches_per_file [2021/10/31 21:17] – [How it works] admin
Line 1: Line 1:
 ===== Task =====
-Given
+Given a bunch of files in a directory, count the number of times a word occurs in each file. For example, given
 <code>
 % tail -n +1 *
Line 17: Line 18:
 </code>
 
-count the number of times the word foo occurred in each file. The expected answer is
+count the number of occurrences of 'foo' in each file. The expected answer is
 <code>
 junk1.txt:4
Line 24: Line 25:
  
 tags | Number of matches per file
 +
 +sample code demoes | cat with filename
 ===== Solution using git grep and awk =====
 +If it is a git repository
 +<code>
 +git grep -o foo  | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'
 +</code>
 +
 +If it is not a git repository
 <code>
 git grep --no-index -o foo  | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'
Line 55: Line 64:
  
 ==== References ====
-  * https://stackoverflow.com/questions/39945363/frequency-count-for-file-column-in-bash - frequency counting using awk
+  * https://stackoverflow.com/questions/39945363/frequency-count-for-file-column-in-bash - count frequencies using awk
 
 ==== tags ====
-awk frequency count, awk count breakdown, uniq reverse output, "git grep" count matches, count "grep -o"
+awk frequency count, awk count breakdown, uniq reverse output, "git grep" count matches, count "grep -o", "grep -o" counts, "grep -o" summarize
-===== Solution using find and grep =====
+
 +===== Solution using grep and awk ===== 
 +<code> 
 +grep -ro "came across" * | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}' 
 +</code> 
 + 
 +Useful if git is not available. 
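The grep and awk pipeline above can be wrapped in a small function. This is only a sketch: the name count_matches_per_file is made up for illustration, -F (match the pattern as a literal string) and the final sort are additions that are not in the original one-liner, and grep -r is assumed to be available (GNU and BSD grep have it). Files with no match are not listed, and a file name containing ':' would confuse the awk field split.

```shell
# Hypothetical helper (not from the original page):
# print "file:count" for every file under a directory that contains the string.
count_matches_per_file() {
    pattern=$1
    dir=$2
    # grep -r recurses into the directory, -o prints one line per match,
    # -F treats the pattern as a literal string (an addition to the one-liner).
    # Each output line looks like "path:match", so awk counts lines per path.
    grep -roF -- "$pattern" "$dir" |
        awk -F':' '{freq[$1]++} END {for (file in freq) print file ":" freq[file]}' |
        sort
}
```

For example, count_matches_per_file foo . counts 'foo' in every file below the current directory.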
 +===== Solution using find, grep and wc =====
 <code>
 find * -printf 'echo "%p:$(grep -o "foo" %p | wc -l)";' | sh
Line 88: Line 104:
 </code>
 
-Note: You have to use "grep -o" instead of "grep -c". If a string occurs multiple times in a line, "grep -o" matches each of them separately. But "grep -c" counts them together. For example
+Note: You have to use "grep -o" and not "grep -c". If a string occurs multiple times in a line, "grep -o" matches each of them separately. But "grep -c" counts them together. For example
 
 <code>
Line 120: Line 136:
 </code>
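To make the note about "grep -o" and "grep -c" concrete: "grep -c" reports the number of lines that contain at least one match, while "grep -o" prints every match on its own line, so piping it to wc -l gives the number of occurrences. A minimal sketch, with two function names invented here for illustration:

```shell
# Hypothetical helpers (names are not from the original page).
count_occurrences() {
    # One output line per match; wc -l counts them.
    # tr strips the padding that some wc implementations add.
    grep -o -- "$1" "$2" | wc -l | tr -d ' '
}

count_matching_lines() {
    # grep -c counts lines with at least one match, not individual matches.
    grep -c -- "$1" "$2"
}
```

On a file where one line holds 'foo' three times, the first function counts that line three times and the second counts it once.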
  
-==== Reference ====
+==== References ====
   * https://newbedev.com/counting-all-occurrences-of-a-string-within-all-files-in-a-folder - where I came across this solution.
  
number_of_matches_per_file.txt · Last modified: 2021/11/04 14:59 by admin