number_of_matches_per_file

Differences

This shows you the differences between two versions of the page.

number_of_matches_per_file [2021/10/31 17:48] – [tags] admin
number_of_matches_per_file [2021/11/04 14:59] (current) – [Solution using grep and awk] admin
Line 1: Line 1:
 ===== Task =====
-Given
+Given a bunch of files in a directory, count the number of times a word occurs in each file. For example, given
 <code>
 % tail -n +1 *
Line 17: Line 18:
 </code>

-count the number of times the word foo occurred in each file. The expected answer is
+count the number of occurrences of 'foo' in each file. The expected answer is
 <code>
 junk1.txt:4
Line 24: Line 25:
  
 tags | Number of matches per file
 +
 +sample code demos | cat with filename
 ===== Solution using git grep and awk =====
 +If it is a git repository
 +<code>
 +git grep -o foo  | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'
 +</code>
 +
 +If it is not a git repository
 <code>
 git grep --no-index -o foo  | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'
Line 55: Line 64:
  
 ==== References ====
-  * https://stackoverflow.com/questions/39945363/frequency-count-for-file-column-in-bash - frequency counting using awk
+  * https://stackoverflow.com/questions/39945363/frequency-count-for-file-column-in-bash - count frequencies using awk

 ==== tags ====
-awk frequency count, awk count breakdown, uniq reverse output
+awk frequency count, awk count breakdown, uniq reverse output, "git grep" count matches, count "grep -o", "grep -o" counts, "grep -o" summarize
-===== Solution using find and grep =====
+
 +===== Solution using grep and awk ===== 
 +<code> 
 +grep -ro foo * | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}' 
 +</code> 
 + 
 +Useful if git is not available. 
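The grep and awk pipeline above can be tried on throwaway files with a sketch like the following. It assumes a grep that supports -r and -o (GNU and BSD grep both do). The function name `count_matches` and the sample file names are hypothetical, introduced only for this illustration. A trailing sort is added because the order of awk's `for (file in freq)` loop is unspecified.

```shell
# count_matches is a hypothetical helper, not part of the original page.
# Usage: count_matches PATTERN DIRECTORY
count_matches() {
    pattern=$1
    dir=$2
    # grep -r recurses, -o prints every match on its own line as "file:match".
    # awk then counts how many lines each file name produced.
    (cd "$dir" && grep -ro -- "$pattern" *) |
        awk -F':' '{freq[$1]++} END {for (file in freq) print file ":" freq[file]}' |
        sort
}

# Demo on temporary files (the names a.txt, b.txt, c.txt are made up).
demo_dir=$(mktemp -d)
printf 'foo bar foo\nfoo\n' > "$demo_dir/a.txt"
printf 'bar\nfoo foo foo foo\n' > "$demo_dir/b.txt"
printf 'nothing here\n' > "$demo_dir/c.txt"
count_matches foo "$demo_dir"
# prints:
# a.txt:3
# b.txt:4
rm -rf "$demo_dir"
```

Files without any match (c.txt here) are absent from the output, because grep -o prints nothing for them. File names that contain ':' would also confuse the -F':' split.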
 +===== Solution using find, grep and wc =====
 <code>
 find * -printf 'echo "%p:$(grep -o "foo" %p | wc -l)";' | sh
Line 88: Line 104:
 </code>
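As a variation on the find, grep and wc solution, the sketch below avoids building shell commands out of file names, which can misbehave when a name contains spaces or quotes. This is a swapped-in approach using `find -exec sh -c`, not the command from this page. The name `count_per_file` is hypothetical, and unlike `find *` it visits regular files only.

```shell
# count_per_file is a hypothetical helper, not part of the original page.
# Usage: count_per_file PATTERN DIRECTORY
count_per_file() {
    pattern=$1
    dir=$2
    # File names are passed to the inner shell as arguments ("$@"),
    # so they are never re-parsed as shell code.
    find "$dir" -type f -exec sh -c '
        pattern=$1; shift
        for f in "$@"; do
            # grep -o prints one line per match; wc -l counts those lines.
            n=$(grep -o -- "$pattern" "$f" | wc -l | tr -d " ")
            printf "%s:%s\n" "$f" "$n"
        done
    ' sh "$pattern" {} +
}

# Demo on temporary files (the file names are made up).
demo_dir=$(mktemp -d)
printf 'foo foo\nfoo\n' > "$demo_dir/has space.txt"
printf 'bar\n' > "$demo_dir/none.txt"
count_per_file foo "$demo_dir"
rm -rf "$demo_dir"
```

Like the find-based command, this reports files with zero matches as well, which the grep and awk pipelines do not.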
  
-Note: You have to use "grep -o" instead of "grep -c". If a string occurs multiple times in a line, "grep -o" matches each of them separately. But "grep -c" counts them together. For example
+Note: You have to use "grep -o" and not "grep -c". If a string occurs multiple times in a line, "grep -o" matches each of them separately. But "grep -c" counts them together. For example
  
 <code>
Line 120: Line 136:
 </code>
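The difference between "grep -o" and "grep -c" described in the note can also be checked with a small self-contained sketch. The sample text and variable names are made up for this illustration. "grep -c" reports the number of matching lines, while "grep -o" piped to "wc -l" reports the number of individual matches.

```shell
# Temporary sample file: line 1 has three matches, line 3 has one.
sample=$(mktemp)
printf 'foo foo foo\nbar\nfoo\n' > "$sample"

# grep -c counts lines that contain at least one match.
lines_with_match=$(grep -c foo "$sample")

# grep -o prints each match on its own line, so wc -l counts matches.
# tr strips the padding that some wc implementations add.
total_matches=$(grep -o foo "$sample" | wc -l | tr -d ' ')

echo "grep -c: $lines_with_match"          # prints "grep -c: 2"
echo "grep -o | wc -l: $total_matches"     # prints "grep -o | wc -l: 4"
rm -f "$sample"
```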
  
-==== Reference ====
+==== References ====
   * https://newbedev.com/counting-all-occurrences-of-a-string-within-all-files-in-a-folder - where I came across this solution.
  
number_of_matches_per_file.1635702537.txt.gz · Last modified: 2021/10/31 17:48 by admin