number_of_matches_per_file

Task

Given a bunch of files in a directory, count the number of times a word occurs in each file. For example, given

% tail -n +1 *
==> junk1.txt <==
foo
bar
foo bar foo
bar foo bar

==> junk2.txt <==
foo
bar
foo bar foo
bar foo bar
foo foo foo

count the number of occurrences of 'foo' in each file. The expected answer is

junk1.txt:4
junk2.txt:7
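To reproduce the example, the two sample files can be created with printf. This is a minimal sketch. It assumes mktemp -d is available to make a fresh scratch directory, but any empty directory will do:

```shell
# Create the two sample files from the task in a fresh scratch directory.
cd "$(mktemp -d)" || exit 1

# printf '%s\n' writes each argument on its own line.
printf '%s\n' 'foo' 'bar' 'foo bar foo' 'bar foo bar' > junk1.txt
printf '%s\n' 'foo' 'bar' 'foo bar foo' 'bar foo bar' 'foo foo foo' > junk2.txt

# Print both files with their names, as in the listing above.
tail -n +1 *
```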

tags | Number of matches per file

sample code demoes | cat with filename

Solution using git grep and awk

If the directory is a git repository (git grep searches only the files tracked by git):

git grep -o foo | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'

If it is not a git repository, add --no-index so that git grep searches the files in the current directory:

git grep --no-index -o foo | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'

For example

% git grep --no-index -o foo  | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'
junk2.txt:7
junk1.txt:4

How it works

The git grep command prints one line per match, in the form filename:match:

% git grep --no-index -o foo
junk1.txt:foo
junk1.txt:foo
junk1.txt:foo
junk1.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo

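The awk command counts the number of hits per file. -F':' splits each line at the colon, so $1 is the filename. freq[$1]++ adds one for every hit, and the END block prints each filename with its total. awk's for (file in freq) loop visits the keys in an unspecified order, which is why junk2.txt is printed before junk1.txt in the example above. Append | sort if you need a stable order.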
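The awk step can be tried on its own by feeding it lines in the filename:match format shown above. This is a sketch: count_per_file is a hypothetical helper name, and the trailing sort is added only to make the output order predictable:

```shell
# count_per_file (hypothetical name): read "filename:match" lines on stdin
# and print "filename:count" for each file, sorted by filename.
count_per_file() {
  awk -F':' '
    { freq[$1]++ }                                   # $1 is the filename
    END { for (file in freq) print file ":" freq[file] }
  ' | sort
}

# Simulated git grep -o output: 4 hits in junk1.txt, 7 in junk2.txt.
result=$(printf '%s\n' \
  junk1.txt:foo junk1.txt:foo junk1.txt:foo junk1.txt:foo \
  junk2.txt:foo junk2.txt:foo junk2.txt:foo junk2.txt:foo \
  junk2.txt:foo junk2.txt:foo junk2.txt:foo | count_per_file)
printf '%s\n' "$result"    # junk1.txt:4 then junk2.txt:7
```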

References

https://stackoverflow.com/questions/39945363/frequency-count-for-file-column-in-bash - count frequencies using awk

Solution using grep and awk

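grep -rHo foo * | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'

This is useful if git is not available. The -H option makes grep print the filename even when * expands to a single file; without it grep may omit the name, and awk would then count everything under the key "foo".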
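All of the commands above match substrings, so "food" also counts as a hit for "foo". If "a word" should mean whole words only, grep's -w option restricts matches to word boundaries. This is a sketch; the file words.txt and its contents are made up for illustration:

```shell
# Scratch directory with a hypothetical file whose second line contains "food".
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'foo' 'food foo' > words.txt

# Substring matching: the "foo" inside "food" is counted too (3 hits).
substring=$(grep -rHo foo * | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}')

# Whole-word matching with -w: only standalone "foo" counts (2 hits).
whole=$(grep -rHow foo * | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}')

printf '%s\n' "$substring" "$whole"
```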

Solution using find, grep and wc

find * -printf 'echo "%p:$(grep -o "foo" %p | wc -l)";' | sh

For example

% find * -printf 'echo "%p:$(grep -o "foo" %p | wc -l)";' | sh
junk1.txt:4
junk2.txt:7

How it works

To see how it works, run the command without piping the output to sh

% find * -printf 'echo "%p:$(grep -o "foo" %p | wc -l)";'     
echo "junk1.txt:$(grep -o "foo" junk1.txt | wc -l)";echo "junk2.txt:$(grep -o "foo" junk2.txt | wc -l)";

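So we are just building up a big shell script that runs “grep -o” on each file and formats the result, and then handing it to sh to execute. Because the filenames are pasted into the script text, names containing spaces, quotes or $ will break it. Also, -printf is a GNU find extension that BSD and macOS find do not support.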

find *          - find the files (the shell expands * to the names in the current directory)
-printf ''      - format and print everything between the single quotes
%p in -printf   - replaced by the name of each file in find's output
grep -o         - print only the matched (non-empty) parts of a matching line, with each such part on a separate output line
wc -l           - count those lines, i.e. the number of matches in the file
| sh            - execute the generated commands
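A more robust variant of the same find, grep and wc idea (a swapped-in technique, not the one from the reference) passes the filenames to an inline sh -c script as arguments using -exec ... {} +, so spaces or quotes in names are never parsed as shell code. It searches . instead of *, so names come out prefixed with ./. This is a sketch; the scratch files are hypothetical:

```shell
# Scratch directory with hypothetical files, one with a space in its name.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'foo bar foo' > 'my notes.txt'
printf '%s\n' 'foo' > junk.txt

# find hands the filenames to the inline script as arguments ("$@"), so they
# are never parsed as shell code; -type f skips directories.
counts=$(find . -type f -exec sh -c '
  for f in "$@"; do
    # tr removes the leading blanks that some wc implementations print
    printf "%s:%s\n" "$f" "$(grep -o foo "$f" | wc -l | tr -d " ")"
  done
' sh {} + | sort)

printf '%s\n' "$counts"    # ./junk.txt:1 then ./my notes.txt:2
```

The extra sh argument fills $0 of the inline script, so the filenames start at $1.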

Note: You have to use “grep -o” and not “grep -c”. If a string occurs several times on one line, “grep -o” prints each occurrence on its own output line, but “grep -c” counts matching lines, so that line is counted only once. For example

% cat junk2.txt 
foo
bar
foo bar foo
bar foo bar
foo foo foo

% grep foo junk2.txt
foo
foo bar foo
bar foo bar
foo foo foo

% grep -o foo junk2.txt 
foo
foo
foo
foo
foo
foo
foo

% grep -c foo junk2.txt
4

% grep -o foo junk2.txt| wc -l
7
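Running both counts side by side on the two sample files shows the difference across the whole directory. grep -c already prints results in the file:count format, which makes it tempting, but its numbers are line counts. This is a sketch that recreates the files in a scratch directory, assuming mktemp -d is available:

```shell
# Recreate the two sample files in a scratch directory.
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'foo' 'bar' 'foo bar foo' 'bar foo bar' > junk1.txt
printf '%s\n' 'foo' 'bar' 'foo bar foo' 'bar foo bar' 'foo foo foo' > junk2.txt

# grep -c prints "file:count", but it counts matching LINES.
lines=$(grep -c foo junk1.txt junk2.txt)

# grep -o | wc -l counts every occurrence; tr strips wc's padding on some systems.
occurrences=$(for f in junk1.txt junk2.txt; do
  printf '%s:%s\n' "$f" "$(grep -o foo "$f" | wc -l | tr -d ' ')"
done)

printf '%s\n' "$lines" "$occurrences"
```

Here grep -c reports junk1.txt:3 and junk2.txt:4, while the grep -o count gives the expected junk1.txt:4 and junk2.txt:7.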

References

https://newbedev.com/counting-all-occurrences-of-a-string-within-all-files-in-a-folder - where I came across this solution.
