number_of_matches_per_file


Task

Given

% tail -n +1 *
==> junk1.txt <==
foo
bar
foo bar foo
bar foo bar

==> junk2.txt <==
foo
bar
foo bar foo
bar foo bar
foo foo foo

count the number of times the word foo occurs in each file. The expected answer is

junk1.txt:4
junk2.txt:7

tags | Number of matches per file

Solution using git grep and awk

git grep --no-index -o foo  | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'

For example

% git grep --no-index -o foo  | awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}'
junk2.txt:7
junk1.txt:4

How it works

The git grep command gives

% git grep --no-index -o foo
junk1.txt:foo
junk1.txt:foo
junk1.txt:foo
junk1.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo
junk2.txt:foo

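The awk command counts the number of hits per file. It splits each line on ':' and uses the first field (the file name) as the key of the associative array freq, incrementing the count for every hit. The END block prints one "file:count" line per key. The order of a "for (file in freq)" loop is unspecified in awk, which is why junk2.txt is printed before junk1.txt in the example above. A file name that itself contains ':' would be split in the wrong place.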
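
The same counting step can be tried without git. The following is a minimal sketch, assuming a grep that supports -H and -o (GNU and BSD grep do) and that mktemp -d is available. It recreates the two sample files in a scratch directory, and the final sort is an addition to make the output order predictable. The variable names workdir and counts are chosen only for illustration.

```shell
# Recreate the two sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'foo\nbar\nfoo bar foo\nbar foo bar\n' > junk1.txt
printf 'foo\nbar\nfoo bar foo\nbar foo bar\nfoo foo foo\n' > junk2.txt

# grep -H -o prints one "file:match" line per hit, like git grep -o.
# awk tallies the hits per file name; sort fixes the output order.
counts=$(grep -H -o foo junk1.txt junk2.txt |
  awk -F':' '{freq[$1]++} END{for (file in freq) print file ":" freq[file]}' |
  sort)
printf '%s\n' "$counts"
```

With the sample files this should print junk1.txt:4 followed by junk2.txt:7, matching the expected answer.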

tags | awk frequency count, awk count breakdown, uniq reverse output, “git grep” count matches

Solution using find and grep

find * -printf 'echo "%p:$(grep -o "foo" %p | wc -l)";' | sh

For example

% find * -printf 'echo "%p:$(grep -o "foo" %p | wc -l)";' | sh
junk1.txt:4
junk2.txt:7

How it works

To see how it works, run the command without piping the output to sh

% find * -printf 'echo "%p:$(grep -o "foo" %p | wc -l)";'     
echo "junk1.txt:$(grep -o "foo" junk1.txt | wc -l)";echo "junk2.txt:$(grep -o "foo" junk2.txt | wc -l)";

So find only builds one long command line that runs “grep -o” on each file and formats the output; sh then executes it.

find *          - find the files matched by the shell glob * (find also descends into any directories)
-printf ''      - format and print everything between the single quotes (a GNU find extension)
%p in -printf   - is replaced by the path of each file that find visits
grep -o         - print only the matched (non-empty) parts of a matching line, with each such part on a separate output line
| sh            - pass the generated commands to sh, which executes them
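
Because the generated text is executed by sh, a file name containing spaces or shell metacharacters would break the command above. The following is a minimal sketch of the same per-file count as a plain loop, assuming only the files in the current directory matter (unlike find, the loop does not recurse). The file 'my notes.txt' and the names workdir, result, f and n are hypothetical additions for illustration.

```shell
# Recreate the sample files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'foo\nbar\nfoo bar foo\nbar foo bar\n' > junk1.txt
printf 'foo\nbar\nfoo bar foo\nbar foo bar\nfoo foo foo\n' > junk2.txt
printf 'foo foo\n' > 'my notes.txt'   # hypothetical file with a space in its name

result=$(
  for f in *; do
    [ -f "$f" ] || continue            # skip directories and other non-files
    # Count one line per match; tr strips the padding some wc versions add.
    n=$(grep -o foo -- "$f" | wc -l | tr -d '[:space:]')
    printf '%s:%s\n' "$f" "$n"
  done
)
printf '%s\n' "$result"
```

Quoting "$f" keeps each file name as a single argument, so no generated command string has to be parsed by a second shell.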

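Note: You have to use “grep -o” instead of “grep -c”. If a string occurs multiple times in a line, “grep -o” prints each occurrence on its own line, so “wc -l” counts every match. “grep -c” counts matching lines, not matches, so a line that contains foo three times is counted only once. For example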

% cat junk2.txt 
foo
bar
foo bar foo
bar foo bar
foo foo foo

% grep foo junk2.txt
foo
foo bar foo
bar foo bar
foo foo foo

% grep -o foo junk2.txt 
foo
foo
foo
foo
foo
foo
foo

% grep -c foo junk2.txt
4

% grep -o foo junk2.txt| wc -l
7
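
The task speaks of the word foo, but the pattern foo also matches inside longer words such as food. If only whole words should count, the -w option of grep can be combined with -o. The following is a minimal sketch, assuming a grep that supports -o and -w together (GNU grep does). The sample line and the variable names are made up for illustration.

```shell
line='food foo foobar'

# Plain -o counts every occurrence of the substring foo.
substring_hits=$(printf '%s\n' "$line" | grep -o foo | wc -l | tr -d '[:space:]')

# Adding -w keeps only matches that stand alone as a word.
word_hits=$(printf '%s\n' "$line" | grep -o -w foo | wc -l | tr -d '[:space:]')

printf 'substring matches: %s\nwhole-word matches: %s\n' "$substring_hits" "$word_hits"
```

With GNU grep this should report 3 substring matches and 1 whole-word match. For the sample files junk1.txt and junk2.txt both variants give the same counts, since foo only ever appears as a separate word there.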

number_of_matches_per_file.1635702557.txt.gz · Last modified: 2021/10/31 17:49 by admin