Bug 10891 - grep extremely slow
Summary: grep extremely slow
Status: NEW
Alias: None
Product: Busybox
Classification: Unclassified
Component: Other
Version: 1.27.x
Hardware: All Linux
Importance: P5 enhancement
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-23 14:00 UTC by tim.ruehsen
Modified: 2018-03-23 14:00 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:


Attachments

Description tim.ruehsen 2018-03-23 14:00:08 UTC
Busybox's grep is very slow during a 'make syntax-check' run.

I tracked it down to a single grep invocation that takes 0.28 s with GNU grep and 47 s with busybox's grep.

There are 770 patterns, all of the form
^ *# *(define|undef)  *AI_ADDRCONFIG\>

Only the 'AI_ADDRCONFIG' part differs from pattern to pattern.

The number of files doesn't matter: when all 5700 files are concatenated into one file (~2 MB, mostly C sources), the run time stays as high as with the 5700 separate files.

The grep command line is roughly 'cat patterns | grep -E -f - file(s)', i.e. the patterns are read from stdin.

I assume GNU grep applies a fairly simple optimization: if all patterns begin with the same sequence and the rest of each is a plain string, reduce them to one pattern plus a list of memcmp() calls. I guess the extra code wouldn't be too much.

Would be lovely to see this built in.