Bug 10891

Summary: grep extremely slow
Product: Busybox
Reporter: tim.ruehsen
Component: Other
Assignee: unassigned
Status: NEW ---
Severity: enhancement
CC: busybox-cvs
Priority: P5
Version: 1.27.x
Target Milestone: ---
Hardware: All
OS: Linux
Host:
Target:
Build:

Description tim.ruehsen 2018-03-23 14:00:08 UTC
BusyBox's grep is very slow during a 'make syntax-check' run.

I tracked it down to a single grep invocation that takes 0.28 s with GNU grep and 47 s with BusyBox's grep.

There are 770 patterns, all of the form
^ *# *(define|undef)  *AI_ADDRCONFIG\>

What changes from pattern to pattern is only the 'AI_ADDRCONFIG' part.

The number of files doesn't matter. When I concatenate all 5700 files into one file (~2 MB, mostly C sources), the run time stays the same as with the 5700 separate files.

The grep command line is similar to 'cat patterns | grep -E -f - file(s)'.

I suspect GNU grep applies a simple optimization: if all patterns begin with the same sequence and the rest of each pattern is a plain string, reduce them to one pattern plus a list of memcmp() calls. So I guess the extra code wouldn't be too much.
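A minimal C sketch of the suggested optimization, assuming every pattern is exactly the shared prefix '^ *# *(define|undef)  *' followed by a plain identifier and '\>'. The names SHARED_PREFIX, literal_tail(), struct name and line_matches() are hypothetical; this is neither BusyBox's nor GNU grep's actual code. The shared prefix is compiled once with POSIX regcomp(), and each remaining identifier is checked with memcmp() plus a manual end-of-word test, because '\>' is a GNU extension rather than part of POSIX ERE.

```c
#include <ctype.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the prefix shared by all 770 patterns in this report. */
#define SHARED_PREFIX "^ *# *(define|undef)  *"
#define WORD_END "\\>"

/* A literal identifier left over after stripping prefix and "\>". */
struct name {
    const char *s;
    size_t len;
};

/* If PAT is SHARED_PREFIX + identifier + "\>", point *TAIL at the
 * identifier and return its length; otherwise return 0, meaning the
 * caller should fall back to the generic one-regex-per-pattern path. */
static size_t literal_tail(const char *pat, const char **tail)
{
    size_t plen = strlen(SHARED_PREFIX), len = strlen(pat), i;

    if (len < plen + 2 || strncmp(pat, SHARED_PREFIX, plen) != 0
        || strcmp(pat + len - 2, WORD_END) != 0)
        return 0;
    for (i = plen; i < len - 2; i++)
        if (!isalnum((unsigned char)pat[i]) && pat[i] != '_')
            return 0;
    *tail = pat + plen;
    return len - 2 - plen;
}

/* Run the compiled shared prefix once per line, then compare the
 * rest of the line against each identifier with memcmp(). */
static int line_matches(const regex_t *prefix, const struct name *names,
                        size_t n, const char *line)
{
    regmatch_t m[1];
    const char *rest;
    size_t i, avail;

    if (regexec(prefix, line, 1, m, 0) != 0)
        return 0;
    rest = line + m[0].rm_eo;
    /* Defensive: identifiers never start with a space, so skipping any
     * spaces left after the prefix match cannot change the result. */
    while (*rest == ' ')
        rest++;
    avail = strlen(rest);
    for (i = 0; i < n; i++) {
        unsigned char next;

        if (names[i].len > avail || memcmp(rest, names[i].s, names[i].len) != 0)
            continue;
        /* Emulate "\>": the identifier must not be followed by a word char. */
        next = (unsigned char)rest[names[i].len];
        if (!isalnum(next) && next != '_')
            return 1;
    }
    return 0;
}
```

A caller would compile SHARED_PREFIX with regcomp(&re, SHARED_PREFIX, REG_EXTENDED), run literal_tail() on every pattern, and fall back to the existing code path as soon as one pattern returns 0. Per line, the cost is then one regexec() plus up to 770 short memcmp() calls, instead of (presumably) one regexec() per pattern.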

Would be lovely to see this built in.