[Bug regex/19348] New: re_search is incredibly slow when processing '$' on long lines

[Bug regex/19348] New: re_search is incredibly slow when processing '$' on long lines

macro@linux-mips.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19348

            Bug ID: 19348
           Summary: re_search is incredibly slow when processing '$' on
                    long lines
           Product: glibc
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: regex
          Assignee: unassigned at sourceware dot org
          Reporter: alex_y_xu at yahoo dot ca
                CC: drepper.fsp at gmail dot com
  Target Milestone: ---

    $ echo {1..5000000} > file # adjust based on CPU speed
    $ time sed -e 's/$/stuff/' file >/dev/null # logical way to append to lines
    sed -e 's/$/stuff/' file > /dev/null  2.91s user 0.09s system 99% cpu 3.007 total
    $ time sed -e 's/.*/&stuff/' file >/dev/null
    sed -e 's/.*/&stuff/' file > /dev/null  1.62s user 0.34s system 99% cpu 1.972 total

By contrast, busybox sed built against musl was tested to be 2x faster in the
first case ($) than in the second (.*).

Intuitively, the glibc result does not make sense: .* should be slower because
it needs to match the entire string, whereas $ can skip to the end of the line
(sed must already find the newline in order to run the commands).
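
For reference, a minimal sketch showing that the two sed expressions are interchangeable for this task, so any timing gap should come from how the regex is matched rather than from the output. The helper names (make_long_line, append_with_dollar, append_with_dotstar) are hypothetical and not part of the original report; awk is used to build the line so the sketch does not depend on brace expansion.

```shell
# Hypothetical helpers, not from the original report.

# Print the numbers 1..N on a single space-separated line.
make_long_line() {
    awk -v n="$1" 'BEGIN { for (i = 1; i <= n; i++) printf "%s%s", i, (i < n ? " " : "\n") }'
}

# Both filters append "stuff" to every input line; only the regex differs.
append_with_dollar()  { sed -e 's/$/stuff/'; }
append_with_dotstar() { sed -e 's/.*/&stuff/'; }

make_long_line 5 | append_with_dollar    # prints: 1 2 3 4 5stuff
make_long_line 5 | append_with_dotstar   # prints: 1 2 3 4 5stuff
```

To compare timings, the same helpers can be run on a much longer line, for example `make_long_line 5000000 > file` followed by `time append_with_dollar < file > /dev/null`.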

However, according to callgrind, glibc spends an inordinate amount of time
inside check_halt_state_context, re_state_reconstruct, and
re_string_context_at.
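
A profile along these lines can be collected roughly as follows. This is a sketch under assumptions: the exact callgrind invocation used for the report is not given, profile_sed is a hypothetical helper, and seeing the static regex functions by name generally requires debug symbols for glibc to be installed.

```shell
# Hypothetical helper: run one sed expression under callgrind and
# summarize where the time goes.
# Usage: profile_sed EXPR INPUT_FILE CALLGRIND_OUT_FILE
profile_sed() {
    if ! command -v valgrind >/dev/null 2>&1; then
        echo "valgrind not installed; skipping profile" >&2
        return 0
    fi
    # Write the profile to a fixed file name so it can be annotated below.
    valgrind --tool=callgrind --callgrind-out-file="$3" \
        sed -e "$1" "$2" >/dev/null
    # Show the most expensive functions first.
    callgrind_annotate "$3" | head -n 40
}

# Example (use a far smaller input than in the report; callgrind is slow):
# profile_sed 's/$/stuff/' file callgrind.dollar.out
# profile_sed 's/.*/&stuff/' file callgrind.dotstar.out
```

Comparing the two annotated outputs side by side should show whether the functions named above dominate only in the $ case.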

I am unsure whether this qualifies as a glibc bug or how to fix it, but I think
it is useful to have on the record.

--
You are receiving this mail because:
You are on the CC list for the bug.

[Bug regex/19348] re_search matches $ much slower than .*

macro@linux-mips.org
https://sourceware.org/bugzilla/show_bug.cgi?id=19348

alex_y_xu at yahoo dot ca changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|re_search is incredibly     |re_search matches $ much
                   |slow when processing '$' on |slower than .*
                   |long lines                  |
