SUBSTRING SEARCH STUDY GUIDE
Terminology and Basics
- Substring search problem: Find an instance (or all instances) of a
query substring of length M in a text string of length N.
- Substring must be precisely defined (no regular expressions).
- You should be able to manually use a KMP DFA, and you should be able to
manually carry out Boyer-Moore and Rabin-Karp.
KMP
- How do you construct the DFA?
- How much time does construction take if you re-simulate the DFA from scratch
every time you have a mismatch?
It's OK if you don't fully understand the linear-time construction process.
- What is the best-case running time for DFA construction and DFA simulation?
The worst-case running time?
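To check a DFA you built by hand, here is a minimal Python sketch of the linear-time construction and of the simulation, using the restart-state idea. The names build_dfa and kmp_search are illustrative, not from the course materials. The sketch assumes the alphabet is just the set of characters in the pattern, so any other text character sends the DFA back to state 0.

```python
def build_dfa(pattern):
    """Return dfa as a list of dicts: dfa[state][char] -> next state.

    Assumes a non-empty pattern; the alphabet is the set of pattern characters.
    """
    m = len(pattern)
    alphabet = set(pattern)
    dfa = [{c: 0 for c in alphabet} for _ in range(m)]
    dfa[0][pattern[0]] = 1
    x = 0  # restart state: where the DFA would be after reading pattern[1:j]
    for j in range(1, m):
        for c in alphabet:
            dfa[j][c] = dfa[x][c]    # mismatch: behave like the restart state
        dfa[j][pattern[j]] = j + 1   # match: advance to the next state
        x = dfa[x][pattern[j]]       # update the restart state
    return dfa


def kmp_search(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    m = len(pattern)
    if m == 0:
        return 0
    dfa = build_dfa(pattern)
    state = 0
    for i, c in enumerate(text):
        # Characters that never occur in the pattern reset the DFA to state 0.
        state = dfa[state].get(c, 0)
        if state == m:
            return i - m + 1
    return -1
```

For the pattern ABABAC, build_dfa gives the rows A: 1 1 3 1 5 1, B: 0 2 0 4 0 4, and C: 0 0 0 0 0 6 (states 0 through 5). Note that the simulation never backs up in the text: each text character is read exactly once.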
Boyer-Moore
- What is the mismatched character heuristic? Why do we use the rightmost
occurrence of the character in the pattern?
- Why is the mismatched character heuristic strictly suboptimal?
Why do we use it anyway? Because the basic idea of the full algorithm is very
similar to KMP, and you'll learn it if you ever really need to.
- What is the best-case running time? The worst-case running time?
- Which inputs result in best and worst case performance?
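For practice tracing the heuristic by hand, here is a minimal Python sketch of Boyer-Moore using only the mismatched character heuristic. The names boyer_moore_search and right are illustrative. The table right maps each pattern character to the index of its rightmost occurrence, and characters absent from the pattern are treated as index -1.

```python
def boyer_moore_search(text, pattern):
    """Return the index of the first match of pattern in text, or -1.

    Uses only the mismatched character heuristic (no good-suffix rule).
    """
    n, m = len(text), len(pattern)
    # Later occurrences overwrite earlier ones, so each entry ends up
    # holding the rightmost index of that character in the pattern.
    right = {c: j for j, c in enumerate(pattern)}
    i = 0
    while i <= n - m:
        skip = 0
        # Compare the pattern against the text from right to left.
        for j in range(m - 1, -1, -1):
            if pattern[j] != text[i + j]:
                # Line up the mismatched text character with its rightmost
                # occurrence in the pattern, but always move forward by >= 1.
                skip = max(1, j - right.get(text[i + j], -1))
                break
        if skip == 0:
            return i  # every character matched
        i += skip
    return -1
```

When the text character at the right end of the window does not occur in the pattern at all, the pattern jumps forward by M positions. With text AAAAAAAA and pattern BAAA, every alignment compares all M characters before failing and the shift is only 1.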
Rabin-Karp
- If we know mod(ABCDEFG, Q), how do we compute mod(BCDEFGH, Q) in constant time
(where A through H are arbitrary digits of a number from some alphabet of radix R,
and Q is the modulus of the hash function)?
- What are the Las Vegas and Monte Carlo versions of Rabin-Karp?
- How would we extend Rabin-Karp to efficiently search
for any one of P possible patterns in a text of length N? How would
this technique compare to using KMP or Boyer-Moore for the same task?
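The questions above can be explored with a minimal Python sketch of the rolling hash and of both Rabin-Karp variants. All names here (horner_hash, roll, rabin_karp_search, rabin_karp_any) and the defaults radix=256 and q=1_000_000_007 are assumptions chosen for illustration. With verify=True a hash hit is confirmed by a direct comparison (Las Vegas); with verify=False the hash hit is trusted (Monte Carlo). The multi-pattern function assumes all patterns are non-empty and share the same length.

```python
def horner_hash(s, radix, q):
    """Hash s as a base-radix number mod q, using Horner's method."""
    h = 0
    for ch in s:
        h = (h * radix + ord(ch)) % q
    return h


def roll(h, out_ch, in_ch, rm, radix, q):
    """Slide the window one position in constant time.

    rm must equal radix**(m - 1) % q for window length m.
    """
    h = (h - ord(out_ch) * rm) % q        # drop the leading digit
    return (h * radix + ord(in_ch)) % q   # shift left, append the new digit


def rabin_karp_search(text, pattern, radix=256, q=1_000_000_007, verify=True):
    """Return the index of the first (reported) match, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    rm = pow(radix, m - 1, q)
    pat_hash = horner_hash(pattern, radix, q)
    h = horner_hash(text[:m], radix, q)
    for i in range(n - m + 1):
        if i > 0:
            h = roll(h, text[i - 1], text[i + m - 1], rm, radix, q)
        # Las Vegas confirms the hit; Monte Carlo trusts the hash.
        if h == pat_hash and (not verify or text[i:i + m] == pattern):
            return i
    return -1


def rabin_karp_any(text, patterns, radix=256, q=1_000_000_007):
    """Las Vegas search for any one of several equal-length patterns."""
    wanted = set(patterns)
    n, m = len(text), len(patterns[0])
    if m > n:
        return -1
    hashes = {horner_hash(p, radix, q) for p in wanted}
    rm = pow(radix, m - 1, q)
    h = horner_hash(text[:m], radix, q)
    for i in range(n - m + 1):
        if i > 0:
            h = roll(h, text[i - 1], text[i + m - 1], rm, radix, q)
        # One set lookup per window, regardless of the number of patterns.
        if h in hashes and text[i:i + m] in wanted:
            return i
    return -1
```

Python's % operator always returns a non-negative result for a positive modulus, so roll needs no extra "+ q" correction; in a language such as Java that correction would be needed.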
Recommended Problems
C level
- Spring 2008 Final, #6
- Fall 2009 Final, #6
- Fall 2012 Final, #10
B level
- Fall 2009 Final, #5
- Fall 2009 Final, #10 [great problem!]
- Fall 2010 Final, #10
- Fall 2011 Final, #6
- Spring 2012 Final, #7
- Fall 2012 Final, #9
- Give an example of when you might want to use each of KMP, Boyer-Moore, and Rabin-Karp.
A level
- Fall 2010 Final, #10
- Fall 2012 Final, #14
- Give an example of when you might prefer to use the Monte Carlo version of Rabin-Karp over the Las Vegas version.
- Textbook: 5.3.22
- Textbook: 5.3.26