SUBSTRING SEARCH STUDY GUIDE
Terminology and Basics
- Substring search problem: Find an instance (or all instances) of a
query substring of length M in a text string of length N.
- Substring must be precisely defined (no regular expressions).
- You should be able to manually use a KMP DFA, and you should be able to
manually carry out Boyer-Moore and Rabin-Karp.
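As a reference for carrying out Rabin-Karp by hand, here is a minimal sketch of the rolling-hash idea in Python. The names rabin_karp_search, R, and Q are illustrative assumptions, not taken from this guide. The modulus Q = 997 is a small prime chosen only to keep hand calculations manageable, and this version confirms every hash hit with a direct comparison (the Las Vegas variant).

```python
def rabin_karp_search(pat, txt, R=256, Q=997):
    """Return the index of the first occurrence of pat in txt, or -1.

    R (radix) and Q (prime modulus) are assumed values chosen for illustration.
    """
    M, N = len(pat), len(txt)
    if M == 0:
        return 0
    if M > N:
        return -1

    def horner_hash(s):
        # Hash of the first M characters of s, computed with Horner's method.
        h = 0
        for ch in s[:M]:
            h = (R * h + ord(ch)) % Q
        return h

    RM = pow(R, M - 1, Q)  # R^(M-1) mod Q, the weight of the leading character
    pat_hash = horner_hash(pat)
    txt_hash = horner_hash(txt)

    for i in range(N - M + 1):
        # Compare hashes first; on a hit, confirm with a direct comparison.
        if txt_hash == pat_hash and txt[i:i + M] == pat:
            return i
        if i + M < N:
            # Roll the window: remove txt[i], then append txt[i + M].
            txt_hash = (txt_hash - RM * ord(txt[i])) % Q
            txt_hash = (txt_hash * R + ord(txt[i + M])) % Q
    return -1
```

When tracing by hand, the two lines that roll the window are the part worth practicing: subtract the leading character's contribution, multiply by R, add the new character, and reduce mod Q.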
KMP
- How do you construct the DFA?
- How much time does it take if you re-simulate every time you have a mismatch?
It's OK if you don't fully understand the linear-time construction process.
- What is the best-case running time for DFA construction and DFA simulation?
The worst-case running time?
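The questions above can be explored with a small sketch of the restart-state construction and the simulation loop. This is a Python sketch under stated assumptions, not a definitive implementation: the function names build_dfa and kmp_search are hypothetical, the pattern is assumed to be non-empty, and every character of the pattern and the text is assumed to belong to the given alphabet.

```python
def build_dfa(pat, alphabet):
    """Build dfa[c][j]: the next state after reading character c in state j.

    Assumes pat is non-empty and uses only characters from alphabet.
    """
    M = len(pat)
    dfa = {c: [0] * M for c in alphabet}
    dfa[pat[0]][0] = 1
    X = 0  # restart state: where the DFA would be after reading pat[1:j]
    for j in range(1, M):
        for c in alphabet:
            dfa[c][j] = dfa[c][X]   # mismatch: copy the transitions of state X
        dfa[pat[j]][j] = j + 1      # match: advance to the next state
        X = dfa[pat[j]][X]          # update the restart state
    return dfa


def kmp_search(pat, txt, alphabet):
    """Return the index of the first occurrence of pat in txt, or -1."""
    M = len(pat)
    dfa = build_dfa(pat, alphabet)
    j = 0
    for i, c in enumerate(txt):
        j = dfa[c][j]               # exactly one table lookup per text character
        if j == M:                  # reached the accept state
            return i - M + 1
    return -1
```

For example, kmp_search("ABABAC", "BCBAABACAABABACAA", "ABC") returns 9. The text index never backs up: the loop reads each text character once.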
Boyer-Moore
- What is the mismatched character heuristic? Why do we use the rightmost
occurrence of the character in the pattern?
- Why is the mismatched character heuristic strictly suboptimal?
Why do we use it then? Because the basic idea of the full algorithm is very
similar to KMP, and you'll learn it if you ever really need to.
- What is the best-case running time? The worst-case running time?
- Which inputs result in best and worst case performance?
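To make the heuristic concrete, here is a minimal Python sketch of Boyer-Moore using only the mismatched character rule. The names boyer_moore_search and right are assumptions made for illustration; the table is a dictionary, so a character that does not occur in the pattern is treated as having rightmost index -1.

```python
def boyer_moore_search(pat, txt):
    """Return the index of the first occurrence of pat in txt, or -1.

    Uses only the mismatched character heuristic.
    """
    M, N = len(pat), len(txt)
    # right[c] = index of the rightmost occurrence of c in pat
    right = {c: j for j, c in enumerate(pat)}

    i = 0
    while i <= N - M:
        skip = 0
        for j in range(M - 1, -1, -1):      # scan the pattern right to left
            if pat[j] != txt[i + j]:
                # Line up the mismatched text character with its rightmost
                # occurrence in pat (-1 if absent); always shift by at least 1.
                skip = max(1, j - right.get(txt[i + j], -1))
                break
        if skip == 0:
            return i                        # no mismatch: match at offset i
        i += skip
    return -1
```

For example, boyer_moore_search("NEEDLE", "FINDINAHAYSTACKNEEDLEINA") returns 15. Trying a pattern such as "BAAAA" against a text of all A's, and then a pattern whose characters never appear in the text, is a quick way to see the two extremes the questions above ask about.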
Recommended Problems
C level
- Fall 2012, #10 (Boyer-Moore)
- (a) Given the following KMP DFA, give the string that this DFA searches for.
j | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
A | 1 | 1 | 3 | 1 | 5 | 1 | 5 |
B | 0 | 2 | 0 | 4 | 0 | 6 | 7 |
(b) Below is a partially completed KMP DFA for a string s of length 6 over the alphabet {A, B}. State 6 is the accept state. Fill in the missing spots in the table.
j             | 0 | 1 | 2 | 3 | 4 | 5 |
pat.charAt(j) |   |   |   |   |   |   |
A             | 1 | 1 |   |   |   |   |
B             |   |   | 3 |   |   | 3 |
(c) Given each of the following strings as input, what state would the DFA in (a) end in?
BABAA
ABABABA
BABABABA
BBAABBABAB
Answers
(a) ABABABB
(b)
j             | 0 | 1 | 2 | 3 | 4 | 5 |
pat.charAt(j) | A | B | B | A | B | A |
A             | 1 | 1 | 1 | 4 | 1 | 6 |
B             | 0 | 2 | 3 | 0 | 5 | 3 |
(c) 1, 5, 5, 4
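The answers to (a) and (c) can be checked mechanically. The Python sketch below hard-codes the table from part (a); the names DFA_A, recover_pattern, and end_state are hypothetical helpers introduced only for this check.

```python
# The DFA from part (a), copied from the table: DFA_A[c][j] is the next state.
DFA_A = {
    "A": [1, 1, 3, 1, 5, 1, 5],
    "B": [0, 2, 0, 4, 0, 6, 7],
}


def recover_pattern(dfa):
    """Read the pattern off a KMP DFA: pat[j] is the character that moves state j to j + 1."""
    num_states = len(next(iter(dfa.values())))
    pattern = []
    for j in range(num_states):
        for c, row in dfa.items():
            if row[j] == j + 1:
                pattern.append(c)
    return "".join(pattern)


def end_state(dfa, txt):
    """Return the state reached after reading txt from state 0."""
    accept = len(next(iter(dfa.values())))
    j = 0
    for c in txt:
        if j == accept:
            break               # the table has no column for the accept state
        j = dfa[c][j]
    return j
```

Here recover_pattern(DFA_A) returns "ABABABB", and end_state applied to the four strings in (c) gives 1, 5, 5, and 4, in agreement with the answers above.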
B level
- Fall 2011 Final, #6 (KMP)
- Spring 2012 Final, #7 (KMP)
- Fall 2012, #9 (KMP)
- Give an example of when you might want to use KMP. Boyer-Moore? Rabin-Karp?
A level
- For each algorithm (the version discussed in lecture and the textbook),
give the worst-case order of growth in terms of M and N.
------ brute-force substring search for a query string of size M
in a text string of size N
------ Knuth-Morris-Pratt substring search for a query string of
size M in a text string of size N
------ Boyer-Moore (with only the mismatched character heuristic)
substring search for a query string of size M in a text string of size N
------ simulating a DFA with M vertices and 2M edges on a text
string of size N
Answers
MN, N or M + N, MN, N
- Textbook: 5.3.22
- Textbook: 5.3.26
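The MN worst case listed for brute force in the A-level answers can be observed directly by counting character comparisons. The Python sketch below is an illustration under assumptions: the function name brute_force_search and the choice to return the comparison count alongside the match index are not from this guide.

```python
def brute_force_search(pat, txt):
    """Return (index of first match or -1, number of character comparisons)."""
    M, N = len(pat), len(txt)
    compares = 0
    for i in range(N - M + 1):          # try every alignment of pat against txt
        j = 0
        while j < M:
            compares += 1
            if txt[i + j] != pat[j]:
                break                   # mismatch: move on to the next alignment
            j += 1
        if j == M:
            return i, compares
    return -1, compares
```

With the pattern "AAAB" (M = 4) and a text of twenty A's (N = 20), every one of the 17 alignments matches three characters and fails on the fourth, so brute_force_search("AAAB", "A" * 20) returns (-1, 68), which is M(N - M + 1) comparisons.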