Substring search problem: Find an instance (or all instances) of a
query substring of length M in a text string of length N.
Substring must be precisely defined (no regular expressions).
You should be able to manually use a KMP DFA, and you should be able to
manually carry out Boyer-Moore and Rabin-Karp.
KMP
How do you construct the DFA?
How much time does it take if you re-simulate every time you have a mismatch?
It's ok if you don't fully understand the linear time construction process.
What is the best-case running time for DFA construction and DFA simulation?
The worst-case running time?
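For reference, here is a minimal sketch of the linear-time DFA construction in the style of the textbook. It assumes an extended-ASCII radix of 256 and that every pattern character is below the radix. The class and method names are illustrative, not taken from lecture. The loop fills one column of R entries per pattern character, so this version takes time proportional to RM.

```java
/** Sketch of KMP DFA construction; class and method names are illustrative. */
final class KmpDfaSketch {
    /**
     * Returns dfa[c][j]: the next state when character c is read in state j.
     * Assumes every character of pat is smaller than radix.
     */
    static int[][] buildDfa(String pat, int radix) {
        int m = pat.length();
        int[][] dfa = new int[radix][m];
        if (m == 0) return dfa;                  // empty pattern: nothing to fill
        dfa[pat.charAt(0)][0] = 1;               // state 0 advances only on pat[0]
        for (int x = 0, j = 1; j < m; j++) {
            for (int c = 0; c < radix; c++) {
                dfa[c][j] = dfa[c][x];           // mismatch: copy from restart state x
            }
            dfa[pat.charAt(j)][j] = j + 1;       // match: advance to the next state
            x = dfa[pat.charAt(j)][x];           // update the restart state
        }
        return dfa;
    }

    public static void main(String[] args) {
        int[][] dfa = buildDfa("ABABABB", 256);
        System.out.println("A: " + java.util.Arrays.toString(dfa['A']));
        System.out.println("B: " + java.util.Arrays.toString(dfa['B']));
    }
}
```

The restart state x is the state the DFA would be in after reading pat[1..j-1], which is why copying column x gives the mismatch transitions without re-simulating.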
Boyer-Moore
What is the mismatched character heuristic? Why do we use the rightmost character?
Why is the mismatched character heuristic strictly suboptimal?
Why do we use only this heuristic, then? Because the basic idea of the full
algorithm is very similar to KMP, and you'll learn it if you ever really need to.
What is the best-case running time? The worst-case running time?
Which inputs result in best and worst case performance?
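A minimal sketch of Boyer-Moore with only the mismatched character heuristic may help when working through these questions by hand. It assumes a radix of 256 with all characters below the radix. The class name and the `right` table name are illustrative.

```java
import java.util.Arrays;

/** Sketch of Boyer-Moore using only the mismatched character heuristic. */
final class BoyerMooreSketch {
    /** Returns the index of the first occurrence of pat in txt, or -1. */
    static int search(String txt, String pat, int radix) {
        int m = pat.length();
        int n = txt.length();
        int[] right = new int[radix];
        Arrays.fill(right, -1);                  // -1 means "not in the pattern"
        for (int j = 0; j < m; j++) {
            right[pat.charAt(j)] = j;            // later j overwrites: rightmost occurrence
        }
        int skip;
        for (int i = 0; i <= n - m; i += skip) {
            skip = 0;
            for (int j = m - 1; j >= 0; j--) {   // compare right to left
                char c = txt.charAt(i + j);
                if (pat.charAt(j) != c) {
                    // Align the rightmost c in pat with the text, but never move left.
                    skip = Math.max(1, j - right[c]);
                    break;
                }
            }
            if (skip == 0) return i;             // no mismatch: found a match at i
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(search("FINDINAHAYSTACKNEEDLEINA", "NEEDLE", 256));
    }
}
```

When the mismatched text character does not occur in the pattern at all, `right[c]` is -1 and the pattern slides past it entirely, which is where the sublinear best case comes from.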
Rabin-Karp
If we know mod(ABCDEFG, Q), how do we compute mod(BCDEFGH, Q) in constant time
(where A through H are arbitrary digits of a number from some alphabet of radix R, and Q is the modulus of the hash function)?
What are the Las Vegas and Monte Carlo versions of Rabin-Karp?
How would we extend Rabin-Karp to efficiently search
for any one of P possible patterns in a text of length N? How would
this technique compare to using KMP or Boyer-Moore for the same task?
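The constant-time update asked about in the first question of this section can be sketched as follows. Here `r` is the radix and `q` is the hash modulus. Those names, the helper names, and the small modulus 997 in `main` are assumptions of this sketch. It also assumes q is small enough (below about 2^31) that the intermediate products fit in a long.

```java
/** Sketch of the Rabin-Karp rolling hash; names and constants are illustrative. */
final class RollingHashSketch {
    /** Horner's method: hash of key[0..m) read as a radix-r number, modulo q. */
    static long hash(String key, int m, int r, long q) {
        long h = 0;
        for (int j = 0; j < m; j++) {
            h = (r * h + key.charAt(j)) % q;
        }
        return h;
    }

    /** Returns r^(m-1) mod q, the weight of the leading digit of an m-digit window. */
    static long leadingWeight(int m, int r, long q) {
        long rm = 1;
        for (int i = 1; i < m; i++) {
            rm = (r * rm) % q;
        }
        return rm;
    }

    /** Slides the window one position: drop `leading`, append `trailing`. */
    static long roll(long h, char leading, char trailing, long rm, int r, long q) {
        h = (h + q - rm * leading % q) % q;      // subtract leading digit; + q keeps it non-negative
        return (h * r + trailing) % q;           // shift left one digit and add the new one
    }

    public static void main(String[] args) {
        int r = 256;
        long q = 997;
        long rm = leadingWeight(7, r, q);
        long rolled = roll(hash("ABCDEFG", 7, r, q), 'A', 'H', rm, r, q);
        System.out.println(rolled == hash("BCDEFGH", 7, r, q));
    }
}
```

Each call to `roll` does a fixed number of arithmetic operations regardless of the window length, since the leading weight is computed once up front.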
(a) Given the following KMP DFA, give the string that this DFA searches for.
j   0  1  2  3  4  5  6
A   1  1  3  1  5  1  5
B   0  2  0  4  0  6  7
(b) Below is a partially-completed KMP DFA for a string s of length 6 over the alphabet {A, B}. State 6 is the accept state. Fill in the missing spots in the table.
j
0
1
2
3
4
5
pat.charAt(j)
A
1
1
B
3
3
(c) Given each of the following strings as input, what state would the DFA in (a) end in?
BABAA ABABABA BABABABA BBAABBABAB
Answers
(a) ABABABB
(b)
j              0  1  2  3  4  5
pat.charAt(j)  A  B  B  A  B  A
A              1  1  1  4  1  6
B              0  2  3  0  5  3
(c) 1,5,5,4
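One way to check a hand simulation is to trace the table in code. The sketch below hard-codes the two rows of the table from (a) and reports the final state for an input over {A, B}. The class, field, and method names are illustrative, and any character other than A is treated as B.

```java
/** Sketch: trace the DFA table from exercise (a) on an input string. */
final class DfaTraceSketch {
    // ROW_A[j] and ROW_B[j] are the next states from state j on A and on B.
    static final int[] ROW_A = {1, 1, 3, 1, 5, 1, 5};
    static final int[] ROW_B = {0, 2, 0, 4, 0, 6, 7};
    static final int ACCEPT = 7;

    /** Returns the state reached after reading input, stopping early on accept. */
    static int endState(String input) {
        int state = 0;
        // The accept state has no outgoing column, so stop once it is reached.
        for (int i = 0; i < input.length() && state != ACCEPT; i++) {
            state = (input.charAt(i) == 'A') ? ROW_A[state] : ROW_B[state];
        }
        return state;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"BABAA", "ABABABA", "BABABABA", "BBAABBABAB"}) {
            System.out.println(s + " -> " + endState(s));
        }
    }
}
```

Tracing the four strings from (c) through this table gives states 1, 5, 5, and 4.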
B level
(KMP)
Below is a partially-completed Knuth-Morris-Pratt DFA for a string s of length 11 over the
alphabet { A , B }. Reconstruct the DFA and s in the space below.
Give an example of when you might want to use each of KMP, Boyer-Moore, and Rabin-Karp.
A level
For each algorithm (the version discussed in lecture and the textbook),
give the worst-case order of growth in terms of M and N.
------ brute-force substring search for a query string of size M
in a text string of size N
------ Knuth-Morris-Pratt substring search for a query string of
size M in a text string of size N
------ Boyer-Moore (with only the mismatched character heuristic) substring
search for a query string of size M in a text string of size N
------ Monte Carlo version of Rabin-Karp substring search (that
checks only for a hash match) for a query string of size M
in a text string of size N
------ regular-expression pattern matching for a pattern of size
M on a text string of size N
------ simulating a DFA with M vertices and 2M edges on a text
string of size N
------ simulating an NFA with M vertices and 3M edges on a
text string of size N
Answers
MN, N or M + N, MN, N or M + N, MN, N, MN
Give an example of when you might prefer to use the Monte Carlo version of Rabin-Karp over the Las Vegas version.