STRING SORTS STUDY GUIDE
Terminology
- String - sequence of characters from a finite ordered alphabet.
In Java, our alphabet is the set of all 16-bit integers (representing Unicode characters).
- Radix - just another word for 'base' as in the base of a number system.
For example, the radix for words written in lowercase English letters is 26. For numbers written in Arabic numerals it is 10.
- Radix sort - a sort that works with one character at a time (by grouping objects that have the same digit in the same position).
- Note: I will use 'character' and 'digit' interchangeably in this study guide.
Key-indexed counting.
Allows you to sort N keys that are integers between 0 and R-1 in time proportional to N + R.
Beats the linearithmic lower bound for compare-based sorting by avoiding binary compares entirely.
This is a completely different philosophy for how things should be sorted. This is the most important concept for this lecture.
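As a concrete reference point, here is a minimal sketch of key-indexed counting on plain integer keys. The class name KeyIndexedCounting and the choice to return a sorted copy are assumptions made for illustration, not the lecture's code. Note that nothing in it ever compares two keys.

```java
import java.util.Arrays;

public class KeyIndexedCounting {
    // Returns a sorted copy of keys. Assumes every key is in the range 0..R-1.
    public static int[] sort(int[] keys, int R) {
        int N = keys.length;
        int[] count = new int[R + 1];

        // 1. Count how often each key occurs (stored one slot to the right).
        for (int i = 0; i < N; i++) count[keys[i] + 1]++;

        // 2. Turn counts into starting indices: count[r] = number of keys < r.
        for (int r = 0; r < R; r++) count[r + 1] += count[r];

        // 3. Distribute. Scanning left to right keeps equal keys in their
        //    original relative order, which is what makes the sort stable.
        int[] aux = new int[N];
        for (int i = 0; i < N; i++) aux[count[keys[i]]++] = keys[i];

        return aux;
    }

    public static void main(String[] args) {
        int[] keys = {3, 0, 2, 3, 1, 0};
        System.out.println(Arrays.toString(sort(keys, 4)));  // prints [0, 0, 1, 2, 3, 3]
    }
}
```

The two passes over the keys cost N each and the pass over the count array costs R, which is where the N + R running time comes from.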
Manually performing LSD and MSD. Should be doable in your sleep.
LSD.
- Requires fixed length strings (can pad end of strings with 'smaller-than-everything' character).
- Requires a number of calls to charAt() proportional to WN. Why?
- Requires time proportional to W(N + R). Why?
- Why do we consider these run times to be linear, despite the fact that they involve products WN and WR?
- Requires space proportional to N + R. Why?
- What sorting technique is used as a subroutine in LSD? Would a standard technique (e.g. quicksort) work? Does the sort need to be stable?
- For a fixed alphabet and key size, what are the best case and worst case inputs for LSD?
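To check your answers to the questions above, it may help to have a sketch of LSD in front of you. The version below assumes every string has exactly W characters and that all characters are below R = 256 (extended ASCII); the class name LsdSketch is hypothetical.

```java
import java.util.Arrays;

public class LsdSketch {
    // Sorts a[] in place. Assumes every string has length exactly W
    // and every character is less than R = 256.
    public static void sort(String[] a, int W) {
        int R = 256;
        int N = a.length;
        String[] aux = new String[N];  // N extra space

        // One key-indexed counting pass per character position, right to left.
        for (int d = W - 1; d >= 0; d--) {
            int[] count = new int[R + 1];  // R extra space

            for (int i = 0; i < N; i++) count[a[i].charAt(d) + 1]++;
            for (int r = 0; r < R; r++) count[r + 1] += count[r];
            for (int i = 0; i < N; i++) aux[count[a[i].charAt(d)]++] = a[i];
            for (int i = 0; i < N; i++) a[i] = aux[i];
        }
    }

    public static void main(String[] args) {
        String[] a = {"dab", "cab", "ace", "add", "bad"};
        sort(a, 3);
        System.out.println(Arrays.toString(a));  // prints [ace, add, bad, cab, dab]
    }
}
```

Each of the W passes does the same amount of work no matter what the strings contain, and each pass relies on the previous passes' order being preserved among strings that tie on the current character.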
MSD.
- What sorting technique is used as a subroutine in MSD? Would a standard technique (e.g. quicksort) work? Does the sort need to be stable?
- How much memory does MSD use? Why is MSD so memory hungry? What sort of inputs result in the greatest memory usage?
- Why is it so important to switch to insertion sort (or another sort) for small subarrays? Why did we not have to do this in LSD?
- For a fixed alphabet and key size, what are the best and worst case inputs for MSD?
- What is the role of our special charAt(String, int) method?
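The questions above are easier to reason about with code in hand. Below is a minimal sketch of MSD in the usual textbook style, assuming extended ASCII (R = 256). The class name MsdSketch and the CUTOFF value of 4 are assumptions chosen for illustration. The helper charAt(String, int) returns -1 past the end of a string, so shorter strings sort before longer ones that share their prefix.

```java
import java.util.Arrays;

public class MsdSketch {
    private static final int R = 256;     // assumes extended ASCII characters
    private static final int CUTOFF = 4;  // hypothetical cutoff to insertion sort

    // Returns the d-th character of s, or -1 if s has no d-th character.
    private static int charAt(String s, int d) {
        return d < s.length() ? s.charAt(d) : -1;
    }

    public static void sort(String[] a) {
        String[] aux = new String[a.length];
        sort(a, aux, 0, a.length - 1, 0);
    }

    // Sorts a[lo..hi], whose strings all agree on their first d characters.
    private static void sort(String[] a, String[] aux, int lo, int hi, int d) {
        if (hi <= lo + CUTOFF) { insertion(a, lo, hi, d); return; }

        // A fresh count array of size R + 2 is allocated in every call.
        int[] count = new int[R + 2];
        for (int i = lo; i <= hi; i++) count[charAt(a[i], d) + 2]++;
        for (int r = 0; r < R + 1; r++) count[r + 1] += count[r];
        for (int i = lo; i <= hi; i++) aux[count[charAt(a[i], d) + 1]++] = a[i];
        for (int i = lo; i <= hi; i++) a[i] = aux[i - lo];

        // Recur on each character's bucket; the end-of-string bucket is done.
        for (int r = 0; r < R; r++)
            sort(a, aux, lo + count[r], lo + count[r + 1] - 1, d + 1);
    }

    // Insertion sort on a[lo..hi], comparing from character d onward.
    private static void insertion(String[] a, int lo, int hi, int d) {
        for (int i = lo; i <= hi; i++)
            for (int j = i; j > lo && less(a[j], a[j - 1], d); j--) {
                String t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
    }

    private static boolean less(String v, String w, int d) {
        return v.substring(d).compareTo(w.substring(d)) < 0;
    }

    public static void main(String[] args) {
        String[] a = {"she", "sells", "seashells", "by", "the", "sea", "shore"};
        sort(a);
        System.out.println(Arrays.toString(a));
    }
}
```

Notice that without the cutoff, every tiny subarray would still pay for allocating and scanning a count array of size R + 2, and that the recursion loop visits all R buckets even when most are empty.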
3-way String Quicksort.
Suffix sorting.
Recommended Problems
C level
- Spring 2012 Final, #6
- Fall 2014 Final, #3
- Spring 2015 Final, #5
B level
- Fall 2008 Final, #6
- Fall 2012 Final, #7
- Spring 2008 Final, #12
- Textbook 5.1.8, 5.1.10
A level
- How could we avoid the performance hit from our special charAt() function?
- What makes MSD cache unfriendly?