COS 226 Lecture 1: Introduction
R. Sedgewick rs@cs.princeton.edu
Intermediate-level survey course
prerequisite: COS 126 programming/problem solving
Algorithm: method for solving a problem
Data structure: a way to store information
Efficient algorithms use good data structures
Types of algorithms considered
* Sorting
* Searching
* Strings
* Geometric
* Graph
* Mathematical
* Other
----------
Why study algorithms?
Using a computer?
* want it to go faster
* want it to process more data
* want to do something that would otherwise be impossible
Technology improves things by a constant factor
Good algorithm design can do better
Abacus, Slide Rule, Calculator, Computer, Supercomputer
A bad algorithm on a supercomputer can be worse than a good algorithm on an abacus
Algorithms as a field of study
* old enough that basics are known
* new enough that new discoveries arise
* burgeoning application areas: science, engineering, commercial
* philosophical implications . . .
----------
Analysis of algorithms
Compare algorithms by comparing estimated costs
Refine and verify estimates for the best ones
N: size of the input
Running time usually proportional to one of
1, log N, N, N log N, N^2, 2^N
Other functions sometimes arise
* loglog N [log(log N)]
* log* N [number of times log must be taken before 1 is reached]
Often can write down a formula for running time
* worst case (guarantee)
* average case (prediction)
Analysis depends on detailed information
* costs of basic operations
* properties of input
Suppress details in approximate analysis
* "within a constant factor"
* O-notation
Predict time for 2N from time for N
----------
Example: Online connectivity
Input: sequence of pairs of integers (p, q)
p "is connected to" q
Output: nothing if p and q are already connected
(p, q) otherwise
--
in    out    evidence
3 4   3 4
4 9   4 9
8 0   8 0
2 3   2 3
5 6   5 6
2 9          (2--3--4--9)
5 9   5 9
7 3   7 3
4 8   4 8
5 6          (5--6)
0 2          (2--3--4--8--0)
6 1   6 1
---
Example of application
* integers represent computers
* pairs represent network connections
* can p and q communicate through the network?
----------
Network connection example
Number of nodes and edges can be huge for practical problems
Ex: telephone network, computer chip
Disconnected piece may be hard to spot
(particularly for a computer!)
----------
Quick-find algorithm
Maintain array with names for components
if i and j are connected, id[i] and id[j] are the same
To maintain this property for a p-q connection
* ignore if id[p] = id[q]
* change all entries with p's id to q's id
--
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
  { int i, p, q, t, N = atoi(argv[1]);
    int *id = malloc(N*sizeof(int));
    for (i = 0; i < N; i++) id[i] = i;
    while (scanf("%d %d", &p, &q) == 2)
      {
        if (id[p] == id[q]) continue;
        t = id[p];
        for (i = 0; i < N; i++)
          if (id[i] == t) id[i] = id[q];
        printf(" %d %d\n", p, q);
      }
    return 0;
  }
---
QUICK-FIND name due to constant-time test to find out if an edge makes a new connection
SLOW-UNION?
----------
Quick-find example
Trace contents of id array after each connection
Ex:
--
3 4   0 1 2 4 4 5 6 7 8 9
4 9   0 1 2 9 9 5 6 7 8 9
8 0   0 1 2 9 9 5 6 7 0 9
2 3   0 1 9 9 9 5 6 7 0 9
5 6   0 1 9 9 9 6 6 7 0 9
5 9   0 1 9 9 9 9 9 7 0 9
7 3   0 1 9 9 9 9 9 9 0 9
4 8   0 1 0 0 0 0 0 0 0 0
6 1   1 1 1 1 1 1 1 1 1 1
---
Problem with quick-find:
* changes too many entries for union
* too slow for huge problems
----------
Quick-find example
Use graphical representation
* numbers in nodes
* node i points up to id[i]
* top nodes represent components
----------
Problem size and computation time
Rough standard for 2000
* 10^9 operations per second
* 10^9 words of memory
* touch each word in approximately 1 second
(a ratio roughly unchanged since at least 1950)
Bigger memories --> bigger problems
Ex: huge problem for quick-find
* 10^10 edges connecting 10^9 nodes
(edges need not fit in memory)
* examples: telephone network, computer chip
Quick-find might take 10^20 operations
(relabel every node, at about 10 ops each, for every edge)
3000 years of computer time (too much)
Use rough estimate
* number of edges and nodes both O(N)
* running time of quick-find O(N^2)
Gap grows as scale increases: a new computer
may be 10 times faster
and have 10 times as much memory
but (with a quadratic algorithm) takes 10 times as long to finish!
----------
Quick-union algorithm
Maintain array with names for components
if i and j are connected, (id[i])* and (id[j])* are the same
where (id[i])* = id[id[id[...id[i]]]] (go until it doesn't change)
To maintain this property for a p-q connection
* ignore if (id[p])* = (id[q])*
* otherwise, for roots i = (id[p])* and j = (id[q])*, set id[i] to j
--
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
  { int i, j, p, q, N = atoi(argv[1]);
    int *id = malloc(N*sizeof(int));
    for (i = 0; i < N; i++) id[i] = i;
    while (scanf("%d %d", &p, &q) == 2)
      {
        i = p; j = q;
        while (i != id[i]) i = id[i];
        while (j != id[j]) j = id[j];
        if (i == j) continue;
        id[i] = j;
        printf(" %d %d\n", p, q);
      }
    return 0;
  }
---
QUICK-UNION: constant time to make a new connection
SLOW-FIND?
----------
Quick-union example
Graphical representation exposes tree structure
FIND operation is traveling up the tree to the root
(check if two nodes hit the same root)
UNION operation links one tree to another
Ex:
--
3 4   0 1 2 4 4 5 6 7 8 9
4 9   0 1 2 4 9 5 6 7 8 9
8 0   0 1 2 4 9 5 6 7 0 9
2 3   0 1 9 4 9 5 6 7 0 9
5 6   0 1 9 4 9 6 6 7 0 9
5 9   0 1 9 4 9 6 9 7 0 9
7 3   0 1 9 4 9 6 9 9 0 9
4 8   0 1 9 4 9 6 9 9 0 0
6 1   1 1 9 4 9 6 9 9 0 0
---
Faster than quick-find for "random" input
(is real input "random"?)
Problem with quick-union:
* paths to the root can grow too long
* no guarantee for huge problems
----------
Quick-union example
----------
Weighted quick-union algorithm
Modify quick-union to avoid imbalance
* keep track of the size of each component
* balance by linking the small one below the large one
--
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
  { int i, j, p, q, N = atoi(argv[1]);
    int *id = malloc(N*sizeof(int));
    int *sz = malloc(N*sizeof(int));
    for (i = 0; i < N; i++) id[i] = i;
    for (i = 0; i < N; i++) sz[i] = 1;
    while (scanf("%d %d", &p, &q) == 2)
      {
        for (i = p; i != id[i]; i = id[i]) ;
        for (j = q; j != id[j]; j = id[j]) ;
        if (i == j) continue;
        if (sz[i] < sz[j])
          { id[i] = j; sz[j] += sz[i]; }
        else
          { id[j] = i; sz[i] += sz[j]; }
        printf(" %d %d\n", p, q);
      }
    return 0;
  }
---
Fast enough?
----------
Weighted quick-union example
Avoids long paths in trees
Ex:
--
3 4   0 1 2 3 3 5 6 7 8 9
4 9   0 1 2 3 3 5 6 7 8 3
8 0   8 1 2 3 3 5 6 7 8 3
2 3   8 1 3 3 3 5 6 7 8 3
5 6   8 1 3 3 3 5 5 7 8 3
5 9   8 1 3 3 3 3 5 7 8 3
7 3   8 1 3 3 3 3 5 3 8 3
4 8   8 1 3 3 3 3 5 3 3 3
6 1   8 3 3 3 3 3 5 3 3 3
---
Is performance improved?
To answer this question
* run empirical studies
* analyze the algorithm
[need to do both]
----------
Weighted quick-union example
----------
Weighted quick-union worst case
Height of the biggest tree increases only when it links to an equally big tree
Ex:
--
0 1   0 0 2 3 4 5 6 7 8 9
2 3   0 0 2 2 4 5 6 7 8 9
4 5   0 0 2 2 4 4 6 7 8 9
6 7   0 0 2 2 4 4 6 6 8 9
8 9   0 0 2 2 4 4 6 6 8 8
0 2   0 0 0 2 4 4 6 6 8 8
4 6   0 0 0 2 4 4 4 6 8 8
0 4   0 0 0 2 0 4 4 6 8 8
6 8   0 0 0 2 0 4 4 6 0 8
---
Good news: Worst case is O(lg N) steps per edge
Better news: Average case is O(1) steps per edge
Ex: 10^10 edges connecting 10^9 nodes
reduces time from 3000 years to 1 minute
----------
Weighted quick-union worst case
----------
Path compression
Modify weighted quick-union to compress the tree:
make every node touched point to the new root
(no reason not to!)
--
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
  { int i, j, p, q, t, x, N = atoi(argv[1]);
    int *id = malloc(N*sizeof(int));
    int *sz = malloc(N*sizeof(int));
    for (i = 0; i < N; i++) id[i] = i;
    for (i = 0; i < N; i++) sz[i] = 1;
    while (scanf("%d %d", &p, &q) == 2)
      {
        for (i = p; i != id[i]; i = id[i]) ;
        for (j = q; j != id[j]; j = id[j]) ;
        if (i == j) continue;
        if (sz[i] < sz[j])
          { id[i] = j; sz[j] += sz[i]; t = j; }
        else
          { id[j] = i; sz[i] += sz[j]; t = i; }
        for (i = p; i != id[i]; i = x)
          { x = id[i]; id[i] = t; }
        for (j = q; j != id[j]; j = x)
          { x = id[j]; id[j] = t; }
        printf(" %d %d\n", p, q);
      }
    return 0;
  }
---
Worth doing?
----------
Weighted quick-union with path compression
Avoids long paths in trees
--
3 4   0 1 2 3 3 5 6 7 8 9
4 9   0 1 2 3 3 5 6 7 8 3
8 0   8 1 2 3 3 5 6 7 8 3
2 3   8 1 3 3 3 5 6 7 8 3
5 6   8 1 3 3 3 5 5 7 8 3
5 9   8 1 3 3 3 3 5 7 8 3
7 3   8 1 3 3 3 3 5 3 8 3
4 8   8 1 3 3 3 3 5 3 3 3
6 1   8 3 3 3 3 3 3 3 3 3
---
THM: Worst-case tree height is O(lg* N)
Proof: Extremely difficult
(but the *algorithm* is still simple!)
Note: lg* N is constant in this world
--
N                     lg* N
2                       1
4                       2
16                      3
65536                   4
any practical value     5
---
Reduces the cost to within a constant factor of gathering the data
----------
Path compression example
----------
Union-find summary
Worst-case cost per edge is proportional to
--
quick-find         N
quick-union        N
weighted           lg N
path compression   5
---
Online algorithm: can solve the problem while collecting the data, for "free"
Set-merging abstraction
FIND: is A in the same set as B?
UNION: merge A's set and B's set
useful for many other applications
Lessons
* start with a simple algorithm
* don't use a simple algorithm for large problems
* can't use a simple algorithm for huge problems
* a higher level of abstraction (the tree) is helpful
* fast performance on real data is OK, but
* strive for worst-case performance guarantees
* identify the fundamental abstraction
A "trivial" algorithm can be both useful and interesting
----------
SORTING
Elementary algorithms, Shellsort
Quicksort
Mergesort
Priority queues
Radix sorts
* Sort an array that fills memory
* Make the union of M spelling dictionaries
* Priority queue abstract data type
----------
SEARCHING
Tree searching
Hashing
Trie searching
* Oxford English Dictionary
* Internet search engines
* DNA subsequence library
* Dictionary abstract data type for other algorithms
----------
STRINGS
String searching
Pattern matching
File compression
* file systems, audio and video
* cryptology
----------
GEOMETRIC ALGORITHMS
Elementary algorithms
Convex hull
Multidimensional searching
* N-body simulation
* "Toy Story"
----------
GRAPH ALGORITHMS
Properties of graphs
Searching in graphs
Advanced graph algorithms
* Cycle detection
* matching (e.g.
medical residents to jobs)
* networks
----------
OTHER TOPICS
Mathematical algorithms
Dynamic programming
Parallel algorithms
Randomized algorithms
Intractable problems
----------
COURSEWORK
Text
* Algorithms, 3rd edition, in C
* new book; old book after midterm
Lecture notes
Programming Assignments
* weekly, eleven in all
* electronic submission
* programs due Thursdays 11:59PM, first one due Thursday NEXT week
* README due the following Sunday at 11:59PM
Problem Sets
* weekly, nine in all
* due in precept
* first one due NEXT Monday or Tuesday
Exams
* open book, notes
* midterm in class Wednesday before break
* final at scheduled time
Online course information on homepage
READ (and do not lose) HANDOUT ONE
READ ONLINE INFORMATION
COS 226 Lecture 2: Elementary sorting algs
Insertion sort
Selection sort
Bubble sort
Shellsort
-----
Why study elementary algorithms?
* Easy to code
* Fastest for small files
* Context for developing ground rules
* Fastest in some special situations
* May not be so elementary
-----
Ground rules
FILES of RECORDS containing KEYS
File fits in memory
Use abstract comparison, exchange
--
typedef int Item;
#define less(A, B) (A < B)
#define exch(A, B) { Item t = A; A = B; B = t; }
---
Macros or subroutines?
Macros: low cost, simple
Subroutines: more general
-----
Selection sort
--
void selection(Item a[], int l, int r)
  { int i, j;
    for (i = l; i < r; i++)
      { int min = i;
        for (j = i+1; j <= r; j++)
          if (less(a[j], a[min])) min = j;
        exch(a[i], a[min]);
      }
  }
---
-----
Insertion sort
--
void insertion(Item a[], int l, int r)
  { int i, j;
    for (i = l+1; i <= r; i++)
      { Item v = a[i];
        j = i;
        while (j > l && less(v, a[j-1]))
          { a[j] = a[j-1]; j--; }
        a[j] = v;
      }
  }
---
-----
Bubble sort
--
#define compexch(A, B) if (less(B, A)) exch(A, B)
void bubble(Item a[], int l, int r)
  { int i, j;
    for (i = l; i < r; i++)
      for (j = r; j > i; j--)
        compexch(a[j-1], a[j]);
  }
---
Improvements:
* add a test to exit if no exchanges
* go back and forth
-----
Properties of elementary sorts
All quadratic running time
Selection sort
* comparisons: N-1 + N-2 + ... + 2 + 1 = about N^2/2
* exchanges: N
Insertion sort
* comparisons: (N-1 + N-2 + ... + 1)/2 = about N^2/4
* exchanges: about N^2/4
Bubble sort
* comparisons: N-1 + N-2 + ... + 2 + 1 = about N^2/2
* exchanges: about N^2/2
Large records, small keys
* selection sort is linear in the amount of data
* N records, M words each (1-word keys)
* comparison cost N^2/2; exchange cost NM
* if N is about equal to M, costs and amount of data are both about M^2: a LINEAR sort
Files nearly in order
* bubble and insertion sort can be linear
* (quicksort can be quadratic)
-----
Pointer sort
Sort large records by swapping references to the records, not the records themselves
--
i    a[i]   record
1     9     Fox      1   [associated info]
2     6     Quilici  1   ...
3     8     Chen     2   ...
4     3     Furia    3   ...
5     1     Kanaga   3   ...
6     4     Andrews  3   ...
7    10     Rohde    3   ...
8     5     Battle   4   ...
9     2     Aaron    4   ...
10    7     Gazsi    4
---
Trivial to implement: change the abstract comparison
Array indices
--
typedef int Item;
#define less(A, B) (data[A].key < data[B].key)
#define exch(A, B) { Item t = A; A = B; B = t; }
---
True pointers
--
typedef dataType *Item;
#define less(A, B) ((A)->key < (B)->key)
#define exch(A, B) { Item t = A; A = B; B = t; }
---
-----
Stable sorting for two-key records
Sort on the first key, then on the second
--
Aaron 4      Fox 1
Andrews 3    Quilici 1
Battle 4     Chen 2
Chen 2       Furia 3
Fox 1        Kanaga 3
Furia 3      Andrews 3
Gazsi 4      Rohde 3
Kanaga 3     Battle 4
Quilici 1    Aaron 4
Rohde 3      Gazsi 4
---
Stable sort: stays sorted on the first key where equal on the second
--
Fox 1
Quilici 1
Chen 2
Andrews 3
Furia 3
Kanaga 3
Rohde 3
Aaron 4
Battle 4
Gazsi 4
---
Which of the elementary methods are stable?
-----
4-sorting
Divide into 4 subfiles
* every 4th element starting at the 1st
* every 4th element starting at the 2nd
* every 4th element starting at the 3rd
* every 4th element starting at the 4th
-----
Interleaved 4-sorting
Use insertion sort with an "increment" of 4
--
h = 4;
for (i = l+h; i <= r; i++)
  { Item v = a[i];
    j = i;
    while (j >= l+h && less(v, a[j-h]))
      { a[j] = a[j-h]; j -= h; }
    a[j] = v;
  }
---
-----
Shellsort
Use a decreasing sequence of increments
Each pass makes the next easier
[figure: successive passes with increments 13, 4, 1]
-----
Shellsort implementation
--
void shellsort(Item a[], int l, int r)
  { int i, j, k;
    int incs[16] = { 1391376, 463792, 198768, 86961,
                     33936, 13776, 4592, 1968,
                     861, 336, 112, 48, 21, 7, 3, 1 };
    for (k = 0; k < 16; k++)
      { int h = incs[k];
        for (i = l+h; i <= r; i++)
          { Item v = a[i];
            j = i;
            while (j >= l+h && less(v, a[j-h]))
              { a[j] = a[j-h]; j -= h; }
            a[j] = v;
          }
      }
  }
---
Need a sort routine, fast? Use Shellsort!
* not much code
* best method for small and medium files
* still OK even for giant files
How do we know what increments to use?
* plenty of proven winners to use
* easiest: 1, 4, 13, 40, 121, 364, 1093, ...
-----
Relatively prime increment sequences
When we h-sort a file that is k-sorted, it stays k-sorted
(Know an easy proof? SEND MAIL)
Only 18N comparisons are needed to 1-sort a file that is 4-sorted and 13-sorted
[figure: elements to the left of x that could be greater than x]
In general, if h and k are relatively prime:
* Only (h-1)(k-1)N comparisons to 1-sort a file that is h-sorted and k-sorted
* Only (h-1)(k-1)N/g comparisons to g-sort a file that is h-sorted and k-sorted
Big increments, small files: h(N/h)^2 = N^2/h
Small increments, use the theorem: h^2 N/h = Nh
Tradeoff to get the best bounds: N^(3/2) total
Similar methods (harder proofs) give exponents 4/3, 5/4, 6/5, ...
-----
More increment sequences
On the other hand, common divisors are good:
* Only N comparisons to 1-sort a file that is 2-sorted and 3-sorted
* Only N comparisons to 2-sort a file that is 4-sorted and 6-sorted
* Only N comparisons to 3-sort a file that is 6-sorted and 9-sorted
--
1
2 3
4 6 9
8 12 18 27
16 24 36 54 81
32 48 72 108 162 243
64 96 144 216 324 486 729
---
Total time: N (log N)(log N)
Too many increments for real sizes
* start with bigger numbers than 2 and 3
* mix up with primes
Have a better idea for an increment sequence?
SEND MAIL if it beats 1 2 7 21 48 112 336 ...
COS 226 Lecture 3: Quicksort
To sort an array, first divide it so that
* some element a[i] is in its final position
* no larger element is left of i
* no smaller element is right of i
Then sort the left and right parts recursively
----------
Partitioning
To partition an array, pick a partitioning element
* scan from the right for a smaller element
* scan from the left for a larger element
* exchange
* repeat until the pointers cross
----------
Partitioning implementation
v: partitioning element
i: left-to-right pointer
j: right-to-left pointer
--
int partition(Item a[], int l, int r)
  { int i, j; Item v;
    v = a[r]; i = l-1; j = r;
    for (;;)
      {
        while (less(a[++i], v)) ;
        while (less(v, a[--j])) if (j == l) break;
        if (i >= j) break;
        exch(a[i], a[j]);
      }
    exch(a[i], a[r]);
    return i;
  }
---
Issues
* stop pointers on keys equal to v?
* sentinels or explicit tests for array bounds?
* details of pointer crossing
----------
Quicksort implementation
--
void quicksort(Item a[], int l, int r)
  { int i;
    if (r > l)
      {
        i = partition(a, l, r);
        quicksort(a, l, i-1);
        quicksort(a, i+1, r);
      }
  }
---
Issues
* overhead for recursion?
* running time depends on input
* worst-case time cost (quadratic: a problem)
* worst-case space cost (linear: a serious problem)
----------
Nonrecursive Quicksort
Use an explicit stack instead of recursive calls
Sort the smaller of the two subfiles first
--
#define push2(A, B) push(A); push(B);
void quicksort(Item a[], int l, int r)
  { int i;
    stackinit(); push2(l, r);
    while (!stackempty())
      {
        r = pop(); l = pop();
        if (r <= l) continue;
        i = partition(a, l, r);
        if (i-l > r-i)
          { push2(l, i-1); push2(i+1, r); }
        else
          { push2(i+1, r); push2(l, i-1); }
      }
  }
---
Guaranteed less than lg N extra space cost
Still quadratic worst-case time cost
----------
Analysis
Total running time is the sum of cost*frequency for all the basic operations
Cost depends on the machine
Frequency depends on the algorithm and the input
For Quicksort
* A -- number of partitioning stages
* B -- number of exchanges
* C -- number of comparisons
Cost on a typical machine: 35A + 11B + 4C
Number of comparisons in the worst case
N + (N-1) + (N-2) + ... = N(N-1)/2
----------
Average-case analysis
Assume the input is randomly ordered
* each element equally likely to be the partitioning element
* subfiles randomly ordered (if partitioning is "blind")
Average number of comparisons satisfies
C(N) = N+1 + ( (C(1)+C(N-1)) + (C(2)+C(N-2)) + ... + (C(N-1)+C(1)) )/N
C(N) = N+1 + 2( C(1) + C(2) + ... + C(N-1) )/N
NC(N) = N(N+1) + 2( C(1) + C(2) + ... + C(N-1) )
NC(N) - (N-1)C(N-1) = 2N + 2C(N-1)
NC(N) = (N+1)C(N-1) + 2N
C(N)/(N+1) = C(N-1)/N + 2/(N+1)
           = 2( 1/3 + 1/4 + ... + 1/(N+1) )   [telescope down to C(1) = 0]
           = 2 ln N + (small error term)
THM: Quicksort uses about 2N ln N = (1.38...)N lg N comparisons
(and 1/6 as many exchanges) on average.
Randomized algorithm * pick random partitioning element * analysis then valid for any input ---------- Inner loop Use profiler * find instructions executed most often * verify analysis * streamline program by iterating -- quicksort(int a[], int l, int r) <132659>{ int v, i, j, t; if (<132659>r > l) { <66329>v = a[r]; <66329>i = l-1; <66329>j = r; for (<66329>;<327102>;<327102>) { while (<1033228>a[++i] < v) <639797>; while (<1077847>a[--j] > v) <684416>; if (<393431>i >= j) <66329>break; <327102>t = a[i]; a[i] = a[j]; a[j] = t; } <66329>t = a[i]; a[i] = a[r]; a[r] = t; <66329>quicksort(a, l, i-1); <66329>quicksort(a, i+1, r); } <132659>} --- ---------- Another partitioning method Seeking an easier-to-understand method (detailed justification omitted) -- quicksort(int a[], int l, int r) <133395>{ int v, i, k, t; if (<133395>r <= l) return; <66697>v = a[l]; <66697>k = l; for (<66697>i=l+1; <1976624>i<=r; <1909927>i++) if (<1909927>a[i] < v) { <934565>t = a[i]; a[i] = a[++k]; a[k] = t; } <66697>t = a[k]; a[k] = a[l]; a[l] = t; <66697>quicksort(a, l, k-1); <66697>quicksort(a, k+1, r); } <133395>} --- Not much simpler, three times as many exchanges ---------- Improvements to Quicksort Median-of-three * partitioning element closer to center * estimate median with median of sample * number of comparisons close to N lg N with large enough sample * FEWER LARGE FILES * slightly more exchanges, more overhead Insertion sort small subfiles * even Quicksort has too much overhead for files of a few elements * use insertion sort for tiny files * (can wait until the end) Analysis gives best value of parameters * median of 3 elements * cut to insertion sort for < 10 elements ---------- Improvements to Quicksort Standard * might have bad partition (1st one in ex.) * might have good partition (2nd one in ex.) 
Cutoff for small subfiles
Median-of-three
----------
Three-way partitioning
Equal keys can affect performance
Suppose all keys are the same
* plain quicksort takes N lg N comparisons (!)
* change partitioning to take N comparisons
* a naive method might use N^2 comparisons (!!)
Suppose there are two distinct key values
* reduces to the above case for one subfile
* better to complete the sort with one partition:
  stop the right pointer on the left value,
  stop the left pointer on the right value,
  exchange
Large file, small number of key values
* reduces to the above cases
Partition into three parts
* elements between i and j equal to v
* no larger element left of i
* no smaller element right of j
----------
Three-way partitioning implementation
--
void quicksort(Item a[], int l, int r)
  { int i, j, k, p, q; Item v;
    if (r <= l) return;
    v = a[r]; i = l-1; j = r; p = l-1; q = r;
    for (;;)
      {
        while (less(a[++i], v)) ;
        while (less(v, a[--j])) if (j == l) break;
        if (i >= j) break;
        exch(a[i], a[j]);
        if (eq(a[i], v)) { p++; exch(a[p], a[i]); }
        if (eq(v, a[j])) { q--; exch(a[q], a[j]); }
      }
    exch(a[i], a[r]);
    j = i-1; i = i+1;
    for (k = l; k < p; k++, j--) exch(a[k], a[j]);
    for (k = r-1; k > q; k--, i++) exch(a[k], a[i]);
    quicksort(a, l, j);
    quicksort(a, i, r);
  }
---
----------
Sorting strings
Use three-way partitioning on key characters
--
actinian      coenobite     actinian
jeffrey       conelrad      bracteal
coenobite     actinian      coenobite
conelrad      bracteal      conelrad
secureness    secureness    cumin
cumin         dilatedly     chariness
chariness     inkblot       centesimal
bracteal      jeffrey       cankerous
displease     displease     circumflex
millwright    millwright    millwright
repertoire    repertoire    repertoire
dourness      dourness      dourness
centesimal    southeast     southeast
fondler       fondler       fondler
interval      interval      interval
reversionary  reversionary  reversionary
dilatedly     cumin         secureness
inkblot       chariness     dilatedly
southeast     centesimal    inkblot
cankerous     cankerous     jeffrey
circumflex    circumflex    displease
---
Not many disadvantages
(to be continued: this is a radix sort)
COS 226
Lecture 4: Mergesort
Prototypical divide-and-conquer algorithm
Guaranteed to run in O(N log N) steps
Drawback: linear extra space for the merge
(can only sort half the memory)
An "optimal" sorting method
Leads us to consider
* recurrence relationships
* computational complexity
* deep hacking
* fractals
----------
Merging two sorted files
--
#define T Item
void merge(T c[], T a[], int N, T b[], int M)
  { int i, j, k;
    for (i = 0, j = 0, k = 0; k < N+M; k++)
      {
        if (i == N) { c[k] = b[j++]; continue; }
        if (j == M) { c[k] = a[i++]; continue; }
        if (less(a[i], b[j]))
             c[k] = a[i++];
        else c[k] = b[j++];
      }
  }
---
Seems trivial, but try doing it without using linear extra space
----------
Abstract inplace merge
--
Item aux[maxN];
void merge(Item a[], int l, int m, int r)
  { int i, j, k;
    for (i = m+1; i > l; i--) aux[i-1] = a[i-1];
    for (j = m; j < r; j++) aux[r+m-j] = a[j+1];
    for (k = l; k <= r; k++)
      if (less(aux[i], aux[j]))
           a[k] = aux[i++];
      else a[k] = aux[j--];
  }
---
Tricky way (copy the second half in reverse order) to avoid special tests for the ends of the arrays
Still uses extra space, but easier for the calling routine to assume an inplace merge
----------
Mergesort
Sort a file by (recursively) sorting two halves, then merging the results
----------
Mergesort implementation
Direct recursive code
Uses abstract inplace merge
--
void mergesort(Item a[], int l, int r)
  { int m = (r+l)/2;
    if (r <= l) return;
    mergesort(a, l, m);
    mergesort(a, m+1, r);
    merge(a, l, m, r);
  }
---
Tree structures describe merge file sizes
----------
Recurrences
Direct relationship to recursive programs
(most programs are "recursive")
Easy telescoping recurrences
T(N) = T(N-1) + 1         -->  T(N) = N
T(2^n) = T(2^(n-1)) + 1   -->  T(N) = lg N if N = 2^n
Short list of important recurrences
T(N) = T(N/2) + 1         -->  T(N) = lg N
T(N) = T(N/2) + N         -->  T(N) = N
T(N) = 2T(N/2) + 1        -->  T(N) = N
T(N) = 2T(N/2) + N        -->  T(N) = N lg N
Details in Chapter 2
THM: Mergesort uses N lg N comparisons
(guaranteed worst-case bound)
Divide-and-conquer "master theorem" gives similar results for other algorithms of the form
T(N) = aT(N/b) + N^c(lg N)^d
Interested in learning more?
* Stay tuned for a few more in COS 226
* Take COS 341, COS 423
* Read "Introduction to the Analysis of Algorithms" by Sedgewick and Flajolet
----------
Computational Complexity
Framework to study the efficiency of algorithms
Machine model: count fundamental operations
Average case: predict performance (need input model)
Worst case: guarantee performance (any input)
Upper bound: algorithm to solve the problem
Lower bound: proof that no algorithm can do better
Caveats:
* worst case may be unrealistic
* average-case bounds difficult to derive
* costs ignored in analysis may dominate
* machine model may be restrictive
Complexity studies provide
* starting point for practical implementations
* indication of approaches to be avoided
----------
Complexity of sorting
N lg N comparisons necessary and sufficient
Upper bound: N lg N (Mergesort)
Lower bound: N lg N - N/(ln 2) + lg N
THM: All comparison-based sorting methods must use at least N lg N comparisons
Proof: DECISION TREE (all sequences of comparisons)
* Claim 1: at least N! leaves
* Claim 2: height at least lg N!
* Claim 3: (Stirling's formula for lg N!) height at least N lg N - N/(ln 2) + lg N
Caveat: what if we don't use comparisons??
(stay tuned for radix sort)
----------
Deep hacking on the Mergesort inner loop
Avoid the move with a recursive argument switch on the call to straight merge
--
void mergesort(T a[], T b[], int l, int r)
  { int m = (l+r)/2;
    if (r-l <= 10) { insertion(a, l, r); return; }
    mergesort(b, a, l, m);
    mergesort(b, a, m+1, r);
    merge(a+l, b+l, m-l+1, b+m+1, r-m);
  }
void sort(Item a[], int l, int r)
  { int i;
    for (i = l; i <= r; i++) aux[i] = a[i];
    mergesort(a, aux, l, r);
  }
---
Avoid sentinels with the "up-down" trick
Combine the two?
Doable, but mindbending
Inner loop: compare, store, two increments
(Quicksort: compare, increment)
----------
Bottom-up mergesort
Pass through the file
* merge adjacent subfiles
* size doubles each time through
----------
Bottom-up mergesort implementation
--
void mergesort(Item a[], int l, int r)
  { int i, m;
    for (m = 1; m < r-l; m = m+m)
      for (i = l; i <= r-m; i += m+m)
        merge(a, i, i+m-1, min(i+m+m-1, r));
  }
---
Different set of merges than for top-down unless N is a power of two
----------
Mergesort and numbers
Relationship to the binary representation of numbers
THM: The number of comparisons used by Mergesort for a file of size N is the same as the number of bits in the binary representations of all the numbers less than N (plus N-1).
--
0 0 0 0
0 0 0 1
0 0 1 0
0 0 1 1
0 1 0 0
0 1 0 1
0 1 1 0
0 1 1 1
1 0 0 0
1 0 0 1
1 0 1 0
1 0 1 1
---
Proof: They satisfy the same recurrence
C(2N) = C(N) + C(N) + 2N
C(2N+1) = C(N) + C(N+1) + 2N+1
----------
Mergesort and fractals
Divide-and-conquer algorithms can exhibit erratic periodic behavior
number of bits in numbers less than N
= number of 0 bits + number of 1 bits
= (N lg N)/2 + periodic term + (N lg N)/2 + periodic term
= N lg N + periodic term
Details in Sedgewick-Flajolet
----------
Merging linked lists
Problem: sort data on a linked list
(rearrange the list so items are in order)
--
typedef struct node *link;
struct node { Item item; link next; };
---
First step: merge implementation
--
link merge(link a, link b)
  { struct node head; link c = &head;
    while ((a != NULL) && (b != NULL))
      if (less(a->item, b->item))
           { c->next = a; c = a; a = a->next; }
      else { c->next = b; c = b; b = b->next; }
    c->next = (a == NULL) ?
b : a;
    return head.next;
  }
---
Next step: sort implementation
* top-down and bottom-up methods both apply
----------
List mergesorts
Top-down: split, sort, and merge
--
link mergesort(link c)
  { link a, b;
    if (c->next == NULL) return c;
    a = c; b = c->next;
    while ((b != NULL) && (b->next != NULL))
      { c = c->next; b = b->next->next; }
    b = c->next; c->next = NULL;
    return merge(mergesort(a), mergesort(b));
  }
---
Bottom-up: cycle through a circular list
--
link mergesort(link t)
  { link u;
    for (initQ(); t != NULL; t = u)
      { u = t->next; t->next = NULL; putQ(t); }
    t = getQ();
    while (!emptyQ())
      { putQ(t); t = merge(getQ(), getQ()); }
    return t;
  }
---
Mergesort is the method of choice for lists
COS 226 Lecture 5: Priority Queues
Abstract data types
* client, interface, implementation
Priority queue ADT
* insert
* remove the largest
Brute-force implementations
HEAPS
Heapsort
BINOMIAL QUEUES
-----
Abstract data types
Separate INTERFACE and IMPLEMENTATION
* easier maintenance of large programs
* build layers of abstraction
* reuse software
* elementary example: pushdown stack
INTERFACE: description of data type, basic operations
CLIENT: program using operations defined in the interface
IMPLEMENTATION: actual code implementing the operations
Client can't know details of the implementation
(many implementations to choose from)
Implementation can't know details of client needs
(many clients use the same implementation)
Modern programming languages support ADTs
C++, Modula-3, Oberon, Java (C)
Caveats
* perspective needed
* performance matters
-----
Priority queue ADT
Records with keys (priorities)
Two basic operations
* insert
* delete the largest
Generic operations common to many ADTs
* create
* test if empty
* destroy (often ignored if not harmful)
INTERFACE for basic operations
--
. void PQinit();
. void PQinsert(Item);
. Item PQdelmax();
.
int PQempty();
---
Should also specify constraints and error conditions
Example applications
* simulation
* numerical computation
* compression algorithms
* graph-searching algorithms
-----
Sample client program
Find the M SMALLEST of N items
Typical values: M = 100, N = 1000000
--
PQinit();
for (k = 0; k < M; k++)
  PQinsert(nextItem());
for (k = M; k < N; k++)
  { PQinsert(nextItem()); t = PQdelmax(); }
for (k = 0; k < M; k++)
  a[k] = PQdelmax();
---
Cost? Depends on the PQ implementation
Reasonable assumption for any implementation: space proportional to M
Time bounds for standard implementations
* brute force: N M
* best: N lg M
-----
Typical brute-force implementation
Use an unordered array
--
static Item *pq;
static int N;
void PQinit(int maxN)
  { pq = malloc(maxN*sizeof(Item)); N = 0; }
int PQempty()
  { return N == 0; }
void PQinsert(Item v)
  { pq[N++] = v; }
Item PQdelmax()
  { int j, max = 0;
    for (j = 1; j < N; j++)
      if (less(pq[max], pq[j])) max = j;
    exch(pq[max], pq[N-1]);
    return pq[--N];
  }
---
Should also test for and report errors
Other options for brute-force implementations
* ordered array
* unordered singly-linked list
* ordered singly-linked list
-----
Client/Interface/Implementation
Separate compilation
(Freudian slip: "complication")
INTERFACE
* define data types
* declare functions
* in C, use a ".h" file (no executable code)
CLIENT
* include the ".h" file
* call functions
IMPLEMENTATION
* include the ".h" file
* give code for the functions
Client and implementation can be compiled at different times, then function calls are LINKED to their implementations
Details: Sedgewick, Chapter 4; COS 217
Modular programming
* separate compilation
* abstraction
* essential in implementing big systems
* useful in putting good algorithms to work
-----
Priority queue ADT (continued)
Other useful operations
* construct a PQ from N items
* return the value of the largest
* delete a specified item
* change an item's priority
* merge together two PQs
Interface more complicated
* need HANDLES for records
* need HANDLES for
priority queues
Where's the data? (client, implementation, or both?)
First-class ADT solution
--
. typedef struct pq* PQ;
. typedef struct PQnode* PQlink;
. PQ PQinit();
. int PQempty(PQ);
. PQlink PQinsert(PQ, Item);
. Item PQdelmax(PQ);
. void PQchange(PQ, PQlink, Item);
. void PQdelete(PQ, PQlink);
. PQ PQjoin(PQ, PQ);
---
PQ and PQlink are pointers to structures to be specified in the implementation
More info: section 4.8 in Sedgewick; lecture 7
-----
PQ implementations cost summary
Worst-case per-operation time cost as a function of PQ size
--
.                          delete           find    change
.                  insert   max    delete    max     key    join
. ordered array      N       1       N       1       N      N
.          list      N       1       1       1       N      N
. unordered array    1       N       1       N       1      N
.            list    1       N       1       N       1      1
. heap             lg N    lg N    lg N      1     lg N     N
. binomial queue   lg N    lg N    lg N    lg N    lg N   lg N
. best in theory     1     lg N    lg N      1       1      1
---
Algorithm design success story:
* nearly optimal worst-case cost
* simple (but ingenious!) algorithms
Costs lower in practice
* average-case analysis
* amortized analysis
-----
Heap-ordered complete binary trees
COMPLETE BINARY TREE: leaves on two levels, at left on the bottom level
HEAP-ORDERED: parent larger than both children
* therefore, largest at the root
* can define for any tree, not just complete ones
Array representation:
* root in a[1]
* children of the root in a[2] and a[3]
* ...
* children of i in a[2i] and a[2i+1]
* parent of i in a[i/2]
No explicit links needed for the tree
--
. i     0  1  2  3  4  5  6  7  8  9 10 11 12
. a[i]     X  T  O  G  S  M  N  A  E  R  A  I
---
-----
Promotion (bubbling up in a heap)
Change the key in a node at the bottom of the heap
To restore the heap condition: exchange with the parent if necessary
"How to Succeed" model (Peter Principle)
--
fixUp(Item a[], int k)
  { while (k > 1 && less(a[k/2], a[k]))
      { exch(a[k], a[k/2]); k = k/2; }
  }
---
(easy to make the code twice as fast, as with insertion sort)
-----
Demotion (sifting down in a heap)
Change the key in the node at the top of the heap
To restore the heap condition: exchange with the larger child if necessary
"Boss's child" model (Royal succession?)
-- void fixDown(Item a[], int k, int N) { int j; while (2*k <= N) { j = 2*k; if (j < N && less(a[j], a[j+1])) j++; if (!less(a[k], a[j])) break; exch(a[k], a[j]); k = j; } } --- ----- PQ implementation with heaps PQinsert: add node at bottom PQdelmax: exch root with node at bottom -- static Item *pq; static int N; void PQinit(int maxN) { pq = malloc((maxN+1)*sizeof(Item)); N = 0; } int PQempty() { return N == 0; } void PQinsert(Item v) { pq[++N] = v; fixUp(pq, N); } Item PQdelmax() { exch(pq[1], pq[N]); fixDown(pq, 1, N-1); return pq[N--]; } --- No more than lg N steps (GUARANTEED) ----- Constructing a heap (top-down) ----- Sorting down a heap ----- Heapsort Abandon ADT concept to save space Faster to construct heap from the bottom up -- #define pq(A) a[l-1+A] void heapsort(Item a[], int l, int r) { int k, N = r-l+1; for (k = N/2; k >= 1; k--) fixDown(&pq(0), k, N); while (N > 1) { exch(pq(1), pq(N)); fixDown(&pq(0), 1, --N); } } --- Widely used sorting method * in-place (no extra space) * guaranteed N lg N time ----- Binomial queues Support ALL PQ operations in lgN steps (Heaps have slow merge) Def: In a LEFT HEAP-ORDERED tree, each node is larger than all nodes in left subtree Def: A POWER-OF-2 TREE is a binary tree with * left subtree of root complete * right subtree empty (therefore, 2^n nodes) Def: A BINOMIAL QUEUE of size N is a forest of left heap-ordered power-of-2 trees one for each 1 bit in binary rep.
of N Corresponds to heap-ordered forest: ----- Joining power-of-2 heaps Constant-time operation * larger of two roots at top * left subtree to right subtree of other root * result is left-heap-ordered if inputs are ----- Joining power-of-2 heaps (code) Representation * two pointers per node * need HANDLE (pointer to node) -- struct PQnode { Item key; PQlink l, r; }; struct pq { PQlink *bq; }; --- Join operation * takes two BQ nodes as input * returns one BQ node as output -- PQlink pair(PQlink p, PQlink q) { PQlink t; if (less(p->key, q->key)) { p->r = q->l; q->l = p; return q; } else { q->r = p->l; p->l = q; return p; } } --- ----- Joining binomial queues Corresponds to adding binary numbers * 1 bits correspond to power-of-2 heaps * 1+1=10 corresponds to carry (Ex: 011 + 111 = 1010) * lgN time (guaranteed) * all PQ operations easy to implement with join ----- Joining binomial queues (code) -- . c b a new-a new-c . 0 0 0 a 0 . 0 0 1 a 0 . 0 1 0 b 0 . 0 1 1 0 a+b . 1 0 0 c 0 . 1 0 1 0 a+c . 1 1 0 0 b+c . 1 1 1 a b+c --- -- #define test(C, B, A) 4*(C) + 2*(B) + 1*(A) void PQjoin(PQlink *a, PQlink *b) { int i; PQlink c = z; for (i = 0; i < maxBQsize; i++) switch(test(c != z, b[i] != z, a[i] != z)) { case 2: a[i] = b[i]; break; case 3: c = pair(a[i], b[i]); a[i] = z; break; case 4: a[i] = c; c = z; break; case 5: c = pair(c, a[i]); a[i] = z; break; case 6: case 7: c = pair(c, b[i]); break; } } --- ----- Binomial queues summary BQ of size N is array of power-of-two heaps one for each 1 bit in binary rep. of N Joining two BQs is like adding binary numbers * insert is like incrementing * delete, delmax are like decrementing * heap-like promotion, demotion works for "change priority" All operations guaranteed lgN Actually faster, in "amortized" sense Example: PQinsert N items, then one more * N even, just insert item * N = ...01, just two steps * N = ..011, just three steps ... total cost LINEAR N/2 + 2(N/4) + 3(N/8) + 4(N/16) + ...
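The linear total in the series above can be checked by direct count: model each PQinsert as a binary counter increment, where an insert that triggers k carries (joins) costs k+1 steps. A small self-contained sketch (the function name is ours, not part of the course interface):

```c
#include <assert.h>

/* Total steps for n binomial-queue inserts, modeled as binary
   counter increments: inserting into a queue of size i costs
   one step plus one join per trailing 1 bit of i (the carries). */
long bq_insert_steps(long n) {
    long total = 0;
    for (long i = 0; i < n; i++) {
        long v = i, steps = 1;
        while (v & 1) { v >>= 1; steps++; }  /* count the carries */
        total += steps;
    }
    return total;
}
```

For any n the total stays just under 2n, matching the series N/2 + 2(N/4) + 3(N/8) + ... < 2N, i.e., constant amortized cost per insert.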
Basis for modifications getting most operations down to constant worst-case time (some of which are practical) Good candidate for "system" PQ routine COS 226 Lecture 6: Radix sorting Bits and digits Binary Quicksort MSD radix sort Three-way radix Quicksort LSD radix sort Sorting in linear time ----- Bits and digits Extracting bits easy in C Radix: base of number system Power of 2 radix: groups of bits binary (radix-2): 1 bit at a time hexadecimal(radix-16): 4 bits at a time ascii(radix-256): 8 bits at a time -- bin 01100001011000100110001101100100 hex 6 1 6 2 6 3 6 4 ascii a b c d --- Easy to extract digits with macros -- #define bitsword 32 #define bitsbyte 8 #define bytesword 4 #define R (1 << bitsbyte) #define digit(A, B) ((A >> (bitsword-(B+1)*bitsbyte)) & (R-1)) --- Ex: x = 0X61626364 digit(x, 2) = (x >> 8) & 255 = c Ex: Single-bit access: bitsbyte = 1 digit(x, 11) = (x >> 20) & 1 = 0 ----- Binary Quicksort Partition file into two pieces * all keys with first bit 0 * all keys with first bit 1 Sort two pieces recursively -- quicksortB(int a[], int l, int r, int w) { int i = l, j = r; if (r <= l || w > bitsword) return; while (j != i) { while (digit(a[i], w) == 0 && (i < j)) i++; while (digit(a[j], w) == 1 && (j > i)) j--; exch(a[i], a[j]); } if (digit(a[r], w) == 0) j++; quicksortB(a, l, j-1, w+1); quicksortB(a, j, r, w+1); } --- Equivalent to partitioning on the VALUE 2^(bitsword-w+1) instead of some key in the file. Bad partition if all keys have same leading bit one subfile of size N, BUT keys one bit shorter ----- Binary Quicksort (continued) Problems: leading 0 bits cost of inner loop could be advantage if carefully done Worst case: all keys equal 32N passes on a 32-bit machine 64N passes on a 64-bit machine Random bits? should sort out after lgN bits examined Nonrandom bits? take bigger chunks ----- MSD radix sort Partition file into M buckets * all keys with first byte 0 * all keys with first byte 1 * all keys with first byte 2 ... 
* all keys with first byte M-1 Sort M pieces recursively Take M=2^bitsbyte Tradeoff large M space for buckets (too many empty buckets) small M too many passes (too many keys per bucket) Upper bound on running time: (bitsword/bitsbyte)*N (Worst case: all keys equal) 32 bits, 8 bits/byte: 4N 100-byte keys: could be 100N ----- MSD radix sort example -- now a|ce ac|e ace| for a|go ag|o ago| tip a|nd an|d and| ilk b|et be|t bet| dim c|ab ca|b cab| tag c|aw ca|w caw| jot c|ue cu|e cue| sob d|im di|m dim| nob d|ug du|g dug| sky e|gg eg|g egg| hut f|or fe|w fee| ace f|ee fe|e few| bet f|ew fo|r for| men g|ig gi|g gig| egg h|ut hu|t hut| few i|lk il|k ilk| jay j|am ja|y jam| owl j|ay ja|m jay| joy j|ot jo|t jot| rap j|oy jo|y joy| gig m|en me|n men| wee n|ow no|w nob| was n|ob no|b now| cab o|wl ow|l owl| wad r|ap ra|p rap| caw s|ob sk|y sky| cue s|ky so|b sob| fee t|ip ta|g tag| tap t|ag ta|p tap| ago t|ap ta|r tar| tar t|ar ti|p tip| jam w|ee wa|d wad| dug w|as wa|s was| and w|ad we|e wee| --- ----- Key-indexed counting Method for sorting file on keys with R values Basis for radix sorts Basic method * count number of keys with each value * take sums to turn counts into indices * move keys to auxiliary array using indices -- void keycount(int a[], int l, int r) { int i, j, cnt[R+1]; int b[maxN]; for (j = 0; j < R; j++) cnt[j] = 0; for (i = l; i <= r; i++) cnt[a[i]+1]++; for (j = 1; j < R; j++) cnt[j] += cnt[j-1]; for (i = l; i <= r; i++) b[cnt[a[i]]++] = a[i]; for (i = l; i <= r; i++) a[i] = b[i-l]; } --- Keys must be integers (in a small range)!
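As a runnable illustration of the method above, here is key-indexed counting specialized to keys in a small fixed range (R = 4 and the array bound are assumptions for the demo):

```c
#include <assert.h>

/* Key-indexed counting for integer keys in 0..R-1:
   count each value, turn counts into starting indices,
   distribute stably into an auxiliary array, copy back. */
enum { R = 4, MAXN = 64 };

void keycount(int a[], int l, int r) {
    int i, j, cnt[R + 1] = {0}, b[MAXN];
    for (i = l; i <= r; i++) cnt[a[i] + 1]++;        /* counts */
    for (j = 1; j < R; j++) cnt[j] += cnt[j - 1];    /* counts -> indices */
    for (i = l; i <= r; i++) b[cnt[a[i]]++] = a[i];  /* distribute */
    for (i = l; i <= r; i++) a[i] = b[i - l];        /* copy back */
}
```

Sorting {2, 0, 3, 1, 2, 0} this way gives {0, 0, 1, 2, 2, 3}; the distribution step is stable, which is exactly the property LSD radix sort relies on.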
Need one counter for each different key value ----- Key-indexed counting example ----- MSD radix sort code Modify key-indexed counting to access bytes as keys Add a loop to do recursive calls -- #define bin(A) l+count[A] void radixMSD(Item a[], int l, int r, int w) { int i, j, count[R+1]; if (w > bytesword) return; if (r-l <= M) { insertion(a, l, r); return; } for (j = 0; j < R; j++) count[j] = 0; for (i = l; i <= r; i++) count[digit(a[i], w) + 1]++; for (j = 1; j < R; j++) count[j] += count[j-1]; for (i = l; i <= r; i++) b[l+count[digit(a[i], w)]++] = a[i]; for (i = l; i <= r; i++) a[i] = b[i]; radixMSD(a, l, bin(0)-1, w+1); for (j = 0; j < R-1; j++) radixMSD(a, bin(j), bin(j+1)-1, w+1); } --- Variable-length keys terminated with 0 (C strings) * remove first if statement * remove first recursive call ----- Recursive structure of MSD radix sort Tree structure to describe recursive calls Paths in tree give keys Problem: algorithm touches empty nodes Tree can be as much as M times bigger Could use linked lists for nonempty buckets Better solution: 3-way Quicksort partition on bytes ----- 3-way radix Quicksort code -- #define ch(A) digit(A, D) void quicksortX(Item a[], int l, int r, int D) { int i, j, k, p, q; int v; if (r-l <= M) { insertion(a, l, r); return; } v = ch(a[r]); i = l-1; j = r; p = l-1; q = r; while (i < j) { while (ch(a[++i]) < v) ; while (v < ch(a[--j])) if (j == l) break; if (i > j) break; exch(a[i], a[j]); if (ch(a[i])==v) { p++; exch(a[p], a[i]); } if (v==ch(a[j])) { q--; exch(a[j], a[q]); } } if (p == q) { if (v != '\0') quicksortX(a, l, r, D+1); return; } if (ch(a[i]) < v) i++; for (k = l; k <= p; k++, j--) exch(a[k], a[j]); for (k = r; k >= q; k--, i++) exch(a[k], a[i]); quicksortX(a, l, j, D); if ((i == r) && (ch(a[i]) == v)) i++; if (v != '\0') quicksortX(a, j+1, i-1, D+1); quicksortX(a, i, r, D); } --- ----- 3-way radix Quicksort example -- now gig ace ago a|go for for bet bet a|ce tip dug dug and a|nd ilk ilk cab ace b|et dim
dim dim c|ab tag ago ago c|aw jot and and c|ue sob fee egg egg nob cue cue dug sky caw caw dim hut hut f|ee ace ace f|or bet bet f|ew men cab ilk egg egg gig few few hut jay j|ay ja|m owl j|ot ja|y joy j|oy jo|y rap j|am jo|t gig owl owl m|en wee wee now owl was was nob nob cab men men now wad wad r|ap caw sky sky sky sky cue nob was tip sob fee sob sob sob t|ip ta|r tap tap tap tap t|ap ta|p ago tag tag tag t|ag ta|g tar tar tar tar t|ar ti|p dug tip tip w|as and now wee w|ee jam rap wad w|ad --- ----- LSD radix sort Ancient (older than computers) method used for card-sorting Consider digits from right to left use key-indexed counting (has to be stable) -- void radixLSD(Item a[], int l, int r) { int i, j, w, count[R+1]; for (w = bytesword-1; w >= 0; w--) { for (j = 0; j < R; j++) count[j] = 0; for (i = l; i <= r; i++) count[digit(a[i], w) + 1]++; for (j = 1; j < R; j++) count[j] += count[j-1]; for (i = l; i <= r; i++) b[count[digit(a[i], w)]++] = a[i]; for (i = l; i <= r; i++) a[i] = b[i-l]; } } --- Running time: N*(bitsword/bitsbyte) Disadvantage: doesn't work for variable-length keys totally out of order until MSD encountered ----- LSD radix sort example -- now so|b c|ab |ace for no|b w|ad |ago tip ca|b t|ag |and ilk wa|d j|am |bet dim an|d r|ap |cab tag ac|e t|ap |caw jot we|e t|ar |cue sob cu|e w|as |dim nob fe|e c|aw |dug sky ta|g r|aw |egg hut eg|g j|ay |fee ace gi|g a|ce |few bet du|g w|ee |for men il|k f|ee |gig egg ow|l m|en |hut few di|m b|et |ilk jay ja|m f|ew |jam owl me|n e|gg |jay joy ag|o a|go |jot rap ti|p g|ig |joy gig ra|p d|im |men wee ta|p t|ip |nob was fo|r s|ky |now cab ta|r i|lk |owl wad wa|s a|nd |rap tap jo|t s|ob |raw caw hu|t n|ob |sky cue be|t f|or |sob fee yo|u j|ot |tag raw no|w y|ou |tap ago fe|w n|ow |tar tar ca|w j|oy |tip jam ra|w c|ue |wad dug sk|y d|ug |was you ja|y h|ut |wee and jo|y o|wl |you --- ----- Binary LSD radix sort example Cannot use Quicksort-style partitioning 0-1 sort has to be stable
stable inplace 0-1 sort? (possible, but not easy) Proof that it works: if two keys differ on first bit 0-1 sort puts them in proper relative order if two keys agree on first bit stability keeps them in proper relative order Another proof that it works: if the bits not yet examined differ doesn't matter what we do now if the bits not yet examined agree later pass won't affect their order ----- Linear sorting method LSD radix sort! To sort N 64-bit keys take bitsbyte=16 4N steps, linear extra memory (plus 2^16) Does not violate NlgN lower bound because comparisons are not used LSD radix sort liabilities inner loop has a lot of instructions accesses memory "randomly" wastes time on low-order bits Therefore, use just "enough" bits MSD radix sort also linear Use LSD-MSD hybrid for random keys (assume fixed-size keys) use (lgN)/2 < bitsbyte < lgN Three passes LSD radix sort on 2nd byte LSD radix sort on 1st byte insertion sort to clean up COS 226 Lecture 7: Searching overview Abstract Data Types Revisited Symbol Table, Dictionary items with keys INSERT SEARCH Searching = "implement a symbol table ADT" ADT orientation integral to modern languages implementation in C possible Refs: Sedgewick section 4.8 Hanson, "C Interfaces and Implementations" COS 217: ADTs control complexity in large systems COS 226: ADTs precisely define implementation Both: ADTs help us build layers of abstraction ----- Items and keys ADT for use by clients and implementations of symbol table ADTs Sample INTERFACE for char items and keys -- . typedef char Item; . typedef char Key; . #define key(A) A . #define less(A, B) (A < B) . #define eq(A, B) ((A) == (B)) . #define maxKey 256 . #define NULLitem 0 . Item ITEMrand(void); . int ITEMscan(Item *); . 
void ITEMshow(Item); --- CLIENTS use functions to create and write keys manipulate items Search algorithms (ST IMPLEMENTATION) store and retrieve items use "key", "less" to extract, compare keys "key" and "less" could be in implementation GOAL: implementations independent of item type ----- Symbol table ADT Items with keys Two basic operations * insert * search (find item with given key) generic operations common to many ADTs * create * test if empty * destroy (often ignored if not harmful) other operations * construct (batch inserts) * sort (often a byproduct) * delete * find kth largest * join (ST > PQ) INTERFACE for basic and generic operations -- . void STinit(int); . void STinsert(Item); . Item STsearch(Key); . int STcount(); --- Could combine with datatype interface Should also specify constraints and error conditions ----- Typical brute-force implementation Use linear search in an ordered array -- static Item *st; static int N; void STinit(int maxN) { st = malloc((maxN)*sizeof(Item)); N=0; } int STcount() { return N; } void STinsert(Item item) { int j = N++; Key v = key(item); while (j>0 && less(v, key(st[j-1]))) { st[j] = st[j-1]; j--; } st[j] = item; } Item STsearch(Key v) { int j; for (j = 0; j < N; j++) { if (eq(v, key(st[j]))) return st[j]; if (less(v, key(st[j]))) break; } return NULLitem; } --- Other options for brute-force implementations * unordered array * unordered singly-linked list * ordered singly-linked list ----- Binary search Maintain ordered array Do search in lgN steps by divide-and-conquer -- Item search(int l, int r, Key v) { int m = (l+r)/2; if (l > r) return NULLitem; if eq(v, key(st[m])) return st[m]; if (l == r) return NULLitem; if less(v, key(st[m])) return search(l, m-1, v); else return search(m+1, r, v); } Item STsearch(Key v) { return search(0, N-1, v); } --- Ex: search for I -- . 0 1 2 3 4 5 6 7 8 9 10 11 12 . A A E G I M N O R S T X --- -- . l r m . 1 12 6 . 1 5 3 . 4 5 4 . 
5 5 --- Worst-case number of steps for binary search = number of bits in binary representation of N Proof: they satisfy the same recurrence ----- Interpolation search Maintain ordered array Improve upon binary search by using key values to guess where to look Ex: search for I (I-A)/(X-A) = (9-1)/(24-1) = .35 therefore divide at .35, not .5 next probe will be on or close -- . 0 1 2 3 4 5 6 7 8 9 10 11 12 . A A E G I M N O R S T X --- -- . l r m . 1 12 5 --- THM: Interpolation search takes lglgN steps for random files Proof: difficult Less than 5 probes in practice (2^2^5 = billions) Caveats: files not random calculations cost more than comparisons ----- ST implementations cost summary Per-operation time costs as function of ST size -- . worst-case average-case . insert search insert search . ordered . array N N N/2 N/2 . list N N N/2 N/2 . unordered . array 1 N 1 N/2 . list 1 N 1 N/2 . binary search N lgN N/2 lgN . BST N N lgN lgN . red-black lgN lgN lgN lgN . randomized (N) (N) lgN lgN . hashing N N 1 1 --- Situation further complicated by * need to have many other operations * operations mix * space costs Another algorithm design success story: * nearly optimal worst-case cost * fast average-case cost * simple (but ingenious!) algorithms Costs lower in practice average-case analysis amortized analysis ----- First-class symbol-table ADT Basic operations * initialize * insert * search * select kth smallest * delete * join two STs * test if empty * visit items in order Handles (pointers to unspecified structs) for delete, client needs to handle recs for join, client needs to handle STs INTERFACE for basic operations -- . typedef struct st* ST; . typedef struct STnode* STlink; . ST STinit(int); . int STcount(int); . STlink STinsert(ST, Item); . Item STsearch(ST, Key); . Item STselect(ST, int); . void STdelete(ST, STlink); . 
ST STjoin(ST, ST); --- could have STsearch and STselect return handles ----- First-class ADT implementation Concrete implementation of abstract ADT specify data representation for records [ STlink is pointer to record ] specify data representation for table [ ST is pointer to symbol table ] Ex: Binary search trees -- struct STnode { Item item; STlink l, r; }; struct st { STlink head; }; --- "What is a record in a BST?" a node (handle is pointer to record) "What is a symbol table?" a pointer to a node (root) (handle is pointer to struct with pointer to node) Providing true "abstract" data type challenging * precise ADT specification worth the trouble * new implementations are "plug-ins" * compiler type-checking can be helpful * not so challenging in some modern languages More details: Sedgewick, sections 4.8, 9.5 ----- Interface for standalone ADT Use if only one ADT in application (typical) -- . void STinit(int); . int STcount(); . void STinsert(Item); . Item STsearch(Key); . void STdelete(Item); . Item STselect(int); . void STsort(void (*visit)(Item)); --- * eliminate ST handles in args * can specify DELETE in terms of item types * can specify JOIN in terms of ST types Stylized code setup gives true ADT (built-in support in C++, Java ...)
Standalone ADT useful in most applications allows us to focus on algorithms ----- Symbol table ADT client example DEDUP a file of integers (remove duplicates) -- #include <stdio.h> #include <stdlib.h> #include "Item.h" #include "ST.h" int main(void) { int v; Item item; STinit(maxN); while (scanf("%d", &v) == 1) { if (STsearch(v) != NULLitem) continue; key(item) = v; STinsert(item); printf("%d\n", v); } } --- Goal: independence from implementation Space costs: same options as for PQs copy of data for implementation implementation manipulates data via indices implementation manipulates pointers Examples of other clients: online dictionary compiler symbol table ----- BST data types NODES consist of item with key left link to BST for smaller keys right link to BST for larger keys count of number of nodes in BST (optional) -- typedef struct STnode* link; struct STnode { Item item; link l, r; int N; }; static link head, z; link NEW(Item item, link l, link r, int N) { link x = malloc(sizeof *x); x->item = item; x->l = l; x->r = r; x->N = N; return x; } void STinit() { head = (z = NEW(NULLitem, 0, 0, 0)); } int STcount() { return head->N; } --- ----- Recursive BST implementation Search and insertion code directly follows from BST definition -- Item searchR(link h, Key v) { Key t = key(h->item); if (h == z) return NULLitem; if eq(v, t) return h->item; if less(v, t) return searchR(h->l, v); else return searchR(h->r, v); } Item STsearch(Key v) { return searchR(head, v); } link insertR(link h, Item item) { Key v = key(item), t = key(h->item); if (h == z) return NEW(item, z, z, 1); if less(v, t) h->l = insertR(h->l, item); else h->r = insertR(h->r, item); (h->N)++; return h; } void STinsert(Item item) { head = insertR(head, item); } --- ----- Nonrecursive BST implementation Travel down the tree move left if smaller than search key move right if larger than search key -- void STinsert(Item item) { Key v = key(item); link p = head, x = p; if (head == z) { head = NEW(item, z, z, 1); return; } while (x != z) { p =
x; if less(v, key(x->item)) x = x->l; else x = x->r; } x = NEW(item, z, z, 1); if less(v, key(p->item)) p->l = x; else p->r = x; } --- Equivalent to recursive version (can derive automatically) Recursive versions of tree algs simpler to implement and understand ----- BST insertion example ----- BST construction ----- Cost of search in BSTs One-to-one correspondence between BST and Quicksort partitioning Total cost of searching for all nodes satisfies same recurrence as Quicksort C(N) = N+1 + 2( C(1) + C(2) + ... + C(N-1) )/N ... = 2N ln N THM: Search and insertion in BSTs involves examining 2lnN nodes on the average. Worst case: nodes in order degenerates to linked list, insertion cost N ----- Other operations in BSTs SORT: traverse tree in inorder -- void sortR(link h, void (*visit)(Item)) { if (h == z) return; sortR(h->l, visit); visit(h->item); sortR(h->r, visit); } void STsort(void (*visit)(Item)) { sortR(head, visit); } --- same cost as Quicksort (+ space for links) Useful as "free" ST operation GENERALIZED PQ: add tree size to nodes -- Item selectR(link h, int k) { int t = h->l->N; if (h == z) return NULLitem; if (t > k ) return selectR(h->l, k); if (t < k ) return selectR(h->r, k-t-1); return h->item; } Item STselect(int k) { return selectR(head, k); } --- special case: go left for smallest Delete arbitrary node? Join? ----- Deletion in BSTs To delete a node at the bottom (A E L P X) remove it To delete a node with one child (A A C G I M) pass the child up To delete a node with two children (S E H N) find the "next" node use right-left* OR left-right* swap, then remove (reduces to A or B) Problem: strategy not symmetric clumsy algorithm (see code in book!) Serious problem: trees not random after long sequence of insertions and deletions (!!) 
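The recursive search and insert above can be exercised standalone. The sketch below specializes Item to int and keeps the sentinel z for empty links, as in the lecture code; the names (bst_init, bst_insert, bst_search) are illustrative, not the course ST interface:

```c
#include <stdlib.h>
#include <assert.h>

/* Minimal BST in the style of the lecture code: a sentinel
   node z stands for every empty link, so walking down the
   tree needs no NULL checks. */
typedef int Item;
typedef struct node* link;
struct node { Item item; link l, r; };

static link z;                       /* sentinel for empty links */

static link NEW(Item item, link l, link r) {
    link x = malloc(sizeof *x);
    x->item = item; x->l = l; x->r = r;
    return x;
}

void bst_init(void) { z = NEW(0, NULL, NULL); z->l = z->r = z; }

link bst_insert(link h, Item item) { /* returns subtree root */
    if (h == z) return NEW(item, z, z);
    if (item < h->item) h->l = bst_insert(h->l, item);
    else                h->r = bst_insert(h->r, item);
    return h;
}

int bst_search(link h, Item v) {     /* 1 if v in tree, else 0 */
    if (h == z) return 0;
    if (v == h->item) return 1;
    return bst_search(v < h->item ? h->l : h->r, v);
}
```

Starting from an empty tree (root = z after bst_init), repeated bst_insert calls build the tree; as the analysis above shows, search and insert examine about 2 ln N nodes on average for random keys.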
----- BSTs by root insertion Idea: Insert such that new node stays at root Nonrecursive implementation possible pass link that would cross to parent Recursive implementation easy based on ROTATION Faster search for recently inserted nodes ----- Nonrecursive BST root insertion -- link BSTinsert(Item item) { Key v = key(item); int lr; struct STnode **t, **u, *x; x = NEW(item, z, z, 1); if (head == z) return x; if (less(v, key(head->item))) { x->r = head; head = x; t = &x->l; x = x->r; lr = 0; } else { x->l = head; head = x; t = &x->r; x = x->l; lr = 1; } while (x != z) if (less(v, key(x->item))) { if ( lr) { lr = !lr; *t = *u; t = u; } u = &x->l; x = *u; } else { if (!lr) { lr = !lr; *t = *u; t = u; } u = &x->r; x = *u; } *t = z; return head; } --- Challenge for the bored: check that this code is correct Conclusion for the rest of us: recursive implementations *are* simpler! ----- ROTATE operation in trees Fundamental operation rearrange nodes in trees local transformation (two links) maintain sorted order -- link rotR(link h) { link x = h->l; h->l = x->r; x->r = h; return x; } link rotL(link h) { link x = h->r; h->r = x->l; x->l = h; return x; } --- Right rotate Left rotate ----- Recursive BST root insertion code To insert at root insert at root of subtree (recursively) rotate to bring inserted key to root -- link insertT(link h, Item item) { Key v = key(item); if (h == z) return NEW(item, z, z, 1); if (less(v, key(h->item))) { h->l = insertT(h->l, item); h = rotR(h); } else { h->r = insertT(h->r, item); h = rotL(h); } return h; } void STinsert(Item item) { head = insertT(head, item); } --- Equivalent to: insert the node, then use rotations to bring it up to the top ----- Rotate to the root ----- Top-down BST construction (root insertion) ----- Searching (to be continued) SUMMARY use elementary methods for small cases binary search guarantees lgN steps for search (but requires N steps for insertion) interpolation search can speed up search (but requires N steps for insertion)
binary search trees give fast search and insert, average-case Binary search trees can do more * insert * search * delete * select kth smallest * ... all in O(lgN) steps, average-case with O(NlgN) sort as a byproduct Goal: Symbol table implementation with O(lgN) GUARANTEED performance for both search and insert COS 226 Lecture 8: Balanced trees Symbol Table, Dictionary records with keys INSERT SEARCH Goal: Symbol table implementation with O(lgN) GUARANTEED performance for both search and insert (and other ST operations) Approach 1: probabilistic "guarantee" Approach 2: amortized "guarantee" Approach 3: worst-case guarantee ----- Randomized BSTs Idea: new node should be at root with probability 1/(N+1), so put it there! -- link insertR(link h, Item item) { Key v = key(item), t = key(h->item); if (h == z) return NEW(item, z, z, 1); if (rand()< RAND_MAX/(h->N+1)) return insertT(h, item); if less(v, t) h->l = insertR(h->l, item); else h->r = insertR(h->r, item); (h->N)++; return h; } void STinsert(Item item) { head = insertR(head, item); } --- Trees have same shape as random BSTs ** no matter what the input ** Random BSTs are well studied exponentially small chance of bad balance ----- Randomized BST example Insert keys in order: tree shape random! ----- Other operations in randomized BSTs Find kth largest another use of size field already there Join disjoint STs straightforward recursive implementation to join STs A (of size M) and B (of size N) use A root with probability M/(M+N) use B root with probability N/(M+N) join other tree with subtree recursively Delete remove the node, do join (above) THM: Trees still random after delete (!!) 
Bottom line: randomized BSTs provide full symbol table ADT straightforward implementation O(log N) average case bad cases provably unlikely ----- Skip lists Idea: Add fast tracks to linked lists Nodes have variable number of links -- typedef struct STnode* link; struct STnode { Item item; link* next; int sz; }; link NEW(Item item, int k) { int i; link x = malloc(sizeof *x); x->next = malloc(k*sizeof(link)); x->item = item; x->sz = k; for (i = 0; i < k; i++) x->next[i] = z; return x; } --- Problems: how to maintain structure under insertion how many links in a particular node? ----- Skip list implementation Idea: put a RANDOM number of links in each node j links with probability 1/2^j -- void STinit(int max) { N = 0; lgN = 0; z = NEW(NULLitem, 0); head = NEW(NULLitem, lgNmax); } void insertR(link t, link x, int k) { Key v = key(x->item); if (less(v, key(t->next[k]->item))) { if (k < x->sz) { x->next[k] = t->next[k]; t->next[k] = x; } if (k == 0) return; insertR(t, x, k-1); return; } insertR(t->next[k], x, k); } void STinsert(Key v) { insertR(head, NEW(v, randX()), lgN); N++; } --- Bottom line: same properties as randomized BSTs plus: easier to understand minus: more pointer-chasing ----- Splay trees Idea: slight modification to root insertion Check two links above current node Orientations differ: same as root insertion Orientations match: do top rotation first Brings new node to root *Also* brings all nodes on search path closer to root ----- Splay tree example Splay rotations halve the search path Long paths can build up, BUT tree rebalances whenever we traverse them ----- Splay tree implementation (hl, hr, hll, hlr, hrl, hrr abbreviate h->l, h->r, h->l->l, h->l->r, h->r->l, h->r->r) -- link splay(link h, Item item) { Key v = key(item); if (h == z) return NEW(item, z, z, 1); if (less(v, key(h->item))) { if (hl == z) return NEW(item, z, h, h->N+1); if (less(v, key(hl->item))) { hll = splay(hll, item); h = rotR(h); } else { hlr = splay(hlr, item); hl = rotL(hl); } return rotR(h); } else { if (hr == z) return NEW(item, h, z, h->N+1); if
(less(key(hr->item), v)) { hrr = splay(hrr, item); h = rotL(h); } else { hrl = splay(hrl, item); hr = rotR(hr); } return rotL(h); } } --- Guaranteed performance over SEQUENCE of ops ----- 2-3-4 trees Allow one, two, or three keys per node Keep link for every interval between keys 2-node: one key, two children 3-node: two keys, three children 4-node: three keys, four children SEARCH compare search key against keys in node find interval containing search key follow associated link (recursively) INSERT search to bottom for key 2-node at bottom: convert to a 3-node 3-node at bottom: convert to a 4-node 4-node at bottom: ?? ----- Top-down 2-3-4 trees Transform tree on the way down to ensure that last node is not a 4-node "Split" any 4-nodes encountered using one of two transformations Invariant: "current" node is not a 4-node Insertion at bottom therefore easy. ----- Top-down 2-3-4 tree construction Trees grow up from the bottom ----- Balance in 2-3-4 trees In top-down 2-3-4 trees, all paths from top to bottom are equal length (tree height) Tree height: worst case: lgN (all 2-nodes) best case: lgN/2 (all 4-nodes) between 10 and 20 for a million nodes between 15 and 30 for a billion nodes Comparisons within nodes not accounted for ----- Top-down 2-3-4 tree implementation Fantasy code (sketch) -- ST BSTinsert(ST head, recordType rec) { Key v = key(rec); struct BSTnode *x = head->r; while (x != z) { x = therightlink(x, v); if fourNode(x) then split(x); } if twoNode(x) then makeThree(x, v); else if threeNode(x) then makeFour(x, v); else return head; } --- Direct implementation complicated because of * "therightlink(x, v)" * maintaining multiple node types * large number of cases for "split" Search also more complicated than for BST ----- Red-black trees Represent 2-3-4 trees as binary trees with "internal" edges for 3- and 4-nodes Correspondence between 2-3-4 and RB trees Not 1-1 because 3-nodes swing either way SEARCH: use plain BST search (!)
tree balance gives O(lgN) performance guarantee all comparisons accounted for INSERT: split reduces to fewer cases ----- Splitting nodes in red-black trees Two cases are easy (need only to switch colors) Two cases require ROTATIONS Can use red-black abstraction directly Invariant: never two consecutive red nodes on any path. ----- RB tree node split example ----- Red-black tree implementation Search code same as BST; insert code balances (hl, hr, hll, hrr abbreviate h->l, h->r, h->l->l, h->r->r) -- link RBinsert(link h, Item item, int sw) { Key v = key(item); if (h == z) return NEW(item, z, z, 1, 1); if ((hl->red) && (hr->red)) { h->red = 1; hl->red = 0; hr->red = 0; } if (less(v, key(h->item))) { hl = RBinsert(hl, item, 0); if (h->red && hl->red && sw) h = rotR(h); if (hl->red && hll->red) { h = rotR(h); h->red = 0; hr->red = 1; } } else { hr = RBinsert(hr, item, 1); if (h->red && hr->red && !sw) h = rotL(h); if (hr->red && hrr->red) { h = rotL(h); h->red = 0; hl->red = 1; } } return h; } void STinsert(Item item) { head = RBinsert(head, item, 0); head->red = 0; } --- ----- Red-black tree construction ----- Balance in red-black trees In red-black trees, the longest path is at most twice the length of the shortest path worst case: less than 2lgN Comparisons within nodes are accounted for ----- Summary Goal: Symbol table implementation with O(lgN) GUARANTEED performance for both search and insert probabilistic guarantee: random BSTs, skip lists amortized guarantee: splay trees optimal guarantee: red-black trees Algorithms are variations on a theme rotations during insertion Different abstractions, but equivalent Ex: skip-list representation of 2-3-4 tree Are balanced trees optimal? worst-case: no (can get ClgN for C>1) average-case: open Extensions to search structures for huge files (stay tuned) COS 226 Lecture 9: Hashing Symbol Table, Dictionary records with keys INSERT SEARCH Balanced trees, randomized trees use O(lgN) comparisons Is lgN required? (no, and yes) Are comparisons necessary?
(no) ---------- Hashing: basic plan Save keys in a table, at a location determined by the key HASH FUNCTION * method for computing table index from key COLLISION RESOLUTION STRATEGY * algorithm and data structure to handle two keys that hash to the same index Time-space tradeoff * No space limitation: trivial hash function with key as address * No time limitation: trivial collision resolution: sequential search * Limitations on both time and space hashing ---------- Hash Functions Short keys (fit in a word) treat key as integer h(K) = K mod M Use prime table size Ex: four-character keys, table size 101 -- bin 01100001011000100110001101100100 hex 6 1 6 2 6 3 6 4 ascii a b c d --- 0x61626364 = 1633837924 1633837924 % 101 = 11 Key "abcd" hashes to 11 0x64636261 = 1684234849 1684234849 % 101 = 57 Key "dcba" hashes to 57 "Collision" 0x61626263 = 1633837667 1633837667 % 101 = 57 Key "abbc" also hashes to 57 Obvious point: millions of keys, small table: most collide! ---------- Hash Functions (continued) Long keys (strings) treat key as integer h(K) = K mod M Multiprecision arithmetic calculation Use Horner's method Ex: (check with 4 chars; works for any length) -- hex 6 1 6 2 6 3 6 4 ascii a b c d --- 0x61626364 = 256*(256*(256*97+98)+99)+100 take mod after each multiplication 256*97+98 = 24930 % 101 = 84 256*84+99 = 21603 % 101 = 90 256*90+100 = 23140 % 101 = 11 -- int hash(char *v, int M) { int h, a = 117; for (h = 0; *v != '\0'; v++) h = (a*h + *v) % M; return h; } --- scramble by replacing 256 by 117 Uniform hashing: use a different random value for each digit ---------- Collisions Table size M How many insertions until the first collision? BIRTHDAY PARADOX (classical probability theory) Assume hash function "random" Expected insertions to first collision: sqrt(pi M/2) -- . M sqrt(pi M/2) . 100 12 . 1000 40 .
10000 125 --- N keys Option 1: Allow N >> M: put keys hashing to i in a list about N/M keys per list Option 2: Keep N < M: put keys somewhere in table complex collision pattern ---------- Collisions (continued) Experiment 1: generate random probes between 0 and 100 84 35 45 32 89 1 58 16 38 69 5 90 16 16 53 61 ... collision at 13th as predicted Experiment 2: use hash function to scatter 4-char keys -- bcba 47 dbcb 24 bddc 43 dcbd 60 dabc 85 dbab 17 dbdb 78 bccc 2 ccad 1 dbdd 80 babb 74 adbc 31 cdcc 95 addd 39 bcbd 50 bcda 55 bdac 83 abad 4 baca 26 cada 85 dabb 84 dabd 86 --- collision after 20 probes still as predicted (standard dev. not small) ---------- Separate chaining Simple, practical, widely used Cuts search time by a factor of M over sequential search Method: M linked lists, one for each table slot -- . 0: * . 1: L A A A * . 2: M X * . 3: N C * . 4: * . 5: E P E E * . 6: * . 7: G R * . 8: H S * . 9: I * . 10: * --- Insert cost: 1 Avg. search cost (successful): N/2M Avg. search cost (unsuccessful): N/M M large: CONSTANT avg. search time independent of how keys are distributed (!) Worst case: N ("probabilistically" unlikely) Keep lists sorted? increases insert time to N/2M cuts unsuccessful search time to N/2M ---------- Linear Probing No links, keep everything in table Method: start linear search at hash position (stop when empty position hit) Still get O(1) avg. 
search time if table sparse
--
void STinit(int max)
  { int i;
    N = 0; M = 2*max;
    st = malloc(M*sizeof(Item));
    for (i = 0; i < M; i++) st[i] = NULLitem;
  }
void STinsert(Item item)
  { Key v = key(item);
    int i = hash(v, M);
    while (!null(i)) i = (i+1) % M;
    st[i] = item; N++;
  }
Item STsearch(Key v)
  { int i = hash(v, M);
    while (!null(i))
      if eq(v, key(st[i])) return st[i];
      else i = (i+1) % M;
    return NULLitem;
  }
---
Very sparse table: like separate chaining
As table fills up: CLUSTERING occurs
(infinite loop on full table)
----------
Linear probing performance
CLUSTERING bad phenomenon:
items clump together
long clusters tend to get longer
avg. search cost grows to M as table fills
Precise analysis very difficult.
THM (Knuth)
Insert cost: approx. (1 + 1/(1-N/M)^2)/2
Search cost (hit): approx. (1 + 1/(1-N/M))/2
Search cost (miss): same as insert
Too slow when table gets 70%-80% full
----------
Double Hashing
Avoid clustering by using 2nd hash
to compute increment for seq. search
Analysis extremely difficult
Roughly equivalent to ideal (random probe)
THM (Guibas-Szemeredi)
Insert cost: approx. 1/(1-N/M)
Search cost (hit): approx. ln(1/(1-N/M))/(N/M)
Search cost (miss): same as insert
Not too slow until table gets 90%-95% full
----------
Amortized analysis of algorithms
Measure running time for X operations by
(total cost of all X operations)/X
Ex: insert N elements in a heap:
(lg1 + lg2 + ... + lgN)/N = lgN + O(1)
insert N elements in a binomial queue:
(1*N/2 + 2*N/4 + 3*N/8 + ...)/N < 2
Hashing: grow table while keeping search cost O(1)
when number of keys in table doubles
rebuild to double the size of the table
Ex: separate chaining, avg search cost < 2
4M keys in table of size M
proof by induction: amortized cost < x for some constant x
cost to build: x*4M
cost to rebuild to new table size 2M: 4M
amortized cost of first 8M insertions:
(x*4M + 4M + 4M)/8M = x/2 + 1 < x (for any x > 2)
Same argument works for other basic ADTs!
Ex: stacks, queues in arrays, double hashing Another way to evaluate algorithms worst-case > average-case > amortized ---------- Separate chaining vs. double hashing Assume same amount space for keys, links pointers for long or variable-length keys Space for separate chaining w/ rehashing 4M keys (or links to keys) M table links 4M links in nodes Total space: 9M words for 4M items Avg search time: 2 Double hashing in same space 4M items table size 9M avg search time: 1/(1-4/9) = 1.8 10% faster Double hashing in same time 4M items avg search time 2 space needed: 8M words (1/(1-4/8) = 2) 11% less space Separate chaining advantages idiot-proof (doesn't break) no large chunks of memory (is that good?) could avoid links to nodes ---------- Other ST ADT operations Deletion Separate chaining: trivial Linear probing: rehash keys in cluster or use indirect method (see below) Double hashing: no easy direct method mark deleted nodes as "deadwood" rebuild periodically to clear deadwood Sort, find kth largest Separate chaining w/ sorted lists Linear probing/double hashing have to do full sort Join Separate chaining: easy Linear probing/double hashing: rehash whole table ---------- Reasons not to use hashing Hashing implements ST ADT search and insert in constant time. Why use anything else? 
* no performance guarantee * too much arithmetic on long keys * takes extra space * doesn't support all ADT ops efficiently * compare abstraction works for partial order (searching without keys) ---------- Other hashing variants Perfect hashing fixed set of keys hash function with no collisions good hack for small tables not practical for large tables totally static Coalesced hashing properly account for link space mix hash table, storage allocation Ordered hashing cut costs in half as with ordered lists Brent's variation guarantee constant search cost up to M insert cost COS 226 Lecture 10: Radix trees and tries Symbol Table, Dictionary records with keys INSERT SEARCH Balanced trees, randomized trees use O(lgN) comparisons Hashing uses O(1) probes but time proportional to key length Is key length required? (no) Are comparisons necessary? (no) Best possible: examine lgN BITS ---------- Digital search trees (DSTs) Easy way to balance tree Use bits of key to direct search Otherwise identical to BST -- #define bit(A, B) digit(A, B) Item searchR(link h, Key v, int w) { Key t = key(h->item); if (h == z) return NULLitem; if eq(v, t) return h->item; if (bit(v, w) == 0) return searchR(h->l, v, w+1); else return searchR(h->r, v, w+1); } Item STsearch(Key v) { return searchR(head, v, 0); } link insertR(link h, Item item, int w) { Key v = key(item), t = key(h->item); if (h == z) return NEW(item, z, z, 1); if (digit(v, w) == 0) h->l = insertR(h->l, item, w+1); else h->r = insertR(h->r, item, w+1); return h; } void STinsert(Item item) { head = insertR(head, item, 0); } --- ---------- Digital tree insertion Trees NOT ordered can't support SORT and SELECT ---------- Tries Branch according to bits in keys No keys in internal nodes Records/keys in external nodes Structure depends on keys, not insertion order Examine lgN BITS to distinguish key independent of key length! 
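The DST and trie code on these slides lean on a `bit`/`digit` abstraction that is left implicit. One plausible definition, a sketch assuming unsigned keys that fit in `bitsword` bits (the name `bitsword` and the 8-bit width are assumptions made here, not the lecture's):

```c
#include <assert.h>

/* bit(A, B): extract bit B of key A, counting from the most
   significant of bitsword bits, so w = 0 directs the branch
   at the root.  Assumes unsigned keys of at most bitsword bits. */
#define bitsword 8
#define bit(A, B) (((A) >> (bitsword - 1 - (B))) & 1)
```

With this definition, key 0x80 branches right at the root (bit 0 is 1) and left at every level below it, which is what makes the tree shape depend on key bits rather than on insertion order.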
Problems:
44% space waste from 1-way branching
multiple node types
----------
Trie search implementation
Branch according to bits, as in DST
Three outcomes:
null link
key in external node matches search key
key in external node differs from search key
--
#define external(A) (((A)->l == z) && ((A)->r == z))
Item searchR(link h, Key v, int w)
  { Key t = key(h->item);
    if (h == z) return NULLitem;
    if (external(h))
      return eq(v, t) ? h->item : NULLitem;
    if (digit(v, w) == 0)
      return searchR(h->l, v, w+1);
    else return searchR(h->r, v, w+1);
  }
Item STsearch(Key v)
  { return searchR(head, v, 0); }
---
----------
Trie insertion implementation
Two possible search miss outcomes
null link: replace with link to new node
ext. node: recursive split to distinguish new key
--
void STinit()
  { head = (z = NEW(NULLitem, 0, 0, 0)); }
link split(link p, link q, int w)
  { link t = NEW(NULLitem, z, z, 2);
    switch(bit(key(p->item), w)*2 + bit(key(q->item), w))
      {
        case 0: t->l = split(p, q, w+1); break;
        case 1: t->l = p; t->r = q; break;
        case 2: t->r = p; t->l = q; break;
        case 3: t->r = split(p, q, w+1); break;
      }
    return t;
  }
link insertR(link h, Item item, int w)
  { Key v = key(item);
    if (h == z) return NEW(item, z, z, 1);
    if (external(h))
      { return split(NEW(item, z, z, 1), h, w); }
    if (digit(v, w) == 0)
      h->l = insertR(h->l, item, w+1);
    else h->r = insertR(h->r, item, w+1);
    return h;
  }
void STinsert(Item item)
  { head = insertR(head, item, 0); }
---
----------
Patricia tries
Digression: cute nomenclature
Practical Algorithm To Retrieve Information
Coded In Alphanumeric
information reTRIEval (but pronounced "try")
Patricia:
* collapse one-way branches in tries
* "thread" tree to eliminate multiple node types
Quintessential search algorithm
----------
Patricia trie search and insertion
SEARCH
branch according to specified bit
INSERTION
search, then find leftmost bit that
distinguishes new key from key found
Easy (external) case
Harder (internal) case
Ingenious algorithm, worth learning (see book)
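The three trie search outcomes (null link, external match, external mismatch) can be checked on a tiny hand-built binary trie. This is an illustrative sketch, not the lecture's ADT: `node`, `ext`, `in`, and `trie_demo` are made-up names, and 2-bit keys stand in for real keys.

```c
#include <assert.h>
#include <stdlib.h>

/* A binary trie over 2-bit keys holding {0, 2, 3}: external nodes
   carry keys, internal nodes only branch.  Search returns 1 on an
   external-node match, 0 on a null link or an external mismatch. */
typedef struct node { int key; struct node *l, *r; } node;

static node *ext(int key)                 /* external node */
  { node *x = malloc(sizeof *x);
    x->key = key; x->l = NULL; x->r = NULL;
    return x; }

static node *in(node *l, node *r)         /* internal node */
  { node *x = malloc(sizeof *x);
    x->key = -1; x->l = l; x->r = r;
    return x; }

static int trie_search(node *h, int v, int w)
  { if (h == NULL) return 0;
    if (h->l == NULL && h->r == NULL) return h->key == v;
    return trie_search(((v >> (1 - w)) & 1) ? h->r : h->l, v, w + 1); }

int trie_demo(int v)                      /* search the fixed trie */
  { node *root = in(ext(0), in(ext(2), ext(3)));
    return trie_search(root, v, 0); }
```

Note how key 0 sits one level down (no other key starts with bit 0) while keys 2 and 3 need a second bit to be told apart: externals sit at the shortest distinguishing depth, independent of insertion order. The demo leaks its nodes, which is fine for a one-shot sketch.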
---------- R-way digital tree Generalize to R links per node R-way branching gives fast search -- struct STnode { Item item; link next[26]; }; link NEW(Item); Item searchR(link h, Key v, int w) { int i = digit(v, w); if (h == z) return NULLitem; if eq(v, key(h->item)) return h->item; return searchR(h->next[i], v, w+1); } Item STsearch(Key v) { return searchR(head, v, 0); } --- Simple, fast (but uses a lot of space) ---------- R-way digital tree example ---------- R-way digital tree analysis N keys N internal nodes R links per node Space: N*R Time: lgN/lgR comparisons Ex: R=26, N=20000 500,000 links tree height 3-4 Ex: R=16, N=1M 16M links tree height 5 Plus: one node type, easy implementation Minus: full comparison at each node can't support SORT, SELECT ---------- R-way trie Generalize trie to R-way branching Nodes contain characters R-way branching on next character -- typedef struct STnode* link; struct STnode { Item item; link next[R]; }; Item searchR(link h, Key v, int w) { int i = digit(v, w); if (h == z) return NULLitem; if (internal(h)) return searchR(h->next[i], v, w+1); if eq(v, key(h->item)) return h->item; return NULLitem; } Item STsearch(Key v) { return searchR(head, v, 0); } --- Return match on prefix, or make keys prefix-free end-of-key character suffix trie ---------- R-way trie example ---------- R-way trie existence table No data in trie Keys encoded in trie structure Easy to implement May use excessive space -- Item searchR(link h, Key v, int w) { int i = digit(v, w); if (h == z) return NULLitem; if (i == NULLdigit) return v; return searchR(h->next[i], v, w+1); } Item STsearch(Key v) { return searchR(head, v, 0); } --- ---------- R-way trie implementation Generalize binary trie implementation -- link split(link p, link q, int w) { link t = NEW(NULLitem); int pd = digit(p->item, w), qd = digit(q->item, w); if (pd == qd) { t->next[pd] = split(p, q, w+1); } else { t->next[pd] = p; t->next[qd] = q; } return t; } link insertR(link h, Item item, int w) 
{ Key v = key(item);
    int i = digit(v, w);
    if (h == z) return NEW(item);
    if (!internal(h))
      { return split(NEW(item), h, w); }
    h->next[i] = insertR(h->next[i], item, w+1);
    return h;
  }
void STinsert(Item item)
  { head = insertR(head, item, 0); }
---
----------
R-way trie analysis
Assume one-way links collapsed
link to remainder of key in external nodes
N keys, total of C characters in keys
approx. N internal nodes
R links per node
Space: N*R + C
Time: lgN/lgR CHARACTER comparisons
Ex: R=26, N=20000
520,000 links
tree height 3-4
Ex: R=16, N=1M
16M links
tree height 5
Faster than hashing
particularly for unsuccessful search
Problems
* eliminating 1-way branches
* multiple node types
* too much space for null links
----------
Correspondence with sorting algs
BSTs correspond to Quicksort
recursive structure
Roles of TIME and SPACE interchanged
TIME:
path in tree
size of subfile in sort
SPACE:
stack size in sort
branching factor in tree
Tries correspond to MSD radix sort
3-way tries correspond to 3-way Quicksort
----------
Three-way radix tries
Nodes contain characters
three-way branching on next character
left: key character less
middle: key character equal
right: key character greater
Equivalent to replacing trie node
with BST on character
Space savings: can use existence trie
Adds factor of 2lnM to search cost
constant no. of BYTES for unsucc.
search
Easy recursive sort
Most important advantage:
adapts well to strange keys
Faster than hashing
----------
Three-way radix search trie example
--
Item searchR(link h, Key v, int w)
  { int i = digit(v, w);
    if (h == z) return NULLitem;
    if (i == NULLdigit) return v;
    if (i < h->d) return searchR(h->l, v, w);
    if (i == h->d) return searchR(h->m, v, w+1);
    return searchR(h->r, v, w);
  }
Item STsearch(char *v)
  { return searchR(head, v, 0); }
---
----------
Three-way trie insertion implementation
--
typedef struct STnode* link;
struct STnode { Item item; int d; link l, m, r; };
void STinit()
  { head = z; }
link NEW(int d)
  { link x = malloc(sizeof *x);
    x->d = d; x->l = z; x->m = z; x->r = z;
    return x;
  }
link insertR(link h, Item item, int w)
  { Key v = key(item);
    int i = digit(v, w);
    if (h == z) h = NEW(i);
    if (i == NULLdigit) return h;
    if (i < h->d) h->l = insertR(h->l, item, w);
    if (i == h->d) h->m = insertR(h->m, item, w+1);
    if (i > h->d) h->r = insertR(h->r, item, w);
    return h;
  }
void STinsert(Item item)
  { head = insertR(head, item, 0); }
---
----------
Empirical studies
random 32-bit keys
--
         BUILD                 SEARCH
.        bst  dst trie  pat    bst  dst trie  pat
.   5000   4    5    7    7      3    2    3    2
.  12500  18   15   20   18      8    7    9    7
.  25000  40   36   44   41     20   17   20   17
.  50000  81   80   99   90     43   41   47   36
. 100000 176  167  269  242    103   85  101   92
. 200000 411  360  544  448    228  179  211  182
---
words in Moby Dick
--
         BUILD                 SEARCH
.        bst hash  rst fast    bst hash  rst fast
.   5000   5    4    3    3      3    2    2    1
.  12500  11    8    9    7      9    5    5    3
.  25000  23   15   17   13     19   12   10    7
.  50000  50   29   31   25     43   25   21   15
---
library call numbers
--
         BUILD                 SEARCH
.        bst hash  rst fast    bst hash  rst fast
.   5000  19   16   21   20     10    8    6    4
.  12500  48   48   54   97     29   27   15   14
.  25000 118   99  188  156     67   59   36   30
.
50000 230  191  333  255    137  113   70   65
---
COS 226 Lecture 11: External sort/search
Huge file:
1967: 1 million bytes
1997: 1 trillion bytes
Use high-level abstraction
local memory
random access
faster sequential access
external memory
far slower than local memory
read/write in large blocks
faster sequential access
restrictions on access
Huge file: thousands of times
the size of local memory
SORTING
N: number of records
M: size of memory
2P: number of external devices
SEARCHING
N: number of records
M: page size
-----
Algorithms
SORTING
balanced merge
polyphase merge
SEARCHING
indexed sequential
B-trees
skip lists
extendible hashing
Many ancient algorithms still relevant
-----
Balanced multiway merge
Make multiple passes over the file
* can MERGE huge sorted blocks in memory
* use half the devices for "output"
pass 0: sort the file into
sorted blocks of size M
on devices 0, 1, 2, ... P-1
pass 1: make blocks of size M*P
P-way merge out to devices P, P+1, P+2, ... 2P-1
pass 2: make blocks of size M*P^2
P-way merge out to devices 0, 1, 2, ... P-1
...
pass t: make blocks of size M*P^t
File is sorted when M*P^t > N
t = log_P (N/M) passes through the data
--
.           PC sort        workstation
. file      1 billion      1 trillion
. memory    1 million      1 billion
. devices   2              10
. passes    lg 1000 = 10   log_10 1000 = 3
---
Costs
no. of passes (no. times each datum touched)
no. of devices
Bottom line: can sort a file in 3-10 times
the amount of time it takes to read or write it
-----
Balanced merge example
input
--
T H E Q U I C K B R O W N F X J M P S V L A Z Y D G
---
sorted runs of size 3
--
E H T  I Q U  B C K  O R W  F N X  J M P  L S V  A Y Z  D G *
---
sorted runs of size 9
--
B C E H I K Q T U  F J M N O P R W X  A D G L S V Y Z *
---
output
--
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z *
---
sorted runs (lengths) on devices
--
.   0     1     2     3     4     5
. --------------------------------
. 3(3)  3(3)  3(3)  0     0     0
. 0     0     0     1(9)  1(9)  1(9)   27
.
1(27)  0     0     0     0     0      27
---
-----
Polyphase Merge
Concept: cut the number of devices needed in half
MERGE-UNTIL-EMPTY
one device empty
different number of runs on other devices
merge to empty device
until another device is empty
iterate
Strategically place runs on devices to make it
so that last merge makes all devices empty at once
Work backwards to figure placement
Ex: 3 devices
--
. 1 0 0 0
. 0 1 1 1
. 1 0 2 2
. 3 2 0 4
. 7 6 4 0
---
distribution for 17 runs
General pattern is complicated!
-----
Polyphase merge example
input
--
T H E Q U I C K B R O W N F X J M P S V L A Z Y D G
---
sorted runs of size 3
--
0 0 0 1 1 3 3 3 3
E H T  I Q U  B C K  O R W  F N X  J M P  L S V  A Y Z  D G *
---
result of phase 1
--
0 3 3 2 2
B C K  A Y Z  D G *  E H J M O P R T W  F I L N S Q U V X
---
result of phase 2
--
3 2 1
D G *  F I L N S Q U V X  A B C E H J K M O P R T W Y Z
---
output
--
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z *
---
sorted runs (lengths) on devices
--
.   0      1      2      3
. ----------------------------
. 3(3)   2(3)   0      4(3)
. 1(3)   0      2(9)   2(3)    18
. 0      1(15)  1(9)   1(3)    15
. 1(27)  0      0      0       27
---
-----
Polyphase merge analysis
Ex. 34 initial sorted runs, 3 devices
--
.  0 21 13
. 13  8  0
.  5  0  8
.  0  5  3
.  3  2  0
.  1  0  2
.  0  1  1
.  1  0  0
---
Fibonacci numbers
phi = 1.618... (golden ratio)
phi^N runs in N phases,
each phase touching 1/phi of the data
THM: Total number of passes is
log_phi(N/M)/phi
--
.           PC sort    workstation
. file      1 billion  1 trillion
. memory    1 million  1 billion
. devices   3          3
.
passes    8          8
---
Using half as many devices as balanced merge,
still completes sort in fewer passes
Precise analysis extremely complex
-----
Virtual memory
Another idea: use quicksort
Paging system "automatically" takes care
of reading data in blocks
Requires ability to address huge files
locality of reference in program
Partitioning on such a system:
read a block from the left
read a block from the right
write small elements on left
write large elements on right
Paging system takes care of details
Estimate of costs
two devices working at a time
2 ln (N/M) passes
Can customize to partition-based sort
distribute files a la MSD sort
utilize multiple devices
-----
Basic external searching
N records stored in pages of size M
Goal: find item with given key
read as few pages as possible
INDEXED SEQUENTIAL ACCESS
Store file sequentially on devices
build index for each device
build master index for devices
Like encyclopedia
Problem: index is hard to maintain
under insertion/deletion
real goal: *dynamic* index
-----
B-trees
Generalize 2-3-4 trees: up to M links per node
Split full nodes on the way down
Used for external search
node size = page size
typical: M=1000
search and insert 1 trillion records
in four or five page accesses
Key point: flexibility to do fast insert
Red-black abstraction still works
BUT might not use internal links
(use binary search instead)
Space--time tradeoff
M large: only a few levels in tree
M small: less wasted space
Normally build as separate index
Main issue is to minimize wasted space
Bottom line
roughly log_M N page accesses
3 or 4 in practice
-----
B tree example
-----
B tree example (continued)
-----
B tree growth
-----
Skip Lists
Generalize skip lists
nodes hold M records
use sequential search within nodes
INSERT algorithm
node not full: add record to node
node full: split into two half-full nodes
give new node t links with probability 1/M^t
Bottom line
roughly log_M N page accesses
3 or 4 in practice
randomized algorithm OK?
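The "roughly log_M N page accesses" bottom line can be sanity-checked with integer arithmetic. A rough sketch (`btree_levels` is an illustrative name; this ignores node splits, partial pages, and keeping the root in memory):

```c
#include <assert.h>

/* Smallest t with M^t >= N: the rough number of levels, hence page
   accesses, for N records in pages holding M links each. */
int btree_levels(unsigned long long N, unsigned long long M)
  { int t = 0;
    unsigned long long cap = 1;
    while (cap < N) { cap *= M; t++; }
    return t; }
```

For M = 1000 this gives 4 levels for a trillion records, in line with the slide's four-or-five page accesses once real-world slack is added.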
-----
Key-indexed directory
Use leading t bits of keys
to find page with key
capacity: M*2^t
To increase capacity, double directory
Problems when >M keys have same leading bits
Ex: M+1 equal keys
-----
Extendible hashing
Generalize trie, hash search
Hash keys to make them random bits
Build two-level trie on hashed keys
directory size 2^t
page size M
SEARCH
i = first t bits of hashed key
directory entry i gives page name
search that page for the key
INSERT
same as search
put key on page accessed
if page full
split into two pages
**may need to split directory**
Bottom line
2 page accesses per search
44 percent extra space
randomized algorithm OK?
-----
Extendible hashing example
-----
Extendible hashing example (continued)
COS 226 Lecture 12: String searching
TEXT: N characters
PATTERN: M characters
(C occurrences of pattern in text)
Existence
Any occurrence of pattern in text?
Enumerate
How many occurrences of pattern in text?
Search
Find an occurrence of pattern in text
Search all
Find all occurrences of pattern in text
Three parameters N, M, C
start with N >> M >> C
Ex: N = 100,000, M = 100, C = 5
Other factors
multiple patterns (preprocessing)
binary vs.
ascii text
avoid backup in text
----------
String searching examples
Text string: find gjkmxorzeoa in
--
kvjlixapejrbxeenpphkhthbkwyrwamnugzhppfx
iyjyanhapfwbghxmshwrlyujfjhrsovkvveylnbx
nawavgizyvmfohigeabgksfnbkmffxjffqbualey
tqrphyrbjqdjqavctgxjifqgfgydhoiwhrvwqbxg
rixydzbpajnhopvlamhhfavoctdfytvvggikngkw
zixgjtlxkozjlefilbrboignbzsudssvqymnapbp
qvlubdoyxkkwhcoudvtkmikansgsutdjythzlapa
wlvliygjkmxorzeoafeoffbfxuhkzukeftnrfmoc
ylculksedgrdivayjpgkrtedehwhrvvbbltdkctq
---
Binary: find 101001111010111011 in
--
1000100101011010001001101011010110110110
0111111110101001110101000000010111101001
1011100000011000110000100110000111101101
1100000011011110101111101111000111100001
1011111001110011001111011001000000011101
1111100001011001110011010010110101001001
0001111111111011100111101100101010011110
1111001110110010010010011000100010011100
0010000011100010101110101001101100100011
---
Long text and pattern
* fixed pattern, random text: return "not found"
* fixed text, random pattern: return "not found"
----------
Brute-force string searching
Check for pattern at every text position
--
int brutesearch(char *p, char *a)
  { int i, j;
    for (i = 0; i < strlen(a); i++)
      for (j = 0; j < strlen(p); j++)
        {
          if (a[i+j] != p[j]) break;
          if (j == strlen(p)-1) return i;
        }
    return strlen(a)+1;
  }
---
DON'T USE THIS PROGRAM! (Why?)
In C, strlen takes time proportional to string length
running time at least N^2
even for simpler programs (count the a's)
Performance matters in ADT design!
Exercise: implement string ADT with fast strlen
need space to store length
need to update length when changing string ...
----------
Brute-force algorithm (continued)
Easy alternative (if bug is discovered!)
-- int brutesearch(char *p, char *a) { int i, j; int M = strlen(p), N = strlen(a), cnt = 0; for (i = 0; i < N; i++) for (j = 0; j < M; j++) { if (a[i+j] != p[j]) break; if (j == M-1) return i; } return N+1; } --- Different implementation (same algorithm) char match: increment both i and j char mismatch: set j to 0, reset i -- int brutesearch(char *p, char *a) { int i, j, M = strlen(p), N = strlen(a); for (i = 0, j = 0; j < M && i < N; i++, j++) while (a[i] != p[j]) { i -= j-1; j = 0; } if (j == M) return i-M; else return i; } --- ---------- Analysis of brute-force algorithm Running time depends on text and pattern Worst case: search for 000001 -- . 000000000000000000000000000001 . 00000* . 00000* . 00000* . 00000* . 00000* --- M*N bit compares Average case (fixed pattern, random text) -- . 10011010010100010100111000111 . * . 00* . 0* . * . * . 0* . * . 001 --- long pattern: 2*N compares short pattern: * precise cost depends on pattern * (first 001 appears before first 000, on average) Ref: Flajolet and Sedgewick ---------- Rabin-Karp algorithm Idea: Use hashing * compute hash function for each text position * no table! (just compare with pattern hash) Ex: "table" size 97, M = 5 Search for 15926 = 18 (mod 97) -- . 31415926535897932384626433 . 31415 = 84 (mod 97) . 14159 = 94 (mod 97) . 41592 = 76 (mod 97) . 15926 = 18 (mod 97) . 59265 = 95 (mod 97) --- Problem: hash function depends on M characters Solution: compute hash for i+1 from hash for i (improves running time from N*M to N+M) -- 31415 = 84 (mod 97) 14159 = (31415 - 30000)*10 + 9 = (84 - 3*9)*10 + 9 (mod 97) = 579 = 94 (mod 97) 41592 = (94 - 1*9)*10 + 2 = 76 (mod 97) 15926 = (76 - 4*9)*10 + 6 = 18 (mod 97) 59265 = (18 - 1*9)*10 + 5 = 95 (mod 97) --- Problem: need full compare on collision -- 92653 = (95 - 5*9)*10 + 3 = 18 (mod 97) --- Solution: use giant (virtual) table! 
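The incremental hash update can be replayed exactly on the digits above. A sketch with made-up names (`rk_first_hit`), decimal digits, and the toy modulus 97; a real implementation would use a large random prime and confirm each hash match with a full character compare:

```c
#include <assert.h>

/* Rolling hash over decimal digits, table size 97, M = 5: compute the
   hash for position i+1 from the hash for position i.  Returns the
   first window position whose hash equals target, or -1 if none. */
int rk_first_hit(const char *a, int target)
  { const int q = 97, d = 10, M = 5;
    int i, h = 0, dM = 1;
    for (i = 1; i < M; i++) dM = (dM * d) % q;        /* d^(M-1) mod q */
    for (i = 0; i < M; i++) h = (h * d + (a[i]-'0')) % q;
    for (i = M; ; i++)
      { if (h == target) return i - M;                /* hash "match"  */
        if (a[i] == '\0') return -1;
        h = (h + d*q - (a[i-M]-'0')*dM) % q;          /* drop leading  */
        h = (h * d + (a[i]-'0')) % q;                 /* add trailing  */
      }
  }
```

Running it on the slide's digits reproduces the trace: hash 84 first appears at position 0 ("31415"), 18 at position 3 ("15926"), 95 at position 4 ("59265").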
---------- RK algorithm implementation Take table size as big as possible (not so big that overflow happens) -- #define q 3355439 #define d 256 int rksearch(char *p, char *a) { int i, j, dM = 1, h1 = 0, h2 = 0; int M = strlen(p), N = strlen(a); for (j = 1; j < M; j++) dM = (d*dM) % q; for (j = 0; j < M; j++) { h1 = (h1*d + p[j]) % q; h2 = (h2*d + a[j]) % q; } for (i = M; i < N; i++) { if (h1 == h2) return i-M; h2 = (h2 + d*q - a[i-M]*dM) % q; h2 = (h2*d + a[i]) % q; } return N; } --- Randomized algorithm: take random (big) table size ---------- Knuth-Morris-Pratt algorithm Text characters that match are in the pattern! Can precompute what to do on mismatch Ex: searching for 000001 mismatch 00000* implies * text had 000000 * if next text char is 1, full match found * if next text char is 0, back in same state mismatch 000* implies * text had 0001 * restart search on next text char Finite-state automaton for 000001 -- . 0 1 2 3 4 5 . 0 1 2 3 4 5 5 . 1 0 0 0 0 0 6 --- worst-case text -- . 000000000000000000000000000001 . 012345555555555555555555555556 --- random text -- . 100111010010100010100111000111 . 001200010120101230101200012300 --- ---------- KMP algorithm (continued) Finite-state automaton for 10100110 -- . 0 1 2 3 4 5 6 7 . 0 0 2 0 4 5 0 2 8 . 1 1 1 3 1 3 6 7 1 --- Search example: -- . 100111010010101010100110000111 . 0120111234562343434345678 --- KMP algorithm: build FSA (or C program) from pattern run FSA on text N+M running time no backup in text ---------- KMP automaton construction FSA builds itself (!) 
Successors to state i
* match: i+1
* mismatch: FSA state corresponding to
next putative text match
* take mismatch into account
Ex: FSA for 10100110
State 6
1010011: go to state 7
1010010: go to state for 010010 (state 2)
remember state for 010011 (X = state 1)
State 7
10100110: go to state 8
10100111: go to state for 0100111 (state 1)
remember state for 0100110 (X = state 2)
main point: these are successors of old X
Algorithm:
X: mismatch state
match: FSA[match][i] = i+1
mismatch: FSA[mismatch][i] = FSA[mismatch][X]
X = FSA[match][X]
----------
KMP automaton construction example
Pattern: 10100110
--
.   0 1
. 0 0 2   1
. 1 1 1   01
---
--
.   0 1 2
. 0 0 2 0   00
. 1 1 1 3   000
---
--
.   0 1 2 3
. 0 0 2 0 4   011
. 1 1 1 3 1   0011
---
--
.   0 1 2 3 4
. 0 0 2 0 4 5   0101
. 1 1 1 3 0 3   00123
---
--
.   0 1 2 3 4 5
. 0 0 2 0 4 5 0   01000
. 1 1 1 3 0 0 6   001200
---
--
.   0 1 2 3 4 5 6
. 0 0 2 0 4 5 0 2   010010
. 1 1 1 3 0 0 6 7   0012012
---
--
.   0 1 2 3 4 5 6 7
. 0 0 2 0 4 5 0 2 8   0100111
. 1 1 1 3 0 0 6 7 1   00120111
---
----------
KMP implementation
Build FSA from pattern, run FSA on text string
Improvement:
match state in FSA unnecessary (always i+1)
use mismatch state only ("next" array)
small hack: next[0] = -1
--
int kmpsearch(char *p, char *a)
  { int i, j, M = strlen(p), N = strlen(a);
    initnext(p);
    for (i = 0, j = 0; j < M && i < N; i++, j++)
      while ((j >= 0) && (a[i] != p[j])) j = next[j];
    if (j == M) return i-M; else return i;
  }
---
Ultimate search program for pattern:
machine language version of FSA
--
int kmpsearch(char *a)
  { int i = 0;
    s0: if (a[i] != '1') goto s0; i++;
    s1: if (a[i] != '0') goto s1; i++;
    s2: if (a[i] != '1') goto s0; i++;
    s3: if (a[i] != '0') goto s1; i++;
    s4: if (a[i] != '0') goto s3; i++;
    s5: if (a[i] != '1') goto s0; i++;
    s6: if (a[i] != '1') goto s2; i++;
    s7: if (a[i] != '0') goto s1; i++;
    return i-8;
  }
---
----------
Right-left pattern scan
Can get sublinear algorithms by moving
right to left in pattern
(left to right in text)
Ex: Find string of 9
consecutive 0s
--
. 100111010010100010100111000111
. 0
. 1
. 1
. 0
. 0
. 1
---
Same idea effective in large alphabet
for every character in alphabet
compute skip when mismatch on that char
(skip at least to next text char!)
Search time proportional to N/M
for practical problems
Time decreases as pattern length increases (!)
Yet another parameter: alphabet size
binary: 2 (group bits for this alg)
text string: 256 (probably less)
----------
Right-left pattern scan examples
Text character not in pattern
skip forward M chars
--
. now is the time for all good people to come
. * * * * e
. l
. p
. o
. e
. p
---
Text character in pattern
skip forward to putative pattern end
--
. you can fool some of the people some of the time
. * * * o
. e
. l
. p
. o
. e
. p
---
----------
Right-left pattern scan implementation
--
void initskip(char *p)
  { int j, M = strlen(p);
    for (j = 0; j < 256; j++) skip[j] = M;
    for (j = 0; j < M; j++) skip[p[j]] = M-j-1;
  }
#define max(A, B) (((A) > (B)) ? (A) : (B))
int mischarsearch(char *p, char *a)
  { int i, j;
    int M = strlen(p), N = strlen(a);
    initskip(p);
    for (i = M-1, j = M-1; j >= 0; i--, j--)
      while (a[i] != p[j])
        {
          i += max(M-j, skip[a[i]]);
          if (i >= N) return N;
          j = M-1;
        }
    return i+1;
  }
---
Boyer-Moore algorithm
mismatched character heuristic
KMP-like computation for skip (pick the largest)
----------
Multiple patterns and text
Related to symbol table ADT
DICTIONARY
build trie from text (preprocess)
pattern lookups in O(lgN) steps
EXCEPTION DICTIONARY
build trie from set of patterns (preprocess)
find patterns in given text in N steps
generalization of KMP
Ex: FSA for 000, 011, and 1010
--
.   0 1 2 3 4 5 6
. 0 1 3 5 7 5 3 9
. 1 2 4 2 4 8 6 8
---
Simultaneous search for all patterns in text
--
. 111100100100101110100000
.
02222534534534568
---
COS 226 Lecture 13: Pattern matching
Generalize string-searching problem to include
incompletely specified patterns
TEXT: N characters
PATTERN: M-character REGULAR EXPRESSION
(C occurrences of pattern in text)
Ex: Is some substring of
cdbcaaaaabcddbbc
in the language described by the RE
(a*b + ac)d ?
As with string matching, extend to
count matches, return all matches
egrep
Practical application of core principles
of the theory of computation
Based on simulating NONDETERMINISTIC
finite-state automata
Essential concept in computer science
build intermediate abstractions
pick the right ones!
-----
"man grep" excerpts
--
Name
grep - search file for regular expression
Syntax
grep [option...] expression [file...]
Description
Commands of the grep family search the
input files (standard input default) for
lines matching a pattern.  Normally, each
line found is copied to the standard output.
Take care when using the characters
$ * [ ^ | ( ) and \ in the expression
because they are also meaningful to the
Shell.  It is safest to enclose the entire
expression argument in single quotes ' '.
---
Basic elements of expressions (grep)
c    any non-special char matches itself
r*   zero or more occurrences of r
"Extended" regular expressions (egrep)
(r)
r1 | r2
-----
Regular expression
Theoretician:
language accepted by FSA
Programmer:
compact description of multiple patterns
Concatenate
--
abcda        abcda
---
OR
--
a+b          a b
c(a+b)d      cad cbd
(ac+b)d      acd bd
(a+c)(b+d)   ab ad cb cd
---
Closure
--
a*           . a aa aaa aaaa aaaaa
ca*b         cb cab caab caaab caaaab
(a+b)*       . a b aa ab ba bb aaa aab aba ...
c(a+b)*d     cd cad cbd caad cabd cbad cbbd ...
---
-----
String searching FSAs
Recall:
KMP algorithm: FSA for one pattern
AC algorithm: FSA for multiple patterns
purpose of FSA is to avoid backing up in text
RE searching is much harder
check each text position for RE match
N(N-1)/2 possible matches
Ex: Is some prefix of
--
. cdbcaaaaabcddbbc
. dbcaaaaabcddbbc
. bcaaaaabcddbbc
.
caaaaabcddbbc
. ...
---
in the language described by the RE
--
(a*b + ac)d ?
---
THM: There exists an FSA for every RE
Ex: FSA for acb (string match)
--
.   0 1 2 3
. a 1 1 1 3
. b 0 0 3 3
. c 0 2 0 3
---
-----
C program to simulate FSAs
Ex: FSA for c(a+b)d
--
.   0 1 2 3 4
. a 0 3 0 0 4
. b 0 2 0 0 4
. c 1 1 1 1 4
. d 0 0 4 4 4
---
--
struct state { char c; int next[256]; };
struct state fsa[1000];
int t = start;
while ((c = getchar()) != EOF)
  t = fsa[t].next[c];
if (t == final)
  printf("Accepted\n");
else printf("Rejected\n");
---
Different type of FSA can avoid large next table
character in state
one state on match
another state on mismatch
Possible approach to implementing grep:
build FSA from RE
use C program to simulate FSA
Problems:
FSA can be exponentially large
how to build FSA from RE?
-----
Nondeterministic FSAs
Appropriate intermediate abstraction for egrep
small in size
can build directly from RE
FSA that can guess the right answer
More than one successor state
Simplest version
character in state
next state on match
NULL states
two possible successor states
Ex: NDFSA for ca*b
--
.       0 1 2 3 4
. ch      c a b
. next1 1 2 1 4 4
. next2 1 3 1 4 4
---
Ex: NDFSA for c(a+b)d
--
.       0 1 2 3 4 5
. ch      c a b d
. next1 1 2 4 4 5 5
. next2 1 3 4 4 5 5
---
-----
NDFSAs and FSAs
Nondeterminism does not extend
descriptive power of FSAs
THM: Given a nondeterministic FSA, there is
a deterministic FSA that recognizes
the same language.
Possible approach to implement grep: build NDFSA from RE transform to FSA use C program to simulate FSA Standard construction for NDFSA to FSA make one FSA state for each possible subset of NDFSA states Problem: FSA may be exponential in size Easier approach: use C program to simulate NDFSA ----- Building a NDFSA from a regular expression Each RE construct corresponds to the construction of a piece of the NDFSA Single character: Concatenation: Closure (*): OR: ----- NDFSA construction example A A* A*B A*B+AC (A*B+AC)D ----- NDFSA representation NDFSA for (A*B+AC)D -- . 0 1 2 3 4 5 6 7 8 9 . ch a b a c d . next1 5 2 3 4 8 6 7 8 9 0 . next2 5 2 1 4 8 2 7 8 9 0 --- ----- Recursive descent RE parser Goal: program to build NDFSA from RE Solve easier problem first: is RE legal? Use context-free grammar to describe REs -- <expression> := <term> | <term> + <expression> <term> := <factor> | <factor><term> <factor> := ( <expression> ) | c | ( <expression> )* | c* --- PARSER: determine structure of legal strings COMPILER: add semantic information to translate Top-down recursive descent parser: recursive programs directly derived from the grammar Expression -- <expression> := <term> | <term> + <expression> --- -- expression() { term(); if (p[j] == '+') { j++; expression(); } } --- ----- Recursive descent RE parser (continued) Term -- <term> := <factor> | <factor><term> --- -- term() { factor(); if ((p[j] == '[') || letter(p[j])) term(); } --- Factor: -- <factor> := ( <expression> ) | c | ( <expression> )* | c* --- -- factor() { if (p[j] == '[') { j++; expression(); if (p[j] == ']') j++; else error(); } else if (letter(p[j])) j++; else error(); if (p[j] == '*') j++; } --- Doesn't always work: -- <expression> := v | <expression> + <term> --- -- badexpression() { if (letter(p[j])) j++; else { badexpression(); if (p[j] == '+') { j++; term(); } else error(); } } --- leads to infinite recursive loop! 
----- Parsing example RE: (A*B+AC)D -- expression() term() factor() [ expression() term() factor() A * term() factor() B + expression() term() factor() A term() factor() C ] term() factor() D --- ----- Recursive compiler to build FSA from RE Augment parser to put out state table for FSA state: next state to be filled in setstate: procedure to fill in state values Recursive routines return index of initial state -- int expression() { int t1 = term(), t2, r = t1; if (p[j] == '+') { j++; state++; t2 = state; r = t2; state++; setstate(t2, ' ', expression(), t1); setstate(t2-1, ' ', state, state); } return r; } --- -- next1[0] = expression(); setstate(0, ' ', next1[0], next1[0]); setstate(state, ' ', 0, 0); --- NDFSA for (a*b+ac) -- . 0 1 2 3 4 5 6 7 8 . ch a b a c . next1 5 2 3 4 8 6 7 8 0 . next2 5 2 1 4 8 2 7 8 0 --- ----- Simulating NDFSAs Maintain a DEQUE (doubly ended queue) with * possible states for CURRENT char * a scan marker, * * possible states for NEXT char -- 3 6 1 * 4 7 2 --- Take next state off beginning (stack, queue) * null: put both states at beginning (stack) * match: put new state at end (queue) -- int match(char *a) { int j = 0, N = strlen(a), state = next1[0]; dequeinit(); put(scan); while (state) { if (state == scan) { j++; put(scan); } else if (ch[state] == a[j]) put(next1[state]); else if (ch[state] == ' ') { push(next1[state]); push(next2[state]); } if (dequeempty() || j==N) return 0; state = pop(); } return j; } --- Don't bother checking next1==next2. Stay tuned. ----- NDFSA simulation example Simulation of NDFSA for pattern (a*b+ac)d running on text string aabd -- . 5 a . 2 6 a . 1 3 6 a . 1 3 6 a . 3 6 a 2 . 6 a 2 . a 2 7 . 2 7 a . 1 3 7 a . 3 7 a 2 . 7 a 2 . a 2 . 2 b . 1 3 b . 3 b . b 4 . 4 d . 8 d . d 9 . 9 * . 0 * --- ----- Problem: exponential growth NDFSA for (a*a)*b -- . 0 1 2 3 4 5 6 . ch a a b . next1 4 2 3 4 2 6 0 . next2 4 2 1 4 5 6 0 --- Simulation of machine running on aaaaaaab -- . 4 a . 5 2 a . 2 a . 1 3 a . 3 a 2 . a 2 4 . 
2 4 a . 1 3 4 a . 3 4 a 2 . 4 a 2 4 . ... a 2 4 2 4 2 4 2 4 a ... . a 2 4 2 4 2 4 2 4 --- Deque can get exponentially large Easy fix: no duplicate states on deque keep "existence table", indexed by state also fixes next1=next2 code hack ----- Worst-case analysis of grep match N: length of text M: length of RE Number of states in NDFSA: single character: 2 concatenation AB: |A| + |B| - 1 closure A*: |A| + 1 OR A+B: |A| + |B| + 1 Total: less than or equal to M+1 Time for match: not more than M+1 states for each char run alg on each text position total not more than (M+1)(N + N-1 + N-2 + ...) = O(MN^2) (linear on real problems) Quintessential tool, classic algorithm ----- Context string searching brute force implementation O(MN) KMP FSA saves a factor of M grep implementation abstract machine: NDFSA pattern (word in CFL) --> NDFSA simulate NDFSA compiler abstract machine: computer program (word in CFL) --> machine code run program (hardware simulation of abstract machine) grep is simple compiler converts CFL program to machine code parser: check if pattern is in CFL grep: check if text is in RE Same function at different levels of hierarchy Faster algorithm than grep? Yes, using tries (ref. Gonnet and Baeza-Yates) COS 226 Lecture 14: File compression Compression: reduce the size of a file to save TIME when transmitting the file to save SPACE when storing it away Widely used despite technological advances (Why?) Basic concepts ancient (1952) Best technology recently developed Simple methods Huffman codes Information theory Arithmetic codes Adaptive (LZ) codes ----- Simple methods Step 1: avoid doing something dumb Step 2: try to do something clever Character count encoding Ex: ASCII binary file Ex: ACGT molecular biology sequences Ex: five-bit code for text Run length encoding Ex: blanks in text files Ex: black-and-white graphics How to represent counts? 
escape characters separate alphabet (may waste bits) different language entirely General representation problem annoying practical problems usually work with either * bitstream/bytestream * separate language entirely Special hacks Ex: encode /usr/dict/words ----- Run-length encoding Remove blanks in files ex: tabular data without tabs ex: (ancient) card images Record lengths of 0s and 1s in binary files ex: black-and-white graphics -- .000000000000011111111111111000000000 13 14 9 .000000000001111111111111111110000000 11 18 7 .000000001111111111111111111111110000 8 24 4 .000000011111111111111111111111111000 7 26 3 .000001111111111111111111111111111110 5 30 1 .000011111110000000000000000001111111 4 7 18 7 .000011111000000000000000000000011111 4 5 22 5 .000011100000000000000000000000000111 4 3 26 3 .000011100000000000000000000000000111 4 3 26 3 .000011100000000000000000000000000111 4 3 26 3 .000011100000000000000000000000000111 4 3 26 3 .000001111000000000000000000000001110 5 4 23 3 1 .000000011100000000000000000000111000 7 3 20 3 3 .011111111111111111111111111111111111 1 35 .011111111111111111111111111111111111 1 35 .011111111111111111111111111111111111 1 35 .011111111111111111111111111111111111 1 35 .011111111111111111111111111111111111 1 35 .011000000000000000000000000000000011 1 2 31 2 --- Compression factor increases with resolution No help for random files! 
----- Huffman code Variable length code, defined by trie Ex: abracadabra 88 bits in byte code 55 bits in 5-bit code 33 bits in 3-bit code (only 5 different letters) Huffman code: -- a 1 5 5 b 001 2 6 c 0000 1 4 d 0001 1 4 r 01 2 4 23 bits 1*001*01*0*0000*0*0001*0*001*01*0 10010110000100011001011 --- not necessarily unique -- a 0 5 5 b 110 2 6 c 100 1 3 d 101 1 3 r 111 2 6 23 bits 01101110100010101101110 --- Depends on encoding frequently used chars with fewer bits than infrequently used chars ----- Huffman code implementation ENCODE: count frequency of occurrence of characters build trie by combining two smallest frequencies trie defines code pass through source, output code DECODE: use bits to guide trie search output char when reached, restart at top Have to transmit trie! suffices to transmit code relatively insignificant for big messages Issues: trie representation data structure to hold nodes being processed ----- Huffman code construction example ----- Huffman code construction example (cont.) ----- Huffman code implementation (continued) Trie representation parent links right-left bit Data structure for nodes being processed "indirect" priority queue To build code: put all chars with nonzero counts in PQ while PQ has two or more nodes take two smallest off PQ make new node with sum of counts put on PQ Another algorithm to build code: sort initial frequencies merge with generated frequencies no PQ needed THM (Huffman, 1952): Huffman code is optimal (no variable-length code uses fewer bits) Problems: have to transmit code expensive computationally two-pass not optimal (!) 
----- a simple string to be encoded using a minimal number of bits -- c f p d g l r u a b o t m s e n i * 1 1 1 2 2 2 2 2 3 3 3 3 4 4 5 5 6 11 1 1 1 2 2 2 2 2 3 3 3 3 4 4 5 5 6 11 1 2 2 2 2 2 3 3 3 3 4 4 5 5 6 11 * 2 2 2 2 2 3 3 3 3 4 4 5 5 6 11 * 2 3 2 2 3 3 3 3 4 4 5 5 6 11 * 2 3 4 3 3 3 3 4 4 5 5 6 11 * 2 3 4 4 3 3 3 4 4 5 5 6 11 * 3 4 4 5 3 4 4 5 5 6 11 * 3 4 4 5 6 4 4 5 5 6 11 * 4 4 5 6 6 5 5 6 11 * 4 4 5 6 6 8 5 5 6 11 * 5 6 6 8 8 6 11 * 5 6 6 8 8 10 11 * 6 6 8 8 10 11 11 * 8 8 10 11 12 11 * 10 11 12 16 * 11 12 16 21 16 21 23 37 23 60 --- ----- Huffman trie representation Links point up the tree one index per node to parent one bit per node (left or right child) -- .28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 .27 27 28 28 29 29 30 31 31 32 32 34 35 36 38 39 . * * * * * * * * --- To get code for given char * move up the trie * collect left-right bits * reverse string when top reached ----- Entropy Basic concept of information theory formal model of source formal concept of "information content" First order model: chars generated randomly with fixed probability (independently) THM: Huffman is optimal in first order model Entropy concept confirms intuition * random data cannot be compressed * data compressed by one optimal compressor won't be compress further by another * no guarantee of specific performance on all data BUT * "average" case results * depends on mathematical model * real data may not fit any model Example: short program generates large data file compression alg has to discover the program! 
Statistical modeling find short program (or model parameters) ex: speech recognition ----- A difficult file to compress One million random characters -- ylculksedgrdivayjpgkredehwhrvvbbltdkctqtoctnnhylmsgvwachzjvwhzabzrhsyqxnzgkpufx dytmoajjsorphchnxlygnezneofgidqfyqfmvldxreuseufgryyvuzsrdreilknwvordwsfblvhinaw ezslumidtljqydtbprazipawngrubtqyjfiowyaktrwuopoohtkuaenzxcuivbjekdimqdqtxxcwkau ndcobuekavqepoxmidcmdrahhuzdjmjxkeoyimflsoyxknqkqsgwgjofhakqrhmbfhgvtgsousmvcuw uungwqzzhllznusllaepdphwojuiorbpsuxxebnszlktzbapzpscprlltgiklkgczvxmrgoywywjdga pspqslilkpktzjarozmfvaqjmnlqpesbvdgpkkxrxeaphrlfabegwprtqnywwxhoysbionurepogfuc qfnehrquwdyiuurcrpylxzzkaystirtemsgaplcpcfnwmgfokcfyigsoudbsemxcenhofagogdidfqn ... --- A lower bound for compression: 130 chars! -- #include "math.h" main () { int i, j; char x; for (i = 0; i < 12500; i++) { for (j = 0; j < 80; j++) { x = 'a' + 26*drand48(); printf("%c", x); } } } --- Ex: Postscript file?? ----- Arithmetic Coding Represent source with a real number! Ex: abracadabra -- a 0.00000 .45454 5/11 chars are a's b .45454 .63636 2/11 chars are b's c .63636 .72727 d .72727 .81818 r .81818 1.00000 --- ENCODE: rescale interval for each letter Ex: code for "abracadabra" is .2787887 -- a .00000000 .45454547 (0, 4/11) in (0, 1) b .20661159 .28925622 (5/11, 7/11) in (0, 4/11) r .27422991 .28925622 a .27422991 .28106004 c .27857634 .27919728 a .27857634 .27885857 d .27878159 .27880725 a .27878159 .27879325 b .27878690 .27878901 r .27878863 .27878901 a .27878863 .27878881 --- Can rescale and output code digit when leading digits of intervals match (have to do so to handle long messages) ----- Arithmetic Coding (decode) Given code (real number between 0 and 1) it falls in one of the intervals (l, r) associated with the characters output that character rescale to (0,1) -- code = (code-l)/(r-l); --- repeat to get next char Ex: -- .27878872 a since .27... 
is in (0, 5/11) .61333513 b (.27...- 0)/(5/11); in (5/11, 7/11) .87334329 r (.61...- 5/11)/(2/11); in (9/11, 1) .30338812 a .66745383 c .34199208 a .75238258 d .27620819 a .60765803 b .84211922 r .13165572 a --- Need to adjust algorithm to match rescaling used on decode (can be tricky to do) Bottom line: slightly better than Huffman like having fractional bits ----- Pure Adaptive Coding (LZ) To encode the next character(s) * if no match of length >1 in previous string, output the next char * otherwise, output pointer to longest previous match length of longest previous match char causing the mismatch -- . 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 . t h i s i s t h e t h i r d t i m e . . t h i s * 2 3 t h e 4 3 i r d 6 2 i m e . . . 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 . 0 1 1 0 0 1 1 0 0 0 0 1 1 0 1 1 1 1 0 1 . . 0 1 1 0 4 4 0 6 5 1 4 2 1 5 3 --- Problems: big dictionary (N^2 prev substrings) online dictionary update not easy ----- Dictionary-based Adaptive Coding (LZW) -- . 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 . t h i s i s t h e t h i r d t i m e . . 0 t t . 1 h h . 2 i i . 3 s s . 4 * * . 5 is 2 s . 6 *t 4 t . 7 he 1 e . 8 *th 6 h . 9 ir 2 r . 10 d d . 11 *ti 6 i . 12 m m . 13 e e . . t h i s * 2 s 4 t 1 e 6 h 2 r d 6 i m e --- ----- Binary example of LZW -- . 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 . 0 1 1 0 0 1 1 0 0 0 0 1 1 0 1 1 1 1 0 1 . . 0 0 1 0 . 1 1 1 1 . 2 2 2 10 1 0 . 3 4 2 01 0 1 . 4 2 3 100 2 0 . 5 9 2 00 0 0 . 6 11 2 11 1 1 . 7 13 3 011 3 1 . 8 16 3 110 6 1 . . 1 0 0 1 2 0 0 0 1 1 3 1 6 1 --- ----- LZW implementation Use trie for dictionary (standard representation) -- .0001111001110110110110001001101111000101111010 --- ENCODE loop lookup string suffix in trie output dictionary index at bottom add new node to trie DECODE options build trie (faster, takes space) build dictionary (slower) ----- Empirical results Methods are available as UNIX tools pack: Huffman compress: LZ gzip: LZW Experiment: Compress the text of "Moby Dick" -- . 
original 1220K . 5-bit 778K 63% . Huffman 692K 56% . LZ 523K 42% . LZW 493K 40% --- gzip the results pack the results -- . Huffman 692K 624K 10% 692K 687K 1% . LZ 523K 520K 523K "no savings" . LZW 493K 493K 493K "no savings" --- Experiment: Compress a 1MB file of random chars -- . original 1012K . 5-bit 638K 63% . Huffman 610K 60% . LZ 695K 68% . LZW 646K 63% --- ----- Summary Don't ignore the obvious Huffman: represent fixed-length bitstrings with variable-length codes Lempel-Ziv: represent variable-length bitstrings with fixed-length codes Not covered: codes like JPEG "lossy" codes for graphics (don't really need all the bits) Other questions compute on compressed data? archiving compression code must be truly portable Ex: replace processor Ex: send files to different processor massive compression needed on some data COS 226 Lecture 15: Geometric algorithms Important applications involve geometry * models of physical world * computer graphics * mathematical models Ancient mathematical foundations Most geometric algorithms less than 20 years old Knowledge of fundamental algorithms is critical * use them directly * use the same design strategies * know how to compare and evaluate algs ----- Warning: intuition may mislead Humans have spatial intuition in 2D and 3D computers do not! neither have good intuition in high dimensions Ex: Is a polygon convex? we think of this alg sees this or even this Ex: Find intersections among set of rectangles we think of this algorithm sees this ----- Geometric algorithms: overview New primitives points, lines, planes; polygons, circles Primitive operations distance, angles "compare" point to line do two line segments intersect? 
Problems extend to higher dimensions (algorithms sometimes do, sometimes don't) Higher level intrinsic structures arise Basic problems intersection proximity point location range search Standard approaches incremental (brute-force) divide-and-conquer sweep line algs multidimensional tree structures randomized algs discretized algorithms online and dynamic algs ----- Algorithm design paradigms Draw from knowledge about fundamental algs Move up one level of abstraction use fundamental algs and data structures know their performance characteristics More complicated primitives lead to wider range of problems Some problems too complex to admit simple algorithms For many important problems classical approaches give good algorithms need research to find "best" algorithms no excuse for using "dumb" algorithms Progression of algorithm design -- all possibilities double recursion 2^N brute force nested for loops N^2 divide-and-conquer recursion, trees N log N elegant idea 1 for loop N randomization random choices N --- Oversimplified scenario, but fits many problems Many examples in geometric algorithms ----- Geometric primitives point two numbers (x, y) line two numbers a and b [ax + by = 1] line segment four numbers (x1, y1) (x2, y2) polygon sequence of points No shortage of other geometric shapes triangle, square, circle 3D and higher dimensions more complex Primitive operations can be challenging is polygon simple? is point on line? is point inside polygon? do two line segments intersect? do two polygons intersect? Algorithms search through SETS of primitives all points in specified range closest pair in set of points intersecting pairs in set of line segments overlapping areas in set of polygons ----- Line segment intersection Do two line segments intersect? To implement INTERSECT(l1, l2) use simpler primitive SAME(p1, p2, l): Given two points p1, p2 and a line l, are p1 and p2 on the same side of l? 
yes no no no To implement SAME use simpler primitive CCW(p1, p2, p3): Given three points p1, p2, p3, is the route p1-p2-p3 a ccw turn? yes no two ccw tests to implement SAME four ccw tests to implement INTERSECT ----- CCW implementation Compare dx, dy -- int ccw(struct point p0, struct point p1, struct point p2) { int dx1, dx2, dy1, dy2; dx1 = p1.x - p0.x; dy1 = p1.y - p0.y; dx2 = p2.x - p0.x; dy2 = p2.y - p0.y; if (dx1*dy2 > dy1*dx2) return 1; if (dx1*dy2 < dy1*dx2) return -1; return 0; } --- Still not quite right! Bug in degenerate case points on the same line Does AB intersect CD? on the line in the order ABCD: NO on the line in the order ACDB: YES Can't just return 0 if dx1*dy2 = dx2*dy1 (see book) CCW is a useful basic primitive Ex: is point inside convex N-gon? N CCW tests Lesson: geometric primitives are tricky to implement can't ignore degenerate cases ----- Convex hull of a point set Basic property of a set of points CONVEX HULL: * smallest convex polygon enclosing the points * shortest path surrounding the points * intersection of halfplanes defined by point pairs -- --- Running time of algorithm can depend on N: number of points M: number of points on the hull point distribution ----- Package-wrap algorithm Operates like selection sort Abstract idea sweep line anchored at current point CCW first point hit is on hull Implementation compute angle to all points pick smallest angle larger than current one -- int wrap(struct point p[], int N) { int i, min, M; float th, v; struct point t; for (min = 0, i = 1; i < N; i++) if (p[i].y < p[min].y) min = i; p[N] = p[min]; th = 0.0; for (M = 0; M < N; M++) { t = p[M]; p[M] = p[min]; p[min] = t; min = N; v = th; th = 360.0; for (i = M+1; i <= N; i++) if (theta(p[M], p[i]) > v) if (theta(p[M], p[i]) < th) { min = i; th = theta(p[M], p[min]);} if (min == N) return M; } } --- Use pseudo-angle to save time ----- Package-wrap example ----- Graham Scan Sort points on angle with bottom point as origin forms simple closed 
polygon Proceed through polygon discard points that would cause a CW turn -- int grahamscan(struct point p[], int N) { int i, min, M; struct point t; for (min = 1, i = 2; i <= N; i++) if (p[i].y < p[min].y) min = i; for (i = 1; i <= N; i++) if (p[i].y == p[min].y) if (p[i].x > p[min].x) min = i; t = p[1]; p[1] = p[min]; p[min] = t; shellsort(p, N); p[0] = p[N]; for (M = 3, i = 4; i <= N; i++) { while (ccw(p[M],p[M-1],p[i]) >= 0) M--; M++; t = p[M]; p[M] = p[i]; p[i] = t; } return M; } --- Running time: N log N for sort; N for scan ----- Graham scan example ----- Divide-and-conquer convex hull algorithms divide points divide space switch dimensions to avoid strange degeneracies Relatively complex to code in both cases ----- Incremental convex hull algorithm Consider next point if inside hull of previous points, ignore if outside, update hull Two subproblems to solve test if point inside or outside polygon update hull for outside points Both subproblems can be solved by looking at all hull points can be improved with binary search Randomized algorithm consider points in random order N + M log M ----- "Sweep line" convex hull algorithm Sort points on x-coordinate first Eliminates inside test Total time proportional to N log N (for sort) ----- Quick elimination Improve the performance of any convex hull algorithm by quickly eliminating most points (known not to be on the hull) Use points at "corners" max, min x+y, x-y Check if point inside quadrilateral four CCW tests Check if point inside rectangle four comparisons Almost all points eliminated if points random number of points left proportional to N^(1/2) Total time proportional to N ----- Summary of 2D convex hull algs Package wrap NM Graham scan N log N (sort time) Divide-and-conquer N log N (with work) Quick elimination N (fast average-case) Brute-force N log M Sweep line N log N (sort time) How many points on the hull? 
Worst case: N Average case: depends on distribution * uniform in a convex polygon: log N * uniform in a circle: N^(1/3) Average-case analysis requires understanding of basic properties of DATA, not just algs ----- Higher dimensions Multifaceted convex polytope encloses points NOT a simple object Ex: N points d dimensions d=2: convex hull d=3: Euler's formula (v - e + f = 2) d>3: exponential number of facets at worst EXTREME POINTS return points on the hull, not necessarily in order Package-wrap Divide-and-conquer Randomized Interior elimination ----- Geometric models of mathematical problems Impact of geometric algorithms extends far beyond models of physical objects Two important examples: Geometric problem find point where two lines intersect in 2D find point where three planes intersect in 3D ... Mathematical equivalent solve simultaneous equations algorithm: Gaussian elimination Geometric problem find convex polytope defined by intersecting half-planes find vertex hit by line of given slope moving in from infinity Mathematical equivalent linear programming algorithm: simplex (stay tuned) ----- Linear programming example Maximize a+b subject to the constraints b - a < 5 a + 4b < 45 2a + b < 27 3a - 4b < 24 a > 0 b > 0 COS 226 Lecture 16: Geometric search Extend search ADT to geometric data Range search Intersections among geometric objects Near neighbor search Point location Multidimensional problems Trees Divide-and-conquer Discretized algorithms ----- Range searching Possible addition to symbol-table ADT: -- . void STinit(); . void STinsert(Item x); . Item STsearch(Key v); . int STempty(); . 
int STrange(Key v1, Key v2); --- Count the number of records with key values between v1 and v2 Options to actually process the records * pass a procedure to be called for each record in the range * return list of records (possibly sorted) Depends on how many records expected (count them first) Implementations ARRAY: do binary search on both v1 and v2 return difference between indices HASH TABLE: no easy algorithm BST, TRIE: recursive traversal works ----- BST 1D range searching Recursive traversal * eliminate subtrees with no keys in interval * root may or may not be in interval * traverse both subtrees if it is -- Key v1, v2; int count = 0; int BSTrangeR(link h) { int tx1 = (h->key >= v1); int tx2 = (h->key <= v2); if (tx1 && (h->l != z)) BSTrangeR(h->l); if (tx1 && tx2) count++; if (tx2 && (h->r != z)) BSTrangeR(h->r); } --- ----- 2D Range searching Same basic method works in higher dimensions (!!) (discovered by an undergraduate) interval in 1D is rectangle in 2D 2D TREE: alternate between x and y for the key Recursive traversal * eliminate subtrees with no keys in rectangle * root may or may not be in rectangle * traverse both subtrees if it is Corresponds to recursive subdivision of the plane alternating horizontal and vertical lines Trivial generalization to k dimensions ----- 2D tree construction Each node corresponds to an area in the plane Each internal node divides its area into two subdivisions Switch between horizontal and vertical dividing lines Quad tree: use 4-way tree divide on both coordinates at once ----- 2D tree range searching -- int x1, y1, x2, y2, count = 0; TDTrangeR(link h, int d) { int t1,t2,tx1,tx2,ty1,ty2; if (h == z) return; tx1 = x1 < h->p.x; tx2 = h->p.x <= x2; ty1 = y1 < h->p.y; ty2 = h->p.y <= y2; t1 = d ? tx1 : ty1; t2 = d ? 
tx2 : ty2; if (t1 && (h->l != z)) TDTrangeR(h->l, !d); if (tx1 && tx2 && ty1 && ty2) count++; if (t2 && (h->r != z)) TDTrangeR(h->r, !d); } --- ----- Manhattan line intersection problem N lines, either horizontal or vertical How many pairs intersect? Call a given procedure for each intersection As with other search problems usually no harder to REPORT all intersections (call a given procedure for each) ----- Manhattan line intersection Dynamic SWEEP LINE algorithm Horizontal line sweeps from bottom to top At any given instant * vertical data line represents "point" * horizontal data line represents "interval" There is an h-v intersection if "point" is in "interval" Reduces 2D line intersection problem to 1D range searching! ----- Sweep line implementation Uses both PQ and ST (with range search) ADT Consider y coordinates of endpoints in increasing order (use PQ) Three types of "events" B: bottom of vertical line T: top of vertical line H: horizontal line Maintain dynamic ST where an item with key x means that some vertical line intersects the sweep line at x B: INSERT x T: DELETE x H: RANGE (x1, x2) Generalizes to give fast algorithm for * rectangles * general lines, circles * convex polygons Generalizes to higher dimensions "sweep hyperplane" many applications ----- Near neighbor searching Another possible addition to Search ADT: -- . void STinit(); . void STinsert(Item x); . Item STsearch(Key v); . int STempty(); . 
Item STnearest(Key v); --- Find the record with key value closest to v Need a concept of "distance", not just "less" easy if keys are numbers, or points in space Implementations ARRAY: do binary search on v compare with points on either side HASH TABLE: no easy algorithm BST, TRIE: recursive traversal works ----- 1D BST near neighbor searching Test search value for proximity with every node touched Go down both subtrees if close to the edge -- void BSTnear(link h) { if (h == z) return; if (dist(v, h->key) < min) { best = h; min = dist(v, best->key); } if (v < h->key || (v - h->key) < min) BSTnear(h->l); if (v > h->key || (h->key - v) < min) BSTnear(h->r); } --- Multidimensional near neighbor searching: same algorithm, but switch dimensions at each level, just as for range searching ----- Voronoi diagram Given: set S of N points VORONOI DIAGRAM: for each point x, identify the set of points in the plane closer to x than to any other point y in S Ultimate "closest point" structure Voronoi edges perpendicular bisectors of point pairs Voronoi points circumcenters of point triples Challenge to compute Representation? Degenerate cases? ----- Delaunay triangulation Given: set S of N points DELAUNAY TRIANGULATION: connect x to every point y in S with a Voronoi edge separating x and y Outer boundary is convex hull Representation easier: no extra points Voronoi diagram and Delaunay triangulation can be computed in N log N steps (!!) 
Approaches * divide and conquer * sweep line * randomize * discretize ----- 2D divide-and-conquer Ex: To find the closest pair in N points * sort on x * divide into two sets of N/2 points * find closest pair in each half * find closest pair crossing boundary After pairs in halves found, checking boundary crossing seems easy Actually need to sort on y (comes for free) to make boundary check easy Boundary check MUST be carefully done because it terminates recursion ----- 2D divide-and-conquer code Use the same code (mergesort) to * sort on x * sort on y and work on boundary Call this twice (change key from x to y): -- struct node *sort(struct node *c, int N) { int i; struct node *a, *b; float middle; struct point p1, p2, p3, p4; a = c; if (c->next == z) return c; for (i = 2; i <= N/2; i++) c = c->next; b = c->next; c->next = z; if (pass == 2) middle = b->p.x; c = merge(sort(a, N/2), sort(b, N-(N/2))); if (pass == 2) { p1 = z->p; p2 = z->p; p3 = z->p; p4 = z->p; for (a = c; a != z; a = a->next) if (fabs(a->p.x - middle) < min) { check(a->p, p1); check(a->p, p2); check(a->p, p3); check(a->p, p4); p1 = p2; p2 = p3; p3 = p4; p4 = a->p; } } return c; } --- Same general strategy works to compute Voronoi ----- Grid methods Grids : geometric search :: tries : search ADT Grid method * define uniform grid of fixed-size squares * put points in lists associated with squares * ignore points in faraway grid squares Time-space tradeoff like MSD sort * grid too fine: empty cells * grid too coarse: lists too long Use 2- or 3-level grids, or recurse ala quad trees Ex: range searching For graphics applications ultimate grid is PIXEL ARRAY leads to "discretized algorithms" ----- Point location problem Ex: find state corresponding to point on map Planar subdivision * 2D tree planar decomposition * N lines * Voronoi diagram * grid Which division contains the given point? 
Difficult in general if only because of difficulty of representing planar subdivisions ----- Discretized line intersection p-by-p bit raster, p^2 pixels N lines Draw rasterized version of line report intersection if pixel already 1 Cost: p^2 to initialize pixels to 0 number of pixels on lines Cost dominated by p^2 Line intersection same cost as drawing blank picture! ----- Discretized Voronoi diagram put 1 pixels on a priority queue priority: distance to closest point Algorithm: * remove pixel from priority queue * check all neighbor pixels * closer or same: ignore * farther: check pixel value * if 0, set to 1 and put back on pq * if 1, must be on a Voronoi edge! Time proportional to number of pixels on diagram times diameter of largest cell (plus time to initialize pixels to 0) Idea: use or refine triangulation from discretized diagram to compute real diagram COS 226 Lecture 17: Graph algorithms GRAPH: a set of OBJECTS with CONNECTIONS Interesting and useful abstract concept Study of graph algorithms a challenging branch of computer science Study of mathematical properties a challenging branch of discrete math Hundreds of interesting algorithms known Important applications abound * operations research * circuit simulation * software systems ----- Glossary of terms Node Edge Graph Dense Sparse Path Cycle Tree Spanning tree Connected Connected component Directed Undirected Weighted Network ----- Graph examples (geometric problems) ----- More graph examples Anything involving relationships among items can be modeled as a graph Ex: Transportation network Nodes: cities; Edges: roads Ex: Electric circuit Nodes: devices; Edges: wires Warning: geometric intuition may mislead Ex: Airline fares (triangle inequality might not hold) Graphs are ABSTRACT objects, and model other abstractions Ex: Scheduling Nodes: tasks Edges: must do task A before task B Ex: Programming system Nodes: subroutines; Edges: A can call B Ex: Finite state automaton Ex: CFG Nodes: symbols; Edges: 
productions Ex: Game graphs Nodes: board positions; Edges: moves ----- Representing graphs Graphs are abstract mathematical objects Algorithms have to work with concrete representations Many different representations possible Efficiency depends on matching representation with algorithm Nothing new here: standard first step in algorithm design is to pick representation (data structure) * array vs. linked list * integers * points, lines, polygons * trees ----- Representing graphs (examples) Node names A B C D E F G H I J K L M Adjacency lists A: F C B G B: A C: A D: F E E: G F D F: A E D G: E A H: I I: H J: K L M K: J L: J M M: J L ----- Representing graphs (more examples) Set of edges AB AG AC LM JM JL JK ED FD HI FE AF GE Adjacency matrix -- A B C D E F G H I J K L M A 1 1 1 0 0 1 1 0 0 0 0 0 0 B 1 1 0 0 0 0 0 0 0 0 0 0 0 C 1 0 1 0 0 0 0 0 0 0 0 0 0 D 0 0 0 1 1 1 0 0 0 0 0 0 0 E 0 0 0 1 1 1 1 0 0 0 0 0 0 F 1 0 0 1 1 1 0 0 0 0 0 0 0 G 1 0 0 0 1 0 1 0 0 0 0 0 0 H 0 0 0 0 0 0 0 1 1 0 0 0 0 I 0 0 0 0 0 0 0 1 1 0 0 0 0 J 0 0 0 0 0 0 0 0 0 1 1 1 1 K 0 0 0 0 0 0 0 0 0 1 1 0 0 L 0 0 0 0 0 0 0 0 0 1 0 1 1 M 0 0 0 0 0 0 0 0 0 1 0 1 1 --- Implicit edges "calculate" edges only when needed ----- Summary of costs E edges, V vertices Space requirements: Adjacency lists: V+E Adjacency matrix: V^2 Set of edges: E (+V) Choice of representation affects algorithm efficiency Trivial examples: Is there an edge from A to B? lists: O(E) matrix: O(1) Is there an edge from A to anywhere? lists: O(1) matrix: O(V) ----- Basic graph problems (short list) Is there a path from A to B? CYCLES Does the graph contain a cycle? CONNECTIVITY (SPANNING TREE) Is there a way to connect the vertices? BICONNECTIVITY Will the graph become disconnected if one vertex is removed? PLANARITY Is there a way to draw the graph without edges crossing? SHORTEST PATH What is the shortest way from A to B? LONGEST PATH What is the longest way from A to B? MINIMAL SPANNING TREE What is the best way to connect the vertices? 
HAMILTONIAN CIRCUIT Is there a way to visit all the vertices without visiting the same vertex twice? TRAVELING SALESMAN What is the shortest route to connect the vertices without visiting the same vertex twice? ISOMORPHISM Do two given representations represent the same abstract graph? ----- Traversing graphs Goal: "visit" every node on the graph Depth-first search (DFS) Recursive traversal algorithm * Mark all nodes as "unvisited" * To visit a node k * mark it * (recursively) visit all unmarked nodes connected to k by an edge * Visit node 1, then visit any unvisited node, continue until all nodes visited Solves some simple graph problems * connectivity * cycles Basis for solving difficult graph problems * biconnectivity * planarity ----- DFS implementation (recursive) -- int val[maxV]; int id = 0; dfs() { int k; for (k = 1; k <= V; k++) val[k] = 0; for (k = 1; k <= V; k++) if (val[k] == 0) visit(k); } --- Adjacency matrix -- visit(int k) { int t; val[k] = ++id; for (t = 1; t <= V; t++) if (a[k][t] != 0) if (val[t] == 0) visit(t); } --- Adjacency list -- visit(int k) { struct node *t; val[k] = ++id; for (t = adj[k]; t != z; t = t->next) if (val[t->v] == 0) visit(t->v); } --- ----- DFS example (adjacency lists) A: F C B G B: A C: A D: F E E: G F D F: A E D G: E A H: I I: H J: K L M K: J L: J M M: J L ----- DFS example (adjacency lists) visit A visit F (first on A's list) check A on F's list (been there) visit E (second on F's list) visit G (first on E's list) check E on G's list (been there) check A on G's list (been there) check F on E's list (been there) visit D (third on E's list) check F on D's list (been there) check E on D's list (been there) check D on F's list (done that) visit C (second on A's list) visit B (third on A's list) check G on F's list (done that) ... 
"been there": currently working on it "done that": totally finished dealing with it ----- DFS example (adjacency matrix) Adjacency matrix -- A B C D E F G H I J K L M A 1 1 1 0 0 1 1 0 0 0 0 0 0 B 1 1 0 0 0 0 0 0 0 0 0 0 0 C 1 0 1 0 0 0 0 0 0 0 0 0 0 D 0 0 0 1 1 1 0 0 0 0 0 0 0 E 0 0 0 1 1 1 1 0 0 0 0 0 0 F 1 0 0 1 1 1 0 0 0 0 0 0 0 G 1 0 0 0 1 0 1 0 0 0 0 0 0 H 0 0 0 0 0 0 0 1 1 0 0 0 0 I 0 0 0 0 0 0 0 1 1 0 0 0 0 J 0 0 0 0 0 0 0 0 0 1 1 1 1 K 0 0 0 0 0 0 0 0 0 1 1 0 0 L 0 0 0 0 0 0 0 0 0 1 0 1 1 M 0 0 0 0 0 0 0 0 0 1 0 1 1 --- ----- DFS example (adjacency matrix) visit A check A on A's list (been there) visit B (second on A's list) check A on B's list (been there) check B on B's list (been there) visit C (third on A's list) check A on C's list (been there) check C on C's list (been there) visit F (fourth on A's list) check A on F's list (been there) visit D (second on F's list) check D on D's list (been there) visit E (second on D's list) check D on E's list (been there) check E on E's list (been there) check F on E's list (been there) visit G (fourth on E's list) check A (been there) check E (been there) check G (been there) check F on D's list (been there) check E on F's list (done that) check F on F's list (been there) check G on A's list (done that) ... 
----- DFS implementation (stack-based)

Use explicit stack instead of recursive calls
--
int val[maxV];
int id = 0;

visit(int k)
  { struct node *t;
    push(k);
    while (!stackempty())
      { k = pop(); val[k] = ++id;
        for (t = adj[k]; t != z; t = t->next)
          if (val[t->v] == 0)
            { push(t->v); val[t->v] = -1; }
      }
  }

listdfs()
  { int k;
    stackinit();
    for (k = 1; k <= V; k++) val[k] = 0;
    for (k = 1; k <= V; k++)
      if (val[k] == 0) visit(k);
  }
---

----- DFS example (adjacency lists, stack-based)

visit A
  push F, push C, push B, push G
visit G
  push E, been to A
visit E
  been to G, been to F, push D
visit D
  been to F, been to E
visit B
  been to A
visit C
  been to A
visit F
  been to A, done with E, done with D

----- Stack-based search

NOT the same as recursive DFS. Why?

Algs differ in treatment of nodes
adjacent to partially visited nodes
  DFS: visits such a node
  stack-based: treats it as "visited"

Automatic recursion removal would stack
local variable (next node on adj. list)

Can get same effect by deleting (if present)
before pushing (more general ADT than stack)

No particular reason to use stack
  other ADTs work as well (stay tuned)

Analogous to traversing a maze

----- Graphs and mazes

Nodes: intersections
Edges: hallways

DFS
  mark ENTRY and EXIT halls at each vertex
  leave by ENTRY when no unmarked halls

Stack-based?
  light each intersection
  if you can see a light when arriving
    at an intersection, turn back
  leave by ENTRY when no unmarked halls

What characterizes a "recursive" method??
----- Breadth-first search

Put unvisited nodes on a QUEUE, not a stack
--
visit(int k)
  { struct node *t;
    put(k);
    while (!queueempty())
      { k = get(); val[k] = ++id;
        for (t = adj[k]; t != z; t = t->next)
          if (val[t->v] == 0)
            { put(t->v); val[t->v] = -1; }
      }
  }

bfs()
  { int k;
    queueinit();
    for (k = 1; k <= V; k++) val[k] = 0;
    for (k = 1; k <= V; k++)
      if (val[k] == 0) visit(k);
  }
---

----- BFS vs DFS example

Depth-first search    Breadth-first search
DFS: "sequential"     BFS: "parallel"

----- DFS example

----- DFS example (continued)

Same graph, different order of
edges on adjacency lists

----- DFS example (continued)

Same graph, "randomized" choice of
edges on adjacency lists

COS 226 Lecture 18: Minimal Spanning Trees

Classic algorithms for solving
"network" problems

MINIMAL SPANNING TREE
  Kruskal's algorithm
  Prim's algorithm
SHORTEST PATHS
  Dijkstra's algorithm
  Floyd's algorithm

Related to generalized graph search
using a priority queue

Ex: telephone wiring
Ex: supplies to troops

----- Generalized search (Priority-first)

Use PQ, rather than stack or queue

TREE vertices:   visited
FRINGE vertices: adjacent to visited
UNSEEN vertices: others

Put all fringe vertices on PQ

Search algorithm
  * take next vertex off PQ
  * change status to TREE
  * for each adjacent vertex
      UNSEEN: move to FRINGE
      FRINGE: update priority

Encompasses several basic graph algorithms
  depth-first search
    (use decreasing counter for priority)
  breadth-first search
    (use increasing counter for priority)
  minimal spanning tree
  shortest path tree

----- Minimal spanning tree

Weighted graph (edges have values)
Ex:
  Vertices: points in space
  Edges: lines connecting them
  Weights: lengths of lines

MST: minimal weight set of edges
that connects all the vertices

Theorem: Divide the vertices into two
sets A and B. Shortest edge X connecting
a vertex in A with a vertex in B
must be in the MST

Proof: by contradiction.
If X not in MST, add it to MST
  * creates a cycle
  * some other edge Y from A to B
    must be on cycle
Delete Y and add X to get better tree.

----- sample graph
      a spanning tree
      minimal spanning tree

----- Priority-first search MST algorithm

Set A: TREE
Set B: UNSEEN+FRINGE
Smallest edge in FRINGE must be in MST
--
#define P t->w

visit(link* adj, int k)
  { link t;
    dad[k] = k; PQreduce(k, 0);
    while (PQminkey() < maxNum)
      { k = PQdelmin();
        printf("%d-%d\n", k, dad[k]);
        for (t = adj[k]; t != z; t = t->next)
          if (onPQ(t->v) && (P < val[t->v]))
            { PQreduce(t->v, P); dad[t->v] = k; }
      }
  }

listpfs(link* adj, int V, int E)
  { int k;
    val = malloc((V+1)*sizeof(weightType));
    dad = malloc(V*sizeof(int));
    PQinit(val, V);
    for (k = 0; k < V; k++)
      if (onPQ(k)) visit(adj, k);
  }
---

----- MST example

use line lengths as edge weights

Edges          Adjacency lists
--
0-6 5.3        0: 7 5 2 1 6
0-1 3.1        1: 7 0
0-2 3.6        2: 7 0
4-3 2.2        3: 5 4
5-3 2.2        4: 6 5 7 3
7-4 2.2        5: 0 4 3
5-4 3.1        6: 4 0
0-5 6.0        7: 1 2 0 4
6-4 3.6
7-0 3.6
7-2 1.4
7-1 1.0
---
MST
--
1-0 7-1 2-7 4-7 3-4 5-3 6-4
---

----- PFS MST example
--
. 0 1 2 3 4 5 6 7 min out updates
. ---------------------------------------
. 0 * * * * * * * 0 0-0 0-1 < 0-*, etc.
. 0 0 * * 0 0 0 1 1-0 1-7 < 7-0
. 0 * * 0 0 1 7 7-1 7-2 < 2-0, 7-4 < 4-0
. 7 * 7 0 0 2 2-7 none
. * 7 0 0 4 4-7 4-6 < 6-0, 4-5 < 5-0, 4-3 < 3-*
. 4 4 4 3 3-4 3-5 < 5-4
. 3 4 5 5-3 none
. 4 6 6-4 none
---

----- Priority queue issues

Need to implement
  DELETE MINIMUM
  UPDATE

Priority queue holds non-tree vertices
  * priority is shortest edge to tree vertex
  * for each vertex added to the tree
    each of its edges may give a shorter
    connection to a non-tree vertex

Requires handles!
given vertex, have to change its value

Example: heap implementation
  priorities move around in heap
  INDIRECT heap: keep track of indices
  (see book for details)
--
| 9.9 | 9.9 | 1.4 | 3.1 | 3.1 | 5.5 | 1.1 | 1.4 | 6.7 | 6.7 | 4.2 | 4.2 | 5.5 |
---

Another approach
  INDEXED TOURNAMENT
  Separate array on top of data
  keep track of winners
  do not move data itself

----- Indexed tournament implementation
--
int *pq; int pqV; numType* pqval;

void PQinit(numType val[], int V)
  { int i;
    pqV = V; pqval = val; pqval[V] = maxNum;
    pq = malloc((V+V)*sizeof(link));
    for (i = 0; i < V; i++) pq[V+i] = V;
    for (i = V-1; i > 0; i--) pq[i] = pq[i+i];
  }

int onPQ(int v)
  { return pq[pqV+v] == v; }

numType PQminkey()
  { return val[pq[1]]; }

void promote(int k)
  { int j, l, r;
    pq[pqV+k] = k;
    for (j = (pqV+k)/2; j > 0; j = j/2)
      if (val[pq[j+j]] < val[pq[j+j+1]])
           pq[j] = pq[j+j];
      else pq[j] = pq[j+j+1];
  }

int PQdelmin()
  { int k = pq[1];
    pq[pqV+k] = pqV; promote(k);
    return k;
  }

PQreduce(int v, numType x)
  { val[v] = x; promote(v); }
---

----- Indexed tournament example
--
| 0.0 0 | 8 | 0 | 1 | *** 8 | 3.1 1 | 0 | 1 | *** 8 | 3.6 2 | 8 | 2 | *** 8 | *** 8 | 0 | 1 | *** 8 | *** 8 | 8 | 5 | *** 8 | 6.0 5 | 8 | 7 | *** 8 | 5.3 6 | 8 | 7 | *** 8 | 3.6 7 | 8 | 8 | 8 | 0 | 8 | 8 | 2 | 2 | 3.6 2 | 1.4 2 | 2 | 2 | *** 8 | *** 8 | 7 | 2 | *** 8 | 2.2 4 | 5 | 4 | 6.0 5 | 6.0 5 | 7 | 4 | 5.3 6 | 5.3 6 | 7 | 6 | 1.0 7 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 8 | 3 | 8 | 8 | 8 | 3 | *** 8 | 2.2 3 | 4 | 3 | 2.2 4 | 8 | 4 | 5 | 6.0 5 | 3.1 5 | 4 | 5 | 5.3 6 | 3.6 6 | 6 | 6 | *** 8 | *** 8
---

----- Larger PFS MST example

----- Prim's algorithm

Use adjacency matrix

While going through the matrix to find the
vertices connected to the current one,
find the minimum at the same time

val[k]
  neg: FRINGE; pos: TREE; -unseen: UNSEEN
--
matrixpfs()
  { int k, t, min = 0;
    for (k = 1; k <= V; k++)
      { val[k] = -unseen; dad[k] = 0; }
    val[0] = -(unseen+1); // sentinel
    for (k = 1; k != 0; k = min, min = 0)
      { val[k] = -val[k];
        if
(val[k] == unseen) val[k] = 0;
        for (t = 1; t <= V; t++)
          if (val[t] < 0)
            { if (a[k][t] && (val[t] < -priority))
                { val[t] = -priority; dad[t] = k; }
              if (val[t] > val[min]) min = t;
            }
      }
  }
---
Invented in 1957

----- Prim's algorithm vs. PFS

Depends on whether graph is DENSE or SPARSE

To within a constant factor
  PFS:    (E + V) log V
  Prim's: V*V

Ex: 2 thousand vertices, 1 million edges
  PFS: 11 million steps
  Prim's: 4 million steps
Ex: 1 million vertices, 2 million edges
  PFS: 60 million steps
  Prim's: 1 trillion steps
Ex: 100 thousand vertices, 1 million edges
  PFS: 17 million steps
  Prim's: 10 billion steps

Clever algorithms can reduce
PFS to E + V log V

Bottom line:
  * Prim's optimal for dense graphs
  * don't use Prim's for sparse graphs
    (space might be prohibitive, anyway)

----- Kruskal's algorithm

UF algorithms from lecture 1
find a spanning tree

THEOREM: If the edges come in order of their
length, the spanning tree from UF algs is a MST!

Algorithm:
  * sort the edges
  * use weighted quick union

In Postscript:
--
/kruskal
{ ufwinit
  0 2 E E add 1 sub
  { /i exch def
    edges i get /v1 exch def
    edges i 1 add get /v2 exch def
    v1 v2 2 copy
    ufw { pop pop }{ drawedge } ifelse
  } for
} def
---
Invented in 1956.

----- Kruskal MST example

Take edges in order,
skip those that make cycles
--
7-1 1.0 *
7-2 1.4 *
4-3 2.2 *
5-3 2.2 *
7-4 2.2 *
5-4 3.1
0-1 3.1 *
0-2 3.6
6-4 3.6 *
7-0 3.6
0-6 5.3
0-5 6.0
---

----- Larger Kruskal example

----- Kruskal's algorithm vs PFS

Depends on structure of graph

To within a constant factor
  PFS:       E log V + V log V
  fancy PFS: E + V log V
  Kruskal:   E + M log E
    where M is the number of edges in the
    graph shorter than the longest edge
    in the MST

Do Kruskal with heap built from the bottom up
  otherwise E log E for the sort

Performance differences depend upon:
  * dense vs. sparse
  * structure of graph
  * priority queue algorithm
  * priority queue implementation

All require space for whole graph
  (UF finds a spanning tree in O(V) space)

MST for huge graphs??
COS 226 Lecture 19: Shortest paths

Classic algorithms for solving
"network" problems

MINIMAL SPANNING TREE
  Kruskal's algorithm
  Prim's algorithm
SHORTEST PATHS
  Dijkstra's algorithm
  Floyd's algorithm

Related to generalized graph search
using a priority queue

Ex: telephone wiring
Ex: supplies to troops

----- Generalized search (Priority-first)

Use PQ, rather than stack or queue

TREE vertices:   visited
FRINGE vertices: adjacent to visited
UNSEEN vertices: others

Put all fringe vertices on PQ

Search algorithm
  * take next vertex off PQ
  * change status to TREE
  * for each adjacent vertex
      UNSEEN: move to FRINGE
      FRINGE: update priority

Encompasses several basic graph algorithms
  depth-first search
    (use decreasing counter for priority)
  breadth-first search
    (use increasing counter for priority)
  minimal spanning tree
  shortest path tree

----- Shortest paths

SINGLE-SOURCE shortest paths:
shortest path from x to each other node

Algorithm: PFS starting at x

val[k]: shortest distance from x to k
priority on FRINGE:
  shortest known distance to node

smallest fringe value: shortest distance
to *any* node not on tree
(won't find a shorter path)
--
#define priority val[k]+t->w
---
DFS and BFS also immediate from PFS
  BFS: priority is "time since start"
  DFS: priority is "time to finish"

----- PFS SPT example

----- PFS BFS and DFS examples

----- Shortest paths in Euclidean graphs

Problem: find shortest path from x to y

Algorithm:
  start shortest-path PFS at x
  stop when reaching y

val[k]: shortest distance from x to k
in TREE PLUS distance from k to y
as the crow flies

priority on FRINGE:
  shortest possible way from x to y
--
#define priority val[k]+t->w + distance(t->v, y)
---

----- All shortest paths

Problem: Given weighted graph, compute a
table giving the shortest weighted path
from each node to each other node

Ex: map of New England
  nodes: cities
  edges: roads connecting cities
--
.              P   W   L   N
. Providence   0  53  54  48
. Westerly    53   0  18 101
. New London  54  18   0  12
. 
Norwich     48 101  12   0
---
Norwich-Westerly: 101 miles??

  12 miles  Norwich-New London
  18 miles  New London-Westerly
  ---------
  30 miles

----- Floyd-Warshall algorithm

Another ancient algorithm (1962)

Want shorter path from x to j?
  take x to y, then y to j, if shorter
For each intermediate y, do all x-j pairs
--
for (y = 1; y <= V; y++)
  for (x = 1; x <= V; x++)
    for (j = 1; j <= V; j++)
      if (a[x][y] + a[y][j] < a[x][j])
        a[x][j] = a[x][y] + a[y][j];
---
Correctness proof: induction on y
Running time: O(V^3)

Negative edge weights?
  * much more difficult to solve
  * could get negative cycles

COS 226 Lecture 20: Network Flow

Advanced graph problems
  NETWORK FLOW
  MATCHING

OPERATIONS RESEARCH
  Classical problems (1940s)
  Graph algorithm technology
  PQ and data structure design crucial
  Optimal solutions still not known

Examples of REDUCTION
  BIPARTITE MATCHING
    no harder than network flow
    (actually, much less general)
  LINEAR PROGRAMMING
    no easier than network flow
    (actually, much more general)

Network flow, matching, linear programming
are each fields of study in their own right

----- Network flow

NETWORK: weighted graph

Consider network as abstraction for
material FLOWING through the edges

Interpret
  weights as CAPACITIES
  nodes as FLOW CONTROL SWITCHES

Ex: oil flowing in pipes
Ex: commodities flowing on roads and rails
Ex: bits flowing in Internet

Identify
  SOURCE: node where all material originates
  SINK: node where all material goes

MAX FLOW PROBLEM: assign flows to edges
that maximize total flow through the network

  dumb    naive    best

----- Increasing flow in a network

Assume directed graph: all links one-way
(downhill in examples)

AUGMENTING PATH: path from source to sink
with unused capacity on every edge

Easy case
  all edges on path carry positive flow
  (up to capacity)

More complicated case
  negative flow in BACKWARD edges
  (up to amount of flow)

  ABDF  ACEF  ACDBEF

----- Ford-Fulkerson algorithm

Start with 0 flow everywhere

Increase the flow along any augmenting path
until no augmenting paths are left
Problem 0: Does this lead to the maximum flow? Problem 1: Not really an algorithm, since many details are unspecified * How do we find an augmenting path? * Which augmenting path is best? Problem 2: Cost can be proportional to max capacity Ex: ABCD ACBD ABCD ACBD ABCD ... ----- Max flow/Min cut theorem CUT: set of edges separating source from sink THM: Finding the maximum flow is equivalent to finding the minimum cut Proof: Omitted THM: Ford-Fulkerson method gives maximum flow Proof sketch: If there is no augmenting path, identify the first full forward or empty backward edge on every path That set of edges defines a min cut Applies to an entire family of algorithms No matter how the augmenting paths are chosen, the flow is maximal when no more can be found Algorithm design goals * find paths quickly * use as few iterations as possible ----- Edmonds-Karp algorithms Idea 1: use BFS to find augmenting path Idea 2: find path that increases the flow the most BOTH easy to implement with PQ graph search (!) Change basic graph search routine to * use positive weights for forward path, negative weights for backward path * ignore full forward, empty backward edges * take source, sink as args, return 0 if no path -- for (;;) { if (!matrixpfs(1,V)) break; y = V; x = dad[V]; while (x != 0) { flow[x][y] = flow[x][y]+val[V]; flow[y][x] = -flow[x][y]; y = x; x = dad[y]; } } --- ----- Network flow example ----- Analysis of network flow algorithms Edmonds-Karp worst-case running time BFS: V E^2 (VE iterations, each cost E) best path: C(M, F) E log V depends on cut size and flow amount (M/(M-1))^C = F Ex: M = 1000, F = 1000: C = 6900 Ex: M = 10, F = 1000: C = 72 New algorithms * use better priority queue implementations * remember facts about search for last path to find new path * find multiple paths on one search Running times 1970 V^2 E 1977 V^2 E^(1/2) 1978 V^3 1978 V^(5/3) E^(2/3) 1980 V E log V 1986 V E log(V^2/E) Objection! 
real graphs not "average" or "worst case" simple (but not dumb) algorithms may be preferred in practice BUT simple optimal algorithm could still exist ----- Matching MATCHING set of edges with no vertex included twice MAXIMUM MATCHING no matching contains more edges WEIGHTED MATCHING weighted graph no matching has greater sum of edge weights BIPARTITE GRAPH two types of nodes all edges connect nodes of differing type BIPARTITE MATCHING maximum matching in bipartite graph Ex: graph: E5 A2 A1 C1 B4 C3 D3 B2 A4 D5 E3 B1 matching: A1 B4 C3 D5 E6 F2 ASSIGNMENT PROBLEM weighted bipartite matching Matching is difficult to solve in general Related to network flow ----- Matching examples Matching Bipartite graph ----- Reducing A to B SOLUTION: efficient algorithm that can give the answer for any problem instance Ex: Given any network, Edmonds-Karp algorithms solve network flow problem for that network REDUCTION: a way to solve problems by converting them to other problems Ex: Bipartite matching reduces to network flow Convert instance of bipartite matching to instance of network flow Solve network flow instance Convert network flow solution to bipartite matching solution Ex: Network flow reduces to linear programming (stay tuned) Reduction is not necessarily EFFICIENT Ex: Everything computable reduces to Universal Turing Machine BUT reduction, in practice, * is better than brute force * is better than investing years of research to understand details of a problem * gives insight into difficulty (partially order problems by difficulty) ----- Reducing bipartite matching to network flow Create two new nodes SOURCE with edges to all nodes of first type SINK with edges to all nodes of second type Make all capacities 1 Network flow easier for unit capacities NOTE: Reduction doesn't work for weighted case (assignment problem) ----- Linear programming Maximize OBJECTIVE FUNCTION subject to CONSTRAINTS Ex: maximize x+y subject to the constraints -x + y <= 5 x + 4y <= 45 2x + y <= 27 
3x - 4y <= 24
  x, y >= 0

Vast number of applications

Early application: DIET problem
  min cost diet with adequate nutrients

"Programming": formulate the equations
(reduce given problem to linear programming)

----- Reducing network flow to linear programming

Express
  capacities as inequalities
  flow at nodes as equalities

Ex: maximize AB + AC
subject to the constraints
  AB <= 6    CD <= 3
  AC <= 8    CE <= 3
  BD <= 6    DF <= 8
  BE <= 3    EF <= 6
  BD + BE = AB
  CD + CE = AC
  BD + CD = DF
  BE + CE = EF
  all >= 0

----- Geometric interpretation

Ex: (two variables)
  objective function maximized at vertex
Ex: (three variables)

Higher dimensions
  multifaceted convex polytope: SIMPLEX
  objective function maximized at vertex

----- Approaches to linear programming

Simplex method
  classical approach
  like Gaussian elimination
  pivot steps correspond to moving
    on simplex vertices
  exponential in worst case
  fast in practice

Duality: is point inside simplex?

Ellipsoid methods
  new algorithmic approach
  geometric divide-and-conquer
  polynomial time (!)

Challenge: make new methods faster
than simplex for practical problems

----- Perspective

Nearing the deep waters of NP-completeness
  Ex: INTEGER programming NP-complete
  Ex: TRIpartite matching NP-complete

1960s Operations research
1970s Analysis of algorithms
1980s Geometric algorithms

All relevant to wide variety of
important practical problems

Consider ALL approaches for new problems

Reduction
  may help determine complexity
  may provide usable solution
    for practical instances
Specialized algorithms
  may provide elegant optimal
    solution for all instances
General methods
  may solve specific practical
    instance quickly
  (even if can't be proven fast)

COS 226 Lecture 21: Multiplication

How fast can we multiply?

Small numbers: basic machine operation

Approximate solution OK: "floating point"
  MIPS vs. FLOPS

Exact solution required for numbers
with millions of digits?
small ex:
    98734019238470981723409182730491827
  times
    66753875434605462654864811227186687
  equals
    6590878421402831216001745213641931592836606299109
    07873145346994326436

Basic question about difficulty of computations
  ARITHMETIC COMPLEXITY
also consider other operations
  add, multiply, exponentiate, divide, logarithm ...

Applications more numerous than we might imagine
  Arithmetic coding ($%#?!)
  Cryptography
  Signal processing

----- Multiplying complex numbers

Given two complex numbers
  x = a + b i
  y = c + d i
Product x*y
  real part:      a*c - b*d
  imaginary part: a*d + b*c
Four multiplications, two additions

Typical old-fashioned machine:
  1 time unit for add
  4 time units for multiply
therefore, 18 time units for complex multiply

Faster way to compute product (!)
  t = (a+b) * (c+d) = a*c + a*d + b*c + b*d
Product x*y
  real part:      a*c - b*d
  imaginary part: t - a*c - b*d
THREE multiplications, five additions
17 time units for complex multiply

Only works if multiply expensive relative to add
  software floating point
  old machines

----- Exponentiation

How many multiplications to compute x^N ?

If N is a power of 2, use successive squaring
  1, x, x^2, x^4, x^8, x^16, x^32, ...

If N is not a power of 2, do successive squaring
then pick out numbers to multiply together by
  * divide-and-conquer
  * binary representation of N

Ex: x = 2, N = 21
--
1 2 4 8 16 32 64 128 256 512 1024 2048
4096 8192 16384 32768 65536
---
binary rep. of N:   2^21 = 2^16*2^4*2^1 = 65536*16*2 = 2097152
divide-and-conquer: 2^21 = 2^10*2^11 = 2^5*2^5*2^5*2^6 = ...

Unrealistic to consider if x and/or N large
Realistic version (which is of practical use)
  THM: Less than 2 lg N multiplications are
  required to compute x^N mod M
Linear in the number of bits to store N

DISCRETE LOGARITHM problem (difficult)
  given x^N mod M, find x

----- Multiplying polynomials

To multiply integers
  may as well consider polynomial multiplication
  finish up with Horner's method

grade school
--
.       1 2 3 4
.       5 6 7 8
.       -------
.       9 8 7 2
.     8 6 3 8
.   7 4 0 4
. 6 1 7 0
. -------------
. 7 0 0 6 6 5 2
---
grade school, no carries
--
.     1   2   3   4
.     5   6   7   8
. ---------------------------
.             8*1 8*2 8*3 8*4
.         7*1 7*2 7*3 7*4
.     6*1 6*2 6*3 6*4
. 5*1 5*2 5*3 5*4
---
polynomial
--
.  x^3 + 2x^2 + 3x + 4
. 5x^3 + 6x^2 + 7x + 8
. ----------------------
.                         8x^3 + 16x^2 + 24x + 32
.                  7x^4 + 14x^3 + 21x^2 + 28x
.          6x^5 + 12x^4 + 18x^3 + 24x^2
. 5x^6 + 10x^5 + 15x^4 + 20x^3
. ------------------------------------------------
. 5x^6 + 16x^5 + 34x^4 + 60x^3 + 61x^2 + 52x + 32
---

----- Cost of multiplication

Polynomial multiplication
--
for (i = 0; i < 2*N-1; i++) r[i] = 0;
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    r[i+j] += p[i]*q[j];
---
N^2 coefficient multiplications

Integer multiplication
R-L:
--
.       32
.      52
.     61
.    60
.   34
.  16
. 5
. -------
. 7006652
---
L-R: use Horner's method to evaluate at x=10
--
.      5*10 =     50+16 =     66
.     66*10 =    660+34 =    694
.    694*10 =   6940+60 =   7000
.   7000*10 =  70000+61 =  70061
.  70061*10 = 700610+52 = 700662
. 700662*10 = 7006620+32 = 7006652
---
Extra factor of log N might be required

Bottom line: quadratic algorithms can
multiply thousand-digit numbers
but not million-digit numbers

----- Divide-conquer polynomial multiplication

Example
--
. ( x^2 (x+2) + (3x+4) ) * ( x^2 (5x+6) + (7x+8) )
.
.   x^4 (x+2)*(5x+6)
. + x^2 (x+2)*(7x+8)
. + x^2 (5x+6)*(3x+4)
. +     (3x+4)*(7x+8)
.
. 5x^6 + 16x^5 + 12x^4
.                 7x^4 + 22x^3 + 16x^2
.                15x^4 + 38x^3 + 24x^2
.                                21x^2 + 52x + 32
. ------------------------------------------------
. 5x^6 + 16x^5 + 34x^4 + 60x^3 + 61x^2 + 52x + 32
---
In general, if p and q are degree 2N
--
. p(x)*q(x) = (x^N ph(x) + pl(x)) * (x^N qh(x) + ql(x))
.
. p(x)*q(x) = x^2N ph(x)*qh(x)
.           + x^N (ph(x)*ql(x) + pl(x)*qh(x))
.           + pl(x)*ql(x)
---
Scalar multiplication count
  m(0) = 1
  m(2N) = 4*m(N)
Solution: m(N) = N^2, since (2N)^2 = 4N^2

No savings from divide and conquer?

----- Polynomial multiplication (continued)

Use same technique as for complex numbers

In general, if p and q are degree 2N
--
. p(x)*q(x) = (x^N ph(x) + pl(x)) * (x^N qh(x) + ql(x))
.
. t(x) = (ph(x) + pl(x)) * (qh(x) + ql(x))
.
. p(x)*q(x) = x^2N ph(x)*qh(x)
.           + x^N (t(x) - ph(x)*qh(x) - pl(x)*ql(x))
.           + pl(x)*ql(x)
---
Ex:
--
. ( x^2 (x+2) + (3x+4) ) * ( x^2 (5x+6) + (7x+8) )
.
.  (x+2)*(5x+6) =  5x^2 +  16x + 12
. (3x+4)*(7x+8) = 21x^2 +  52x + 32
. t(x) = (4x+6)*(12x+14) = 48x^2 + 128x + 84
.
.   x^4 ( 5x^2 + 16x + 12)
. + x^2 (22x^2 + 60x + 40)
. +      21x^2 + 52x + 32
. ------------------------------------------------
. 5x^6 + 16x^5 + 34x^4 + 60x^3 + 61x^2 + 52x + 32
---
Scalar multiplication count
  m(0) = 1
  m(2N) = 3*m(N)
Solution: m(N) = N^lg3, since (2N)^lg3 = 3N^lg3

Bottom line: can do tens of thousands of digits
but not millions of digits

----- Basic computations on polynomials

A polynomial of degree N-1
  p0 + p1 x + p2 x^2 + p3 x^3 + ... + p(N-1) x^(N-1)
is defined by N numbers (the coefficients)
  p0 p1 p2 ... p(N-1)

The polynomial associates any N numbers
  x0 x1 x2 ... x(N-1)
with another set of N numbers
  y0 y1 y2 ... y(N-1)
simply by
  p(x0) = y0
  p(x1) = y1
  p(x2) = y2
  p(x3) = y3
  ...

EVALUATION      find the y's, given the p's and x's
ROOT EXTRACTION find the x's, given the p's and y's
INTERPOLATION   find the p's, given the x's and y's

----- Fast polynomial multiplication

EVALUATE-MULTIPLY-INTERPOLATE

Ex: (1+x)*(2+3x) = 2 + 5x + 3x^2

Pick three distinct numbers: 0, 1, 2 (say)

EVALUATE
--
.              x   0   1   2
. ---------------------------
.          1 + x   1   2   3
.         2 + 3x   2   5   8
---
MULTIPLY
--
.                  x   0    1    2
. ---------------------------------
. (1 + x)*(2 + 3x)     2   10   24
---
INTERPOLATE
--
.   (x-1)(x-2)      (x-0)(x-2)      (x-0)(x-1)
. 2 ---------- + 10 ---------- + 24 ----------
.   (0-1)(0-2)      (1-0)(1-2)      (2-0)(2-1)
.
. (x^2 - 3x + 2) - 10 (x^2 - 2x) + 12 (x^2 - x)
.
. 3x^2 + 5x + 2
---
Bad news
  * evaluate, interpolate seem to take N^2 steps
Good news
  * method works for ANY set of distinct points

----- Complex roots of unity

Ex: (1-i)(1-i) = 1 - 2i - 1 = -2i
    (1-i)^4 = -4
    (.707.. - .707..i)^8 = 1

For each N, there are N different complex
numbers that evaluate to 1 when raised
to the Nth power
--
. W_8^0  W_8^1  W_8^2  W_8^3  W_8^4  W_8^5  W_8^6  W_8^7
---
w: PRINCIPAL Nth root of unity
  W_N^k = e^((2 pi i k)/N)

Squaring the 8th roots gives the 4th roots:
--
. x:    W_8^0  W_8^1  W_8^2  W_8^3  W_8^4  W_8^5  W_8^6  W_8^7
.     = W_8^0  W_8^1  W_8^2  W_8^3 -W_8^0 -W_8^1 -W_8^2 -W_8^3
. x^2:  W_4^0  W_4^1  W_4^2  W_4^3  W_4^0  W_4^1  W_4^2  W_4^3
---

----- EVALUATE in NlogN steps

p(x)
--
. p0 + p1 x + p2 x^2 + p3 x^3 + p4 x^4 + p5 x^5 + p6 x^6 + p7 x^7
---
split into even, odd coefficients
--
. p0 + p2 x^2 + p4 x^4 + p6 x^6  +  x (p1 + p3 x^2 + p5 x^4 + p7 x^6)
.
. p(x) = pe(x^2) + x po(x^2)
---
To evaluate at eighth roots of unity
use values of even, odd polynomials
at fourth roots of unity
--
. p(W_8^0) = pe(W_4^0) + W_8^0 po(W_4^0)
. p(W_8^1) = pe(W_4^1) + W_8^1 po(W_4^1)
. p(W_8^2) = pe(W_4^2) + W_8^2 po(W_4^2)
. p(W_8^3) = pe(W_4^3) + W_8^3 po(W_4^3)
. p(W_8^4) = pe(W_4^0) + W_8^4 po(W_4^0)
. p(W_8^5) = pe(W_4^1) + W_8^5 po(W_4^1)
. p(W_8^6) = pe(W_4^2) + W_8^6 po(W_4^2)
. p(W_8^7) = pe(W_4^3) + W_8^7 po(W_4^3)
---
multiplication count
  m(0) = 1
  m(N) = 2 m(N/2) + N
  m(N) = N lg N

----- EVALUATE implementation

Use the perfect shuffle
--
. p0 p1 p2 p3 p4 p5 p6 p7
---
--
. p0 p2 p4 p6 p1 p3 p5 p7
---
To evaluate in place
  unshuffle
  evaluate halves in place recursively
  use values to compute result

----- Interpolating at roots of unity

For the Nth roots of unity, it turns out that
INTERPOLATE is truly the inverse of EVALUATE

r(x)
--
. r0 + r1 x + r2 x^2 + r3 x^3 + r4 x^4 + r5 x^5 + r6 x^6 + r7 x^7
---
s(x)
--
. s0 + s1 x + s2 x^2 + s3 x^3 + s4 x^4 + s5 x^5 + s6 x^6 + s7 x^7
---
--
. r(W_8^0) = s0        s(W_8^-0) = r0
. r(W_8^1) = s1        s(W_8^-1) = r1
. r(W_8^2) = s2        s(W_8^-2) = r2
. r(W_8^3) = s3        s(W_8^-3) = r3
. r(W_8^4) = s4        s(W_8^-4) = r4
. r(W_8^5) = s5        s(W_8^-5) = r5
. r(W_8^6) = s6        s(W_8^-6) = r6
. r(W_8^7) = s7        s(W_8^-7) = r7
---
Fundamental property (Fourier inversion theorem)

To INTERPOLATE at roots of unity,
use the EVALUATE routine (!)

----- Fast Fourier transform

To multiply two polynomials of degree N-1
  EVALUATE both at (2N-1) roots of unity
  MULTIPLY (2N-1) results term-by-term
  INTERPOLATE by evaluating at (2N-1)
    inverse roots of unity

Total time proportional to N lg N
Shuffling keeps space proportional to N

Bottom line: can multiply numbers
with millions of digits

"multiply" = "convolution"
  basic mathematical operation on series
  basic operation in digital signal processing
  allows "real time" processing

COS 226 Lecture 22: Cryptology

Cryptology: Cryptography + Cryptanalysis

Protocols and algorithms
  * for sending and receiving secret messages
  * for reading someone else's secret messages

Traditional applications: military
Modern applications: commercial

Never can "prove" security:
too much can go wrong

Role of efficient algorithms
  * practical to encode messages?
  * implement codebreaking attacks
  * (new meaning to "easy", "hard")

Parallel universe (primarily secret until 1980s)
  Old world: good guys and bad guys
    generated $$ for supercomputers
    machine code and machines
    codes and attacks
  New world: everyone (Alice, Bob, and Eve)
    generates venture capital $$
    C++ and Java
    protocols

Refs: Codebreakers, D. Kahn
      Applied Cryptography, B.
Schneier ----- Unbreakable cryptosystem ONE-TIME PAD PROTOCOL * Alice and Bob exchange keys by messenger (time passes) * Bob encrypts message with key (ciphertext) * Bob sends ciphertext to Alice * Alice decrypts ciphertext with key * Bob and Alice destroy key Eve can read ciphertext, but not message Ex: -- A T T A C K A T D A W N --- encrypt with random key -- A T T A C K A T D A W N message M A K G A U E B H M C V V M add key N S D H D E E C A M G W R ciphertext --- send this message -- N S D H D E E C A M G W R --- decrypt with same key (communicated previously) -- N S D H D E E C A M G W R ciphertext M A K G A U E B H M C V V M subtract key A T T A C K A T D A W N message --- PROBLEMS * need random key as long as message * need secure protocol to distribute key ----- Linear feedback shift registers Solution to key length problem Small key, very large number of random bits Register with bit values at time T+1 determined from values at time T Ex: 11-bit register -- T 9 8 7 ... 1 0 T+1 10 9 8 2 10+3 (XOR) --- -- 0 1 1 0 1 0 0 0 0 1 0 initial value 1 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 0 1 0 0 1 1 0 0 1 0 --- Bit 0 values comprise pseudo-random bitstream N-bit initial value gives sequence of length 2^N provided bit taps are "primitive" ----- Stream cipher example To transmit the message "SEND FOOD" ENCODE with S = 10011, E = 00101, N = 01110, etc. -- . S E N D F O O D . 100100010101110001000000000110011100111000100 --- ENCRYPT with key from LFBSR message -- . 100100010101100001000000000110011100111000100 --- key -- . 001001100100001101010100001111010100011100101 --- ciphertext -- . 101101110001101100010100001001001000100100001 --- TRANSMIT ciphertext -- . V * M Q H I D I A --- DECRYPT with keystream from same initial value ciphertext -- . 101101110001101100010100001001001000100100001 --- key -- . 
001001100100001101010100001111010100011100101
---
message
--
100100010101100001000000000110011100111000100
---
decode
--
S E N D   F O O D
---
-----
Digression: random number generators
Can't generate random numbers on a computer
  von Neumann: "anyone who considers arithmetical methods
  of producing random digits is, of course, in a state of sin"
Linear congruential generators
Additive congruential generators
True random bits hard to find
  clock time between keystrokes
  cosmic rays
Pseudo-random more convenient for keystream
Cheap source of random bits: some encrypted message?
-----
Simple attack on LFBSR
If cryptanalyst knows part of the message,
the entire keystream may be available
Ex: operator puts two spaces at the beginning
--
          S    E    N    D         F    O
000000000010010001010111000100000000011001110
---
encrypts with key from LFBSR
--
000000000010010001010111000100000000011001110    message
001001100100001101010100001111010100011100101    key
001001100110011100000011001011010100000101011    ciphertext
---
transmits this ciphertext
--
001001100110011100000011001011010100000101011
D Y S P F K J A K
---
First 10 bits of ciphertext are the key!
Any 11 bits of key (plus a machine) gives ALL the key bits
Cryptanalyst
  * knows from spies that machine is 11-bit LFBSR
  * guesses that first two chars might be spaces
  * tries both possibilities for 11th initial bit
  * generates key
  * reads message
-----
WW-II ENIGMA
Typewriter-like machine
Symmetric code
  type A, it prints G
  type G, it prints A
Particular code depends on machine settings
KEY DISTRIBUTION: "codebook"
Sender types message, machine prints ciphertext
Ciphertext transmitted in the clear
Receiver types ciphertext, machine prints message
Typewriter-like technology + rotors and plugboards
Over 137629917937512000 (10^17) different settings
Thousands of machines, millions of messages
Good guys broke the code (and won the war) by
  * getting some machines
  * getting some rotors
  * getting a codebook
  * figuring out what some messages were
  * building a computer to try remaining possibilities
One of the good guys: Alan Turing (!)
Ref: "Alan Turing: The Enigma", A. Hodges
-----
Modern applications
Widespread application of computing creates opportunities
New research in cryptography provides necessary technology
  * ecommerce
  * communication among citizens
  * better military systems
Challenges (social and political issues):
  key distribution
  efficient encode/decode
-----
One-way trapdoor functions
Concept that opened the door to modern cryptography
  (Diffie-Hellman, 1975)
One-way functions
  easy to compute
  difficult to compute inverse
  Ex: write message on mirror, smash mirror
One-way trapdoor functions
  easy to compute
  difficult to compute inverse
  easy to compute inverse with key
  Ex: put message in locked mailbox
      person with key can get message
Key to cryptography is to find
  * good one-way functions
  * trapdoors
FACTORING
  p*q = N: easy
  find p given N: difficult
  find p given q, N: easy
DISCRETE LOG
  x^t mod p = M: easy
  find t given x, p, M: difficult
-----
Diffie-Hellman key exchange
Publicly available keys: large integers N and g (100 digits)
Alice chooses random x and computes
X = g^x mod N
Bob chooses random y and computes
Y = g^y mod N
Alice and Bob exchange X and Y
  (but keep x and y secret)
Alice computes Y^x = g^(xy) mod N = K
Bob computes X^y = g^(yx) mod N = K
Computations all "easy"
  hundreds of 100-digit multiplications
Both Alice and Bob have K
Eve needs to solve discrete logarithm
on 100-digit numbers to get K
-----
Public-key cryptosystem
Every registered user is issued
  P: public (encryption) key
  S: secret (decryption) key
Public keys are published (phone book)
User responsible for keeping private key secret
PROTOCOL
  * Alice computes C = P(M) using Bob's public key
  * Alice transmits ciphertext to Bob
  * Bob computes M = S(C) using his private key
Eve can read ciphertext, but not message
  (without Bob's private key)
Works if all (S, P) pairs satisfy
  1. S(P(M)) = M for every message M
  2. All (S, P) pairs are distinct
  3. Deriving S from P is as hard as reading M
  4. Both S and P are easy to compute
1. and 2. easily arranged
3. and 4. hard to design
-----
RSA encryption/decryption
Every registered user gets
  encryption key P: two integers N and p
  decryption key S: N and a third integer s
Assume all numbers 100s of digits
Encode the message as a number
  break into pieces smaller than N (less than lg N bits)
Alice computes C = P(M) using Bob's public key
  by computing M^p mod N
Bob computes M = S(C) using his private key
  by computing C^s mod N
N, p, and s chosen to make the system work as follows:
  * generate three large primes s, x, y
  * take N = x*y
  * choose p so that ps mod (x-1)(y-1) = 1
-----
RSA example
Suppose x = 47, y = 79, s = 97
Then compute N = 3713 and p = 37
--
A T T A C K   A T   D A W N
---
ENCODE: A = 01, T = 20, C = 03, K = 11, etc.
--
A T T A C K   A T   D A W N
0120200103110001200004012314
---
ENCRYPT using public key 37
  0120^37 = 1404, 2001^37 = 2932, ... (mod 3713)
--
0120 2001 0311 0001 2000 0401 2314
1404 2932 3536 0001 3284 2280 2235
---
TRANSMIT ciphertext
--
1404293235360001328422802235
N D * * * * A * * V * V *
---
DECRYPT using secret key 97
  1404^97 = 0120, 2932^97 = 2001, ... (mod 3713)
--
1404 2932 3536 0001 3284 2280 2235
0120 2001 0311 0001 2000 0401 2314
---
DECODE
--
0120200103110001200004012314
A T T A C K   A T   D A W N
---
-----
RSA algorithms
ENCRYPT, DECRYPT
  use successive squaring (see Lecture 20)
  THM: Less than 2 lg N multiplications are required
       to compute x^N mod M
FIND RANDOM 100-DIGIT PRIME
  hard way: generate random 100-digit number, try to factor it
  fast way: probabilistic method
FIND p GIVEN s, x, AND y
  extended Euclidean algorithm
RSA applicability rests on fast multiplication (!)
  present: use hybrid method (ex: PGP)
  future: unnoticed special-purpose chips
Bottom line: practical algorithm, widely applicable
  depends on fundamental research results
  in theoretical CS and number theory
  (RSA Inc. went public for $250 million)
-----
COS 226 Lecture 23: Dynamic Programming
DYNAMIC PROGRAMMING
  * old term from operations research
  * recursion revisited
Modern definition of dynamic programming:
  "Bottom-up implementation of recursive programs
   with overlapping subproblems"
Top-down implementation sometimes easier
Typically improves running time of an algorithm
from exponential time to polynomial time (!)
Many applications
-----
Fibonacci numbers
F(i) = F(i-1) + F(i-2), with F(0) = 0 and F(1) = 1
--
 i  F(i)     i  F(i)     i    F(i)
 0     0    10    55    20    6765
 1     1    11    89    21   10946
 2     1    12   144    22   17711
 3     2    13   233    23   28657
 4     3    14   377    24   46368
 5     5    15   610    25   75025
 6     8    16   987    26  121393
 7    13    17  1597    27  196418
 8    21    18  2584    28  317811
 9    34    19  4181    29  514229
---
Numbers grow exponentially
  F(i)/F(i-1) approaches 1.618... (golden ratio)
  F(45) more than one billion
Recursive program to compute F(i)
--
int F(int i)
  {
    if (i < 1) return 0;
    if (i == 1) return 1;
    return F(i-1) + F(i-2);
  }
---
Problem: program is slow!
Easy proof: running time satisfies the same recurrence as F itself,
  T(i) = T(i-1) + T(i-2), so it grows like F(i)
-----
Overlapping subproblems
Fibonacci computation is slow because subproblems
in recursive computation overlap
To compute F(5)
  compute F(4)
    compute F(3)
      compute F(2)
        compute F(1) = 1
        compute F(0) = 0
        return 1
      compute F(1) = 1
      return 2
    compute F(2)
      compute F(1) = 1
      compute F(0) = 0
      return 1
    return 3
  compute F(3)
    compute F(2)
      compute F(1) = 1
      compute F(0) = 0
      return 1
    compute F(1) = 1
    return 2
  return 5
Fix by remembering solutions already computed
-----
Avoid recomputing known results
Maintain array (indexed by parameter value)
  * zero if recursive routine not yet called for that value
  * result to be returned otherwise
First call on procedure for a given value:
  compute as before, save value
Second and subsequent calls:
  use value from first call
Ex: Fibonacci numbers less than 10^9
--
int known[46];   /* file-scope, so initialized to zero */
int F(int i)
  {
    int t;
    if (known[i]) return known[i];
    if (i == 0) t = 0;
    if (i == 1) t = 1;
    if (i > 1)  t = F(i-1) + F(i-2);
    known[i] = t;
    return t;
  }
---
LINEAR running time
LOGARITHMIC amount of extra space
  (45 array entries suffice for values less than 10^9)
-----
Recursive call structures
Straight recursive algorithm (traced above)
vs. remembering known results:
To compute F(5)
  compute F(4)
    compute F(3)
      compute F(2)
        compute F(1) = 1
        compute F(0) = 0
        return 1
      use known F(1) = 1
      return 2
    use known F(2) = 1
    return 3
  use known F(3) = 2
  return 5
-----
Bottom-up approach
DYNAMIC PROGRAMMING
  * Tabulate answers to subproblems
  * Build table in increasing order of problem size
  * Use early entries to compute later ones
    (no recomputation needed)
Ex: Fibonacci numbers
--
F[0] = 0; F[1] = 1;
for (j = 2; j <= i; j++)
  F[j] = F[j-1] + F[j-2];
---
"Dynamic programming solution to the problem
of computing Fibonacci numbers"
Perhaps the simplest nontrivial example of dynamic programming
Reduces running time from exponential to linear
Similar savings are available for a wide variety
of important practical problems
Applicable to *any* pure recursive algorithm with
  * no effects on global variables
  * enough space to save solutions to subproblems
No overlap: bottom-up
divide-and-conquer
-----
Knapsack problem
Given N types of food items, where each item
  * takes a certain amount of space
  * provides a certain amount of nutrients
KNAPSACK PROBLEM
  Fill a knapsack of capacity M with items
  such that total amount of nutrients is maximized
Ex:
--
            size  value
apple         3     4
banana        4     5
chocolate     7    10
donut         8    11
egg           9    13
---
In a knapsack of capacity 17
  two apples + banana + chocolate gives 23
  donut + egg gives 24 (maximum)
Applications
  * shipping cargos
  * job scheduling
  ...
-----
Recursive solution of knapsack problem
For each item type, compute the value obtained
if the last item added is of that type
Pick the maximum
  (also could return ID of last item added)
--
int knap(int capacity)
  {
    int i, j, max, t;
    for (i = 0, max = 0; i < N; i++)
      if ((j = capacity - size[i]) >= 0)
        if ((t = knap(j) + val[i]) > max)
          max = t;
    return max;
  }
---
DON'T USE THIS PROGRAM!
  overlapping subproblems
  excessive recomputation
  exponential time
Use dynamic programming (TD or BU) to avoid
recomputation for overlapping subproblems
-----
"Known answers" (TD) approach
To avoid recomputation in knapsack algorithm
remember, for every knapsack capacity
  * total cost for best way to fill knapsack
  * last item used in filling it
--
int knap(int C)
  {
    int i, j, max, maxi, t;
    if (costKnown[C]) return costKnown[C];
    for (i = 0, max = 0, maxi = 0; i < N; i++)
      if ((j = C - size[i]) >= 0)
        if ((t = knap(j) + val[i]) > max)
          { max = t; maxi = i; }
    costKnown[C] = max;
    bestKnown[C] = maxi;
    return max;
  }
---
Use bestKnown to recover set of items
Maintain elegance of recursive formulation
without sacrificing efficiency
-----
Bottom-up approach to knapsack problem
Generalized version of Fibonacci numbers program
Use previous solutions to compute next one
--
for (i = 0; i <= C; i++) cost[i] = 0;
for (i = 0; i <= C; i++)
  for (j = 1; j <= N; j++)
    if (i >= size[j])
      if (cost[i] < cost[i-size[j]] + val[j])
        { cost[i] = cost[i-size[j]] + val[j]; best[i] = j; }
---
--
i        3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
cost[i]  4  5  5  8 10 11 13 14 15 17 18 20 21 23 24
best[i]  a  b  b  a  c  d  e  a  a  a  a  c  a  c  a
---
Total time proportional to N*C
Another approach: N passes through the cost array
  [just interchange "for" loops]
-----
Dynamic programming summary
Many problem solutions are elegantly described
as recursive programs
If subproblems overlap, programs may require exponential time
TOP-DOWN (known answers) approach
  Mechanically modify recursive routine to
  * test if called yet for the given parameter value
  * if so, return previously computed value
  * if not, compute value and save it
BOTTOM-UP (standard DP) approach
  * Tabulate answers to subproblems
  * Build table in order of problem size
  * Use early entries to compute later ones
Further improvements often possible
  * do computations in particular order
  * use specialized knowledge to eliminate more computations
-----
Dynamic programming with two variables
OPTIMAL BST problem
  Given access frequencies, minimize average search cost
  (weighted path length) in BST
Ex:
  weighted path length 1*1 + 2*(4+2) + 3*(2+3+1) + 4*5 = 51
  weighted path length 1*3 + 2*(4+5) + 3*(2+2) + 4*(1+1) = 41
-----
Top-down solution to optimal BST problem
Recursive program with two arguments
  best subtree for ith through jth keys
--
int BST(int i, int j)
  {
    int k, min, t;
    if (i > j) return 0;
    for (k = i, min = INT_MAX; k <= j; k++)
      {
        t = BST(i, k-1) + BST(k+1, j);
        if (t < min) min = t;
      }
    for (k = i; k <= j; k++) min += f[k];
    return min;
  }
---
DON'T USE THIS PROGRAM
  overlapping subproblems, exponential time
Known values (top-down) approach
  * precisely the same as for one variable
  * eliminate recomputation for overlapping subproblems
    by saving and recalling known values in 2D array
-----
Bottom-up solution to optimal BST problem
Two-dimensional array
  * diagonal has subproblems of size 1
  * one up from diagonal has subproblems of size 2
  * two up from diagonal has subproblems of size 3
  ...
--
for (i = 1; i <= N; i++)
  for (j = i+1; j <= N+1; j++)
    cost[i][j] = MAX;
for (i = 1; i <= N; i++) cost[i][i] = f[i];
for (i = 1; i <= N+1; i++) cost[i][i-1] = 0;
for (j = 1; j <= N-1; j++)
  for (i = 1; i <= N-j; i++)
    {
      for (k = i; k <= i+j; k++)
        {
          t = cost[i][k-1] + cost[k+1][i+j];
          if (t < cost[i][i+j])
            { cost[i][i+j] = t; best[i][i+j] = k; }
        }
      for (k = i; k <= i+j; k++)
        cost[i][i+j] += f[k];
    }
---
-----
Optimal BST example
"cost" array
--
4  8 11 19 31 37 41
   2  4 10 20 25 28
      1  5 14 18 21
         3 11 15 18
            5  9 12
               2  4
                  1
---
"best" array
--
A A B D D D
  B D D E E
    D E E E
      E E E
        E E
          F
---
-----
Optimal recursive decomposition problems
Trees model many other computation problems
  "optimal divide-and-conquer"
MATRIX CHAIN PRODUCT problem
  Given a list of matrices, find optimal parenthesization
  for pairwise multiplication
TRIANGULATION
  Divide a convex polygon into triangles
  using minimal lines of total length
Correspondence between trees and
  * parenthesizations
  * triangulated polygons
  ...
-----
String processing DP example
LONGEST COMMON SUBSEQUENCE problem
  Find the longest common subsequence in 2 strings
Applications: molecular biology, text editors, ...
subsequence: subset of string chars, in order
  2^N different subsequences in string of length N
Ex: LCS of ABCBDAB and BDCABA is BCBA
Recursive program is a description
of a recursively defined solution
--
int LCSlen(int i, int j)
  {
    int t, u;
    if ((i == -1) || (j == -1)) return 0;
    if (a[i] == b[j]) return 1 + LCSlen(i-1, j-1);
    t = LCSlen(i, j-1);
    u = LCSlen(i-1, j);
    if (t > u) return t; else return u;
  }
---
As usual, DON'T USE THIS PROGRAM
It takes exponential time as is, but we can
  * change to use known values, OR
  * compute values bottom-up
-----
"Far too many" subproblems
Dynamic programming cannot be used unless space
is available to hold answers to subproblems
Three important cases illustrated by
different versions of knapsack problem
EXAMPLE 1: Sizes are reals, not integers
--
            size    value
apple      30000    .0004
banana     .0001    12345
chocolate  .0002    21244
donut      .1234    .2125
egg        .0012      123
---
Table size depends on precision, relative sizes
  typically much more precision than space
EXAMPLE 2: Allow at most one item of each type in knapsack
  different subproblem for each SUBSET of item types
  2^N subsets: EXPONENTIAL space
EXAMPLE 3: Bananas and eggs can't go on the bottom, etc.
  different subproblem for each PERMUTATION of item types
  N! permutations: superEXPONENTIAL space
  ( N! grows like 2^(N lg N) )
-----
NP-complete problems
For a large class of important problems,
no algorithms are known that are guaranteed to be fast
EFFICIENT, TRACTABLE
  running time bounded by polynomial on input size,
  no matter what the input
INEFFICIENT, INTRACTABLE
  running time exponential in input size on some input
Dynamic programming no help because exponential SPACE
required for DP solution
Can't guarantee solution to intractable problems
even with a supercomputer!!
  * tough to explain to the boss...
  * source of frustration for early programmers
  * theory provides some consolation
Includes a very large class of natural problems
  (all equivalent wrt difficulty)
Fundamental issue in the theory of computation
-----
Examples of NP-complete problems
* TRAVELING SALESMAN
  A salesman needs to visit a set of N cities.
  Find a route that minimizes travel distance.
* SCHEDULING
  A set of jobs of varying length need to be done
  on two identical machines before a certain deadline.
  Can the jobs be arranged so that the deadline is met?
* SEQUENCING
  A set of four-character fragments have been obtained
  by breaking up a long string into overlapping pieces.
  Can the fragments be reconstituted into the long string?
* TRIPARTITE MATCHING
  Given three sets of N individuals, and a set of triples
  with one from each set, is there a subset of the triples
  containing each individual just once?
* SATISFIABILITY
  Is there a way to assign truth values to a given
  logical formula that makes it true?