DATA COMPRESSION STUDY GUIDE
Terminology and Basics
- Why compress? To save space and time.
- How does compression work? Compression takes advantage of structure within data.
- What sort of data does not compress well? Random data.
- Lossy compression can further reduce file sizes by throwing away unimportant information.
- What is the compression ratio?
- Why can no compression algorithm possibly compress all bit streams?
- What fraction of bitstreams can be compressed to half their size or less by a general-purpose algorithm?
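
The last two questions come down to a counting argument, which the sketch below makes concrete. It assumes a lossless compressor must map distinct inputs to distinct outputs; the function names are made up for this guide.

```python
def count_shorter_bitstreams(n):
    """Number of distinct bitstreams strictly shorter than n bits."""
    # Lengths 0 through n-1: 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1.
    return sum(2 ** k for k in range(n))


def max_fraction_halved(n):
    """Upper bound on the fraction of n-bit streams that can shrink to n // 2 bits or fewer."""
    # Each compressed output must be unique, so at most this many inputs qualify.
    possible_outputs = sum(2 ** k for k in range(n // 2 + 1))
    return possible_outputs / 2 ** n


# There are 2^8 = 256 bitstreams of length 8 but only 255 shorter ones,
# so at least one 8-bit input cannot get shorter.
print(count_shorter_bitstreams(8), 2 ** 8)

# For 1000-bit inputs, the fraction that can be halved is roughly 2^-499.
print(max_fraction_halved(1000))
```

The bound holds for any lossless algorithm, not just one particular scheme, since it depends only on how many short outputs exist.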
Run-length coding
- Takes advantage of repeated binary digits.
- How do you handle runs longer than 2^M - 1, the largest count that fits in M bits?
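
One common way to handle over-long runs is sketched below. It assumes runs alternate between 0s and 1s starting with 0s, and that each count is stored in m bits. A run that is too long is split by inserting a zero-length run of the other bit. The names rle_encode and rle_decode are illustrative, and the counts are kept as a Python list rather than packed into bits.

```python
def rle_encode(bits, m=8):
    """Encode a string of '0'/'1' characters as alternating run counts.

    Runs alternate starting with a run of 0s, and every count is at most
    2^m - 1 so that it would fit in m bits.
    """
    max_run = 2 ** m - 1
    counts = []
    current = "0"
    run = 0
    for b in bits:
        if b != current:
            # The run ended: record it and switch to the other bit.
            counts.append(run)
            current = b
            run = 0
        elif run == max_run:
            # The run is too long for one count: emit the full count, then a
            # zero-length run of the other bit, and keep counting.
            counts.append(run)
            counts.append(0)
            run = 0
        run += 1
    counts.append(run)
    return counts


def rle_decode(counts):
    """Rebuild the bit string from alternating run counts (0s first)."""
    out = []
    bit = "0"
    for c in counts:
        out.append(bit * c)
        bit = "1" if bit == "0" else "0"
    return "".join(out)


# With 2-bit counts the longest single run is 3, so five 0s need a split.
print(rle_encode("00000", m=2))   # [3, 0, 2]
print(rle_decode([3, 0, 2]))      # 00000
```

An input that starts with a 1 simply begins with a count of 0, so the decoder never needs to be told which bit comes first.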
Huffman coding
- Basic idea: Variable length codewords to represent fixed length characters.
More common characters are represented by shorter codewords.
- What is a prefix-free code?
Why is it important that Huffman coding use a prefix-free code?
Would encoding work with a code that is not prefix-free? Would decoding work?
- Why is it necessary to transmit the coding trie?
Why don't we have to do something similar with run length encoding or LZW?
- Why do we typically use an array for encoding and a trie for decoding?
- You do not need to know the specifics of the binary representation of the Huffman trie.
However, you should conceptually understand the idea of transmitting/reading
the trie using a preorder traversal.
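
The ideas in this section can be sketched in a few functions, assuming the input has at least two distinct characters. The trie is written out as a list of tokens in preorder ("1" followed by the character for a leaf, "0" for an internal node). This is only a conceptual stand-in, not the actual binary representation. All names here (Node, build_trie, write_trie, and so on) are illustrative.

```python
import heapq
from collections import Counter
from itertools import count


class Node:
    def __init__(self, ch=None, left=None, right=None):
        self.ch, self.left, self.right = ch, left, right

    def is_leaf(self):
        return self.left is None and self.right is None


def build_trie(text):
    """Repeatedly merge the two least frequent subtrees."""
    tiebreak = count()  # unique counter so Node objects are never compared
    heap = [(f, next(tiebreak), Node(ch)) for ch, f in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tiebreak), Node(None, a, b)))
    return heap[0][2]


def build_codes(node, prefix="", table=None):
    """Encoding table: character -> codeword (left edge = 0, right edge = 1)."""
    if table is None:
        table = {}
    if node.is_leaf():
        table[node.ch] = prefix
    else:
        build_codes(node.left, prefix + "0", table)
        build_codes(node.right, prefix + "1", table)
    return table


def write_trie(node, out):
    """Preorder traversal: visit the node, then its left and right subtrees."""
    if node.is_leaf():
        out.extend(["1", node.ch])
    else:
        out.append("0")
        write_trie(node.left, out)
        write_trie(node.right, out)
    return out


def read_trie(tokens):
    """Rebuild the trie from an iterator over the preorder tokens."""
    if next(tokens) == "1":
        return Node(next(tokens))
    left = read_trie(tokens)
    right = read_trie(tokens)
    return Node(None, left, right)


def decode(bits, root):
    """Walk the trie bit by bit; reaching a leaf ends a codeword."""
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right
        if node.is_leaf():
            out.append(node.ch)
            node = root
    return "".join(out)


text = "ABRACADABRA!"
root = build_trie(text)
codes = build_codes(root)
bits = "".join(codes[c] for c in text)
rebuilt = read_trie(iter(write_trie(root, [])))
print(decode(bits, rebuilt) == text)
```

Encoding only needs a lookup by character, which is why a table suffices. Decoding has to discover where each codeword ends, and walking the trie does that because no codeword is a prefix of another.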
LZW
- Basic idea: Fixed length codewords to represent variable length strings.
Strings that recur in the input are added to the codeword table, so a long repeated string can be represented by a single codeword.
- Why do we typically use a trie for encoding and an array for decoding?
- How do you handle the 'strange' case
(where a codeword is seemingly not in the table during decoding)?
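
The sketch below shows both directions, including the 'strange' case. It makes several simplifications that are assumptions of this guide: a 128-character alphabet, no end-of-file codeword, no limit on table size, and codewords kept as a list of integers rather than packed into fixed-width bit fields. A Python dict stands in for the encoding trie, and a list plays the role of the decoding array.

```python
def lzw_compress(text, alphabet_size=128):
    """Return the list of codewords for text (characters must be below alphabet_size)."""
    table = {chr(i): i for i in range(alphabet_size)}  # stand-in for the trie
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            # Keep extending the longest match found so far.
            current += ch
        else:
            codes.append(table[current])
            table[current + ch] = len(table)  # next unused codeword
            current = ch
    if current:
        codes.append(table[current])
    return codes


def lzw_expand(codes, alphabet_size=128):
    """Rebuild the text, reconstructing the table one step behind the encoder."""
    if not codes:
        return ""
    table = [chr(i) for i in range(alphabet_size)]  # index = codeword
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # Strange case: the codeword is the entry about to be added.
            # That entry must be the previous string plus its own first character.
            entry = prev + prev[0]
        table.append(prev + entry[0])
        out.append(entry)
        prev = entry
    return "".join(out)


codes = lzw_compress("ABABABA")
print(codes)              # [65, 66, 128, 130]
print(lzw_expand(codes))  # ABABABA
```

In this example the final codeword 130 arrives when the decoder's table only goes up to 129, which is exactly the strange case. No table needs to be transmitted because the decoder rebuilds the same entries from the codewords themselves.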
Recommended Problems
C level
- Fall 2011 Final, #10b (LZW)
- Fall 2012 Final, #12 (Huffman)
- Spring 2012 Final, #10 (BW)
- Textbook 5.5.3
B level
- Fall 2011 Final, #10a (Huffman)
- Textbook 5.5.13
- Textbook 5.5.17
A level
- Fall 2012 Final, #13