Data Compression Study Guide

Terminology and Basics

Why compress? To save space and time.
How does compression work? Compression takes advantage of structure within data.
What sort of data does not compress well? Random data.
Lossy compression can further reduce file sizes by throwing away unimportant information.
What is the compression ratio?
Why can no compression algorithm possibly compress all bitstreams?
What fraction of bitstreams can be compressed to half their original length by a general-purpose algorithm?
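
One way to reason about the last two questions is a counting argument: a lossless compressor must map different inputs to different outputs, and there are far fewer short bitstrings than long ones. The sketch below just does that arithmetic. The function names are made up for illustration, and "compressed in half" is assumed to mean "output of at most n // 2 bits".

```python
from fractions import Fraction

def count_bitstrings_shorter_than(n: int) -> int:
    """Number of distinct bitstrings with length strictly less than n."""
    # 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1
    return 2**n - 1

def max_fraction_halved(n: int) -> Fraction:
    """Upper bound on the fraction of n-bit inputs that any lossless
    compressor can map to outputs of at most n // 2 bits."""
    # Bitstrings of length 0, 1, ..., n // 2.
    short_outputs = 2**(n // 2 + 1) - 1
    # Distinct inputs need distinct outputs, so at most short_outputs
    # of the 2^n possible inputs can end up that short.
    return Fraction(short_outputs, 2**n)

# There are 2^n inputs of length n but only 2^n - 1 shorter outputs,
# so at least one n-bit input cannot shrink.
print(count_bitstrings_shorter_than(8), "<", 2**8)

# For 1000-bit inputs the bound is below 1 in 2^498.
print(max_fraction_halved(1000) < Fraction(1, 2**498))
```

The bound says nothing about which inputs compress well. It only shows that almost all of them cannot, which is consistent with random data compressing poorly.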

Run-length coding

Huffman coding

Basic idea: Variable length codewords to represent fixed length characters. More common characters are represented by shorter codewords.
What is a prefix-free code? Why is it important that Huffman coding use a prefix-free code? Would encoding work with a code that is not prefix-free? Would decoding work?
Why is it necessary to transmit the coding trie? Why don't we have to do something similar with run-length coding or LZW?
Why do we typically use an array for encoding and a trie for decoding?
You do not need to know the specifics of the binary representation of the Huffman trie. However, you should conceptually understand the idea of transmitting/reading the trie using a preorder traversal.
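
The ideas above can be tried out with a small sketch. It is not the course's implementation: all names are hypothetical, bits are written as the characters '0' and '1' instead of real bits, the input is assumed to contain at least two distinct characters, and the trie format ('1' plus the character for a leaf, '0' for an internal node) is a simplified stand-in for the actual binary representation.

```python
import heapq
from collections import Counter

class Node:
    def __init__(self, ch=None, left=None, right=None):
        self.ch, self.left, self.right = ch, left, right

    def is_leaf(self):
        return self.left is None

def build_trie(text):
    """Repeatedly merge the two least frequent subtrees."""
    freq = Counter(text)
    # The integer in the middle breaks ties so Nodes are never compared.
    heap = [(f, i, Node(ch)) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, a = heapq.heappop(heap)
        f2, _, b = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, Node(left=a, right=b)))
        counter += 1
    return heap[0][2]

def build_table(node, prefix="", table=None):
    """Character -> codeword lookup used for encoding."""
    if table is None:
        table = {}
    if node.is_leaf():
        table[node.ch] = prefix
    else:
        build_table(node.left, prefix + "0", table)
        build_table(node.right, prefix + "1", table)
    return table

def encode(text, table):
    return "".join(table[ch] for ch in text)

def decode(bits, root):
    """Walk the trie bit by bit; reaching a leaf ends a codeword."""
    out, node = [], root
    for b in bits:
        node = node.left if b == "0" else node.right
        if node.is_leaf():
            out.append(node.ch)
            node = root
    return "".join(out)

def write_trie(node):
    """Preorder: visit the node, then its left and right subtrees."""
    if node.is_leaf():
        return "1" + node.ch
    return "0" + write_trie(node.left) + write_trie(node.right)

def read_trie(stream):
    """Rebuild the trie from an iterator over the preorder string."""
    if next(stream) == "1":
        return Node(ch=next(stream))
    left = read_trie(stream)
    right = read_trie(stream)
    return Node(left=left, right=right)

text = "ABRACADABRA!"
root = build_trie(text)
bits = encode(text, build_table(root))
received = read_trie(iter(write_trie(root)))  # what a decoder would rebuild
print(len(bits), decode(bits, received) == text)
```

Encoding is a direct lookup per character, which suits a table, while decoding has to discover where each codeword ends, which suits walking a trie. Because no codeword is a prefix of another, the walk never has to guess or back up.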

LZW

Basic idea: Fixed length codewords to represent variable length strings. Strings are added to the codeword table as they are seen in the input, so a long repeated string can be represented by a single codeword.
Why do we typically use a trie for encoding and an array for decoding?
How do you handle the 'strange' case (where a codeword is seemingly not in the table during decoding)?
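
A minimal sketch of LZW can help in working through these questions, including the strange case. It makes several simplifying assumptions that are not from this guide: input is 7-bit ASCII, codewords are plain Python integers instead of fixed-width bit fields, the table is never capped, there is no end-of-file codeword, and a dict stands in for the encoding trie. The function names are hypothetical.

```python
def lzw_compress(text, alphabet_size=128):
    # Encoding table: string -> codeword (a dict standing in for the trie).
    table = {chr(i): i for i in range(alphabet_size)}
    next_code = alphabet_size
    codes = []
    current = ""
    for ch in text:
        if current + ch in table:
            current += ch                      # keep extending the match
        else:
            codes.append(table[current])       # emit longest match found
            table[current + ch] = next_code    # learn match + next char
            next_code += 1
            current = ch
    if current:
        codes.append(table[current])
    return codes

def lzw_expand(codes, alphabet_size=128):
    if not codes:
        return ""
    # Decoding table: codeword -> string, indexed directly like an array.
    table = [chr(i) for i in range(alphabet_size)]
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code < len(table):
            entry = table[code]
        else:
            # Strange case: the codeword is the one the decoder is about
            # to add. It must be prev plus the first character of prev.
            entry = prev + prev[0]
        table.append(prev + entry[0])          # same entry the encoder added
        out.append(entry)
        prev = entry
    return "".join(out)

codes = lzw_compress("ABABABA")
print(codes)              # the last codeword is not yet in the decoder's table
print(lzw_expand(codes))
```

The decoder rebuilds the same table the encoder built, one step behind it, which is why no table needs to be transmitted. The strange case arises exactly when the encoder uses an entry in the step right after creating it, as happens with "ABABABA".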