COS 597D, Fall 2013
Questions on Incremental Distributed Processing (Peng and Dabek)
Due at 1:30pm, Wednesday November 20, 2013.
You may hand in a paper copy or email a file to me.
Keep a copy for your use during class discussion.
No credit for late submission.
For our Nov. 20 discussion, we consider the paper by Peng and Dabek
presenting Google's strategy of incrementally updating its search index
instead of periodically rebuilding the whole index in batch with
MapReduce. The questions below ask about the main ideas, and your
answers should be brief. As usual, we may wish to dig deeper in
class discussion.
1. What are the main techniques (at least 2) used by Percolator
to achieve efficient incremental processing?
2. What guarantees does snapshot isolation provide? What
doesn't it provide?
3. What are the gains in using Percolator over MapReduce?
4. Discuss some of the vulnerabilities or high-cost aspects of
the Percolator system.