COS 597D, Fall 2013
Questions on data distribution in noSQL papers
Due at 1:30pm, Wednesday, October 23, 2013.
You may hand in a paper copy or email a file to me.
Keep a copy for your use during class discussion.
No credit for late submission.
The sections of the papers assigned for Oct. 23 discuss the distributed
storage of, and access to, data in Bigtable and Cassandra. Sections of the
paper describing the Google File System (GFS) are included because
Bigtable relies on that system to handle some of the issues of
distributed storage. You should consider GFS to be part of Bigtable for
the purpose of answering the questions below. The questions ask about
the main ideas, and your answers should be brief. We may dig deeper in
class discussion.
1. There are many ways one might organize the distributed
storage of data structured as rows and columns. What design
decisions are shared by both Bigtable and Cassandra?
2. Bigtable uses a "master server" and "tablet servers"
to manage the reading and writing of data; Cassandra does not
have a distinguished master node. What are the pros and cons
of each architecture?
3. How is replication handled in each of Bigtable and
Cassandra?
4. What are the main steps in reading and writing data in
Bigtable?
5. What are the main steps in reading and writing data in Cassandra?