COS 597D, Fall 2013 - Assignment 10
Due 11:55pm Friday December 13, 2013
Collaboration Policy
You may discuss problems with other students in the class. However,
each student must write up his or her own solution to each problem
independently. That is, while you may formulate the solutions to
problems in collaboration with classmates, you must be able to
articulate the solutions on your own.
Late Penalty
40% of the earned score if
submitted after the due date.
Problem:
Cognetic
Systems
Inc. is a company that sells an XML database product called
XQuantum, which uses XQuery. They have an XQuery demo
site that offers XML versions of Shakespeare plays (all 37).
We will use this site to try some queries in XQuery.
The assignment consists of the following steps. Note that you do not need to submit
anything until Step 4:
- Go to Cognetic System's XQuery demo
site. From the menu at the left, choose
"Full-Text Search." You will reach a Web page with a drop-down
choice box labeled "Full-text Queries." We are not using these
sample queries in this assignment, but you may want to look at them for
more XQuery examples. We will be entering our own queries
in the "Query:" text box. The Shakespeare
Plays database is in document plays.xml,
which
is the root of the XML tree. The
document must be specified at
the beginning of each path expression using doc("plays.xml"), e.g. doc("plays.xml")/PLAYS/PLAY.
- Type (or cut and paste) the following XML query into the text box labeled "Query:".
Note that this version of XQuery is case
sensitive.
for
$play in doc("plays.xml")/PLAYS/PLAY
where $play/TITLE ftcontains 'Hamlet'
return $play
Then click the Submit button. You will see in the
result window of the returned page the full XML text of Hamlet. This gives you an
idea of how the plays are tagged. Note that
"ftcontains" denotes "full text
contains" and matches full words. As you can see from the
text of the TITLE tag, Hamlet
is not the complete name of the play. When you are done
examining the results, click on "return to query" at the top (and
bottom) of the result page. This should return you to the
query entry page.
- Type (or cut and paste) the following XML query into the text box and
submit it:
for $speech in doc("plays.xml")/PLAYS/PLAY//SPEECH
where $speech/LINE ftcontains 'ghost'
return $speech
- Modify the query of Step 3 to return only the lines containing
'ghost'. Turn in the query and
the result of the query as part of the submission of this assignment.
You
may print the result page showing the query and result. If
you wish to submit electronically, you can either cut and paste
the query and result into a text file or you can print the result page
to a pdf file.
- Type (or cut and paste) the following XML query into the text box and send
it:
for $ghostspeech in
doc("plays.xml")/PLAYS/PLAY
[TITLE ftcontains 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $ghostspeech/LINE
return
<GhostLines
total_number="{count($ghostlines)}">
{$ghostlines}
</GhostLines>
Describe the query in English. Why is "let $ghostlines :=
$ghostspeech/LINE"
used?
Submit
your answers to these questions; do not submit the result of the
query.
- Type (or cut and paste) the following XML query into the text box and send
it:
for $ghostspeech in
doc("plays.xml")/PLAYS/PLAY
[TITLE ftcontains 'Hamlet']//SPEECH [SPEAKER = 'Ghost']
let $ghostlines := $ghostspeech/LINE
return
<GhostLines
total_number="{count($ghostlines)}">
<FIRSTLINE>{$ghostspeech/LINE[1]/text()}</FIRSTLINE>
</GhostLines>
Describe the query in English. Why is
"$ghostspeech/LINE[1]/text()" used in the
return for this query? Submit your answers to
these questions; do not
submit the result of the query.
Note that TAGNAME[n] is a general form for
referring to the nth occurrence of a child element
with tag TAGNAME in the order the children appear in the document.
- Type (or cut and paste) the following XML query into the text box and send
it:
let $allgspeech := doc("plays.xml")/PLAYS/PLAY [TITLE ftcontains 'Hamlet']//SPEECH
[SPEAKER = 'Ghost']
let $ghostlines := $allgspeech/LINE
return
<GhostLines
total_number="{count($ghostlines)}">
{for
$ghostspeech in $allgspeech
return
<FIRSTLINE>{$ghostspeech/LINE[1]/text()}</FIRSTLINE>}
</GhostLines>
How do the results
of this query differ from those in Step 7? Why is "$allgspeech" used? Why is a
nested query used in the return portion? Submit your answers to
these questions; do not
submit the result of the query.
- Write an XQuery query that returns the speakers
of the speeches in Romeo and Juliet that
contain
the word "hate" in one or more of the lines of the
speech. Turn in the query and
the result of the query as part of the submission of this assignment.
- Write an XQuery query that returns for each speech in Romeo and Juliet that contain the
word "hate" both the speaker of the speech and the lines of the speach
that contain "hate". Turn in the query and
the result of the query as part of the submission of this
assignment. The
results should look like:
<SPEAKSHATE>
<SPEAKER>TYBALT</SPEAKER>
<LINE>What, drawn, and talk of peace! I hate the word,</LINE>
<LINE>As I hate hell, all Montagues, and thee:</LINE>
</SPEAKSHATE>
<SPEAKSHATE>
<SPEAKER>PRINCE</SPEAKER>
<LINE>Canker'd with peace, to part your canker'd hate:</LINE>
</SPEAKSHATE>
...