COS 126 Closest Pairs |
Due: Wed. 4/9, 11:59pm |
Suppose that N random points have been thrown onto a chessboard. In this assignment, you'll write a program to find and plot all pairs of points that are closer to one another than to any other on a two-dimensional space, like a chessboard. You'll be using arrays of linked lists to make it possible to handle large sets of points. This problem is representative of "clustering" problems that arise in database and graphics problems. The following 3-part sequence will lead to the final solution in manageable steps.
Write gridcount.c
, a program that prints how many points fell
into each square. The input is (x,y) coordinates in the unit
square (that is, x and y are both between 0 and 1), one pair
per line. You can use /u/cs126/bin/randpoints
to check the input
format and to generate test data for small cases. (The source code for randpoints
is in /u/cs126/examples/randpoints.c
).
With no command-line arguments, randpoints
generates 10 random
points:
% /u/cs126/bin/randpoints 0.242578 0.013470 0.383139 0.414653 0.067769 0.993127 0.484308 0.765338 0.031834 0.030935 0.932640 0.887880 0.591330 0.478779 0.833354 0.186335 0.735653 0.115053 0.698659 0.355604
You can specify the number of points with a command-line argument, e.g.,
/u/cs126/bin/randpoints 100
emits 100 points. You can pipe the
output of randpoint directly into your program, e.g.,
% /u/cs126/bin/randpoints 100 | a.out 2 1 0 1 0 2 2 3 2 0 2 3 1 1 3 1 1 1 2 2 0 0 0 1 4 2 2 4 1 0 3 2 1 0 0 2 1 1 1 3 6 0 2 0 1 1 1 2 2 4 5 2 1 1 2 2 2 1 1 1 1 2 0 2
Your program should represent a G×G chessboard with
a two-dimensional array, int grid[G][G]
, where grid[i][j]
is the number of points that land in the square at row i
, column
j
. For example, with G = 10, the input line 0.123
0.456
causes grid[1][4]
to be incremented. The output above
shows the result for G = 8. Here's a outline of gridcount.c
:
#include <stdio.h> #define G 8 int grid[G][G]; void gridinit(void) { /* initialize grid */ } void gridinsert(float x, float y) { /* increment grid[i][j] */ } void gridcount(void) { /* print the grid */ } int main(void) { float x, y; gridinit(); while (scanf("%f %f\n", &x, &y) != EOF) { gridinsert(x, y); } gridcount(); return 0; }
Copy this code as a starting point for your gridcount.c
.
Implement the bodies of the functions gridinit
, gridinsert
,
and gridcount
; each body is only a few lines of code. Your goal is
to be able to handle the data file /u/cs126/data/stars
, which is a
map of about 5000 stars in the sky projected onto the unit square. Here's the
output with G = 8:
% a.out </u/cs126/data/stars 1 12 50 86 122 166 23 0 4 45 76 80 130 143 100 9 29 64 74 92 117 114 72 43 65 65 87 94 137 122 73 46 52 79 85 111 149 110 65 55 45 72 82 149 166 96 54 39 16 62 107 162 111 67 67 10 0 14 54 102 79 50 7 0
Do not submit gridcount.c
. The purpose of this warm up is to
familiarize yourelf with the problem and with the grid
data
structure to prepare for solving the more difficult problem described below. Run
your program for 100 random points, 1000 random points and the star data, and
try changing G. Note that G
is built into the program;
to change it, you have to change the #define
line and recompile
your program.
The first refinement to the program above is to replace the counts by linked
lists of points. That is, in gridlistcount.c
, grid[i][j]
points to a linked list of the points that land in the i
,j
square. The number of points in each square can be computed by counting the
number of nodes in each linked list. Here's an outline of gridlistcount.c
;
bold type identifies those lines that are different from the corresponding lines
in gridcount.c
.
#include <stdio.h> #include "misc.h" #define G 8 struct node { float x, y; struct node *next; } *grid[G][G]; void gridinit(void) { /* initialize grid */ } void gridinsert(float x, float y) { /* add a node for x,y to grid[i][j] */ } void gridcount(void) { /* print the grid */ } int main(void) { float x, y; gridinit(); while (scanf("%f %f\n", &x, &y) != EOF) { gridinsert(x, y); } gridcount(); return 0; }
The linked lists are built from node
structures; the next
field points to the next node on a list. As each point is read, gridinsert
creates a node for the point and adds it to appropriate linked list. gridcount
traverses grid
to count the number of points in each square,
printing out the results in a
G ×G array as shown above. The output of gridlistcount
should be identical to the output of gridcount
.
You'll need to allocate struct node
s for each point in gridinsert
.
Use emalloc
described in
lectures 14 and 15.
Include misc.h
as shown in
the outline above, and compile your program with the command
% lcc -n -I/u/cs126/include gridlistcount.c /u/cs126/lib/libmisc.a
/u/cs126/lib/libmisc.a
is the library that holds emalloc
.
See the
lcc
man page for information about the -n
option; you can
appreciate the value of this option only by attempting to debug without it.
The third and last refinement is the program gridclose.c
,
which finds the closest pairs and plots the points. That is, for each poin P,
find the point that is closest to P and draw a line between the two
points. As in gridlistcount.c
, build a linked list of all the
points that fall into each square in a G×G grid. Then,
for each point, look in the lists associated with the neighboring squares for
the closed point. For large point sets, this approach makes the difference
between being able to solve the problem and not being able to afford to wait for
the answer. Here's a outline of gridclose.c
; as above, bold type
identifies those lines that are different from the corresponding lines in
gridlistcount.c
.
#include <stdio.h> #include "misc.h" #include "print.h" #define G 20 struct point { float x, y; }; struct node { struct point pt; struct node *closest; struct node *next; } *grid[G][G]; void gridinit(void) { /* initialize grid */ } void gridinsert(float x, float y) { /* add a node for x,y to grid[i][j] */ } void gridclosest(struct node *p) { /* find the node closest to p */ } int main(void) { float x, y; printprolog(G); gridinit(); while (scanf("%f %f\n", &x, &y) != EOF) { printpoint(x, y); gridinsert(x, y); } /* for every node t, set t->closest by calling gridclosest(t) */ /* for every node t, call printpair(&t->pt, &u->pt) if t->closest = u and u->closest = t */ printepilog(); return 0; }
You may copy this code as a starting point for your gridclose.c
.
Notice that the x,y coordinates are now packaged in point
structures, node
structures have a point field named pt, and node
structures have new field, closest
, which holds a pointer to
another node.You'll be able to use your gridinit
and gridinsert
functions from gridlistcount.c
with only a few modications.
The new function, gridclosest
, sets the closest
field in a node p
to point to the node that holds the point
closest to p
. As suggested in the pseudocode comments above, you
need to set the closest
fields in main
. Once all of
the closest fields are set, make a pass through the nodes and call printpair(t,
t->closest)
for every node t
for which t == t->closest->closest
.
You may ignore the possibility that a point is so isolated that there are no
points in any nearby grid squares (for real data, one would presumably find that
out by using gridcount
).
The various print
functions emit PostScript to draw points
(printpoint) and lines between points (printpair). The source code for these
functions is in /u/cs126/examples/print.h
and /u/cs126/examples/print.c
.
You'll need to compile your program with the command line
% lcc -n -I/u/cs126/examples -I/u/cs126/include gridclose.c /u/cs126/examples/print.c /u/cs126/lib/libmisc.a
You can make this a bit less painful by using /u/cs126/bin/lcc
instead of the default lcc
; /u/cs126/bin/lcc
automatically looks in /u/cs126/examples
,
/u/cs126/include
and /u/cs126/lib
for include files, source files, object files,
and libraries, and it automatically searches /u/cs126/lib/libmisc.a
when it builds an a.out
. Thus, you can compile your program and
/u/cs126/examples/print.c
with the command
% /u/cs126/bin/lcc -n gridclose.c print.c
If you add /u/cs126/bin
to your path, you can omit /u/cs126/bin/
in the command above.
Try your program on 100 random points and on the star data. Redirect the
output of your program in a file and use
gs
,
the Ghostscript previewer, to view the generated Postscript. Use commands like
% /u/cs126/bin/randpoints 100 | a.out >100.ps % a.out </u/cs126/data/stars >stars.ps
Follow the links above to see the expected output. 100.ps
should be 174 lines and stars.ps
should
be 6062 lines! Of course, you should test your program first on very small point
sets.
Turn in your code and your documentation with the command
/u/cs126/bin/submit 8 readme gridlistcount.c gridclose.c
Do not turn in gridcount.c
.
Make your program work in the presence of isolated points.