9. Hamming Codes in Toy
Download Project Zip
| Submit to TigerFile
Goals
- To learn about error-correcting codes.
- To learn to program in a machine language.
- To appreciate the programming and debugging advantages of a structured programming language like Java over machine language.
Getting Started
-
Download the project
zip
file for this assignment from TigerFile , which contains the files you will need for this assignment, which contains:- two Java programs (
HammingEncoder.java
andHammingDecoder.java
) that illustrate the encoding and decoding procedures; - sample input and output files;
- TOY reference card;
- the
readme.txt
template; - the
acknowledgments.txt
template; multiply.toy
TOY program;- Visual X-TOY simulator;
- TOY simulator
TOY.java
- two Java programs (
-
This is a partner assignment. Instructions for help finding a partner and creating a TigerFile group can be found on Ed.
-
The rules for partnering:
- Choose a partner whose skill level is close to your own - only two partners per group.
- Your partner does not have to be in your precept.
- The rules for partnering are specified on the course syllabus. Make sure you read and understand these rules, and please post on Ed if you have questions. In your
acknowledgments.txt
file, you must indicate that you adhered to the COS 126 partnering rules.
Background
Error-correcting codes enable data to be sent through a noisy communication channel without corruption. To accomplish this, the sender appends redundant information to the message, so that even if some of the original data is corrupted during transmission, the receiver can still recover the original message intact. Transmission errors are common and can arise from scratches on a CD, static on a cell phone, or atmospheric interference. In a noisy environment, error-correcting codes can increase the throughput of a communication link since there is no need to retransmit the message if it becomes partially corrupted during transmission. For this reason, error-correcting codes are used in many common systems, such as storage devices (CD, DVD, DRAM), mobile communication (cell phones, wireless, microwave links), and digital television.
A Hamming code is a specific type of error-correcting code that allows the detection and correction of single-bit transmission errors. Hamming codes are used in many applications where such errors are common, including DRAM memory chips and satellite communication hardware. Hamming codes work by repeatedly reading four message bits, which we denote by \( m_1, m_2, m_3, m_4 \), and then inserting three parity bits, which we denote by \(p_1, p_2, p_3\). If any one of these seven bits is corrupted during transmission, the receiver can detect the error and recover the original four message bits intact. This is called single-bit error correction because at most one bit can be corrected per unit of data sent. The overhead for using this method is a 75% increase in bandwidth because it requires three extra parity bits for every four message bits. An alternate approach is sending three copies of each bit (and using the one that appears most frequently), which results in a 200% increase in bandwidth.
Approach
Before we describe the algebra of Hamming codes, we first visualize the calculation of the parity bits using Venn diagrams. As an example, suppose we wish to send the 4-bit message 1101
. We associate each of the four message bits with a specific intersection region of three pairwise overlapping circles, as illustrated below:
The Hamming code adds three parity bits so that each of the three circles has even parity:
That is, the sum of the four bits contained in each of the three circles is even:
For this example, the three parity bits are 1
(top), 0
(left), and 0
(right). So, to send a version of the message 1101
that is robust against single-bit errors, the actual message we send is the 7-bit message 1101100
.
Now, imagine this picture is transmitted over a noisy communication channel, and that one bit is corrupted so that the following picture arrives at the receiving station (corresponding to 1001100
):
The receiver discovers that an error occurred by checking the parity of the three circles. Moreover, the receiver can identify where the error occurred (the second bit) and recover the four original message bits!
The parity check for the top circle and right circle failed, but the left circle was ok. So, the only 7-bit message that balances all three parity checks and has at most one changed bit is the one obtained by flipping \(m_2\), since \(m_2\) represents the intersection of exactly those two circles. Thus the receiver knows that the message originally sent was 1101100
—representing the four message bits 1101
plus the three parity check bits 100
. This achieves the goal, since the original message bits sent were indeed 1101
.
If the center bit \(m_4\) is corrupted, then all three parity checks will fail. If a parity bit itself is corrupted, then only one parity check will fail. If the communication channel is so noisy that two or more bits are simultaneously corrupted, then the scheme will not work. Can you see why? More sophisticated types of error-correcting codes can handle such situations.
Of course, in practice, only the seven bits are transmitted, rather than the Venn diagrams.
Requirements
- Write two programs
encode.toy
anddecode.toy
. - Each TOY instruction must have corresponding pseudocode. (This is auto-generated by Visual X-Toy – see below.)
- It is good practice to add line breaks between logically related sections of TOY code.
- Write a comment above each section explaining what that code does.
- The third column is used for your own comments. Write comments as you write your code. This will help you understand your design as well as help during office hours. Hint: use the Java code from
HammingEncoder.java
andHammingDecoder.java
as comments.
TOY Program: encode.toy
Write a TOY program encode.toy
to encode a binary message using the scheme described above. Repeatedly read four (4) bits \( m_1, m_2, m_3, m_4 \) from TOY standard input and write the seven (7) bits \( m_1, m_2, m_3, m_4, p_1, p_2, p_3 \) to TOY standard output. Stop upon reading FFFF
from standard input and then output FFFF
to standard output.
- \(p_1 = m_1 \wedge m_2 \wedge m_4\)
- \(p_2 = m_1 \wedge m_3 \wedge m_4\)
- \(p_3 = m_2 \wedge m_3 \wedge m_4\)
Recall that ^ is the exclusive or operator in Java and TOY. This captures the parity concept described above.
TOY Program: decode.toy
Write a TOY program decode.toy
to decode and correct a Hamming encoded message. Repeatedly read seven (7) bits \( m_1, m_2, m_3, m_4, p_1, p_2, p_3 \) from TOY standard input and write the four (4) correct bits \( m_1, m_2, m_3, m_4 \) to TOY standard output. Stop upon reading FFFF
from standard input and then output FFFF
to standard output. Recall, to determine which one, if any, of the message bits is corrupted, perform the parity checks:
- \(p_1 = m_1 \wedge m_2 \wedge m_4\)
- \(p_2 = m_1 \wedge m_3 \wedge m_4\)
- \(p_3 = m_2 \wedge m_3 \wedge m_4\)
Compare the parity bits you computed with the parity bits you received. If they do not match, then some bit was flipped. Here’s what to do with the results:
- If exactly zero or one of the parity checks fail, then all four message bits are correct.
- If checks 1 and 2 fail (but not check 3), then bit \(m_1\) is incorrect.
- If checks 1 and 3 fail (but not check 2), then bit \(m_2\) is incorrect.
- If checks 2 and 3 fail (but not check 1), then bit \(m_3\) is incorrect.
- If all three checks fail, then bit \(m_4\) is incorrect.
Flip the corrupted message bit (if necessary) and write (only) \( m_1, m_2, m_3, m_4 \) to standard output.
Input format
The input format for encode.toy
is a text file that contains the sequence of bits to be transmitted. Each line consists of a sequence of four (4) bits, with each bit specified as a 4-digit hexadecimal integer (either 0000
or 0001
), separated by whitespace. The last line consists of the single integer FFFF
, representing the end of the file. If there are no bits to be transmitted, the file contains only the single integer FFFF
.
The file encode3.txt
contains three lines of 4-hex sequences, each representing a 4-bit sequence:
more encode3.txt
0001 0001 0000 0001
0001 0001 0001 0000
0001 0001 0001 0001
FFFF
The 4-hex sequence 0001 0001 0000 0001
represents the 4-bit sequence 1101
.
The 4-hex sequence 0001 0001 0001 0000
represents the 4-bit sequence 1110
.
The 4-hex sequence 0001 0001 0001 0001
represents the 4-bit sequence 1111
.
The input format for decode.toy
is similar, except that each line consists of a sequence of seven (7) hex digits, each representing a 7-bit sequence. For example:
more decode4.txt
0001 0001 0000 0001 0001 0000 0000
0001 0000 0000 0001 0001 0000 0000
0001 0001 0000 0000 0001 0000 0000
0001 0001 0000 0001 0001 0000 0001
FFFF
You can assume the input files are in the prescribed format. The input files will contain only a sequence of 0000
’s and 0001
’s, followed by FFFF
. If there are no bits to be decoded, the file contains only the single integer FFFF
.
In summary, input files for encode.toy
will have a multiple of four 0000
’s and 0001
’s; input files for decode.toy
will have a multiple of seven 0000
’s and 0001
’s; and the last line consists of the single integer FFFF
, representing the end of the file.
Using the Visual X-TOY Simulator
Use the Visual X-TOY simulator by executing the following command in the IntelliJ terminal:
java -jar toy.jar
The Visual X-TOY simulator provides lots of useful development features. There are two major modes: edit and debug. Use Mode > Edit Mode to edit .toy
programs and Mode > Debug Mode to debug or run .toy
programs. In Debug Mode, use the Reference, Stdin, Stdout and Core tabs on the right to help debug your program.
To open a TOY program from a file, use Select File > Open, then browse to find the program, e.g., multiply.toy
.
To edit a TOY program, use Select Mode > Edit Mode. You can type your program directly in the edit window. Use multiply.toy
as a reference for formatting and commenting.
To redirect standard input from a file, in Edit Mode, select Mode > Load File to Stdin.
To execute a TOY program, use Select Mode > Debug Mode. Then click the Run button at the bottom. You can enter data on standard input by typing the value in the Stdin tab and clicking the Add button. You will have to click the Run button to resume execution. You can see the results on standard output in the Stdout tab.
Common TOY programming issues
- The program counter starts at
10
so make sure that you place your code starting at memory address10
. - Be sure that each line follows the required format
XX: XXXX
. - Check for duplicated line numbers or out-of-order line numbers.
- Check for using a letter like
O
orI
instead of the number0
or1
. - Remember that all values, memory addresses, and arithmetic are in hex. For example, memory address
1A
follows19
. If you don’t specify the value for a memory address, that memory address will have the default value of0000
, which is aHALT
instruction. - Watch out for jump statements — if you insert a new line of code between existing lines, you will have to renumber all the following lines. Also, be sure to update any jump statements that have hardwired line numbers.
- Be very careful to update both your comments and your actual code at the same time! It is very common to update only the pseudocode and not the machine instruction.
- All registers are global variables. One way to be sure that a register is not used for two different purposes is to fill in the question in the
readme.txt
about registers before or during debugging.
Possible Progress Steps
These are purely suggestions for how you might make progress. You do not have to follow these steps. If you get stumped or frustrated on some portion of the assignment, you should not hesitate to consult a preceptor.
- Open the Visual X-Toy Simulator and experiment with some of the example programs.
- Try the sample programs Add, Echo, and Sum by going to File > Open Example Programs.
- Open and experiment with
multiply.toy
, which is included in the project folder. - Try executing some of the programs from precept.
- Read, compile, and run the reference solutions
HammingEncoder.java
andHammingDecoder.java
. Use these reference solutions as a design for your TOY programs. - The
readme.txt
provides tables to help your design/implementation/debugging ofencode.toy
anddecode.toy.
This will help you think about how you plan to use and/or re-use the Toy registers as you implement these Toy programs. Before you start coding, complete these tables, and update as needed. They will be very helpful for you as well as those helping you during office hours and/or Lab TA debugging sessions. - As you write your TOY programs, remember:
- The first column contains the TOY instruction in hex.
- The second column contains the TOY pseudo-code.
- The third column is used for your own comments. Write comments as you write your code. This will help you understand your design as well as help during office hours, grading, etc.
- In X-Toy, create a new file (File > New).
- Save the file (File > Save) as
encode.toy
in your Hamming assignment project folder. - Incrementally implement and test
encode.toy
, usingHammingEncoder.java
as a guide. Remember that as you revise your program, you will need to edit your TOY statement numbers accordingly.- Write TOY code that reads a number from standard input. If the number is negative, the program shall halt. (Note, the only valid negative input is
FFFF
.) Otherwise, it prints the number and inputs the next number. - Edit your TOY program from step (a) so that it now reads in sequences of four numbers, halting if the first number is negative. Otherwise, it writes the four numbers and inputs the next sequence of numbers.
- Edit your TOY program from step (b) so that it now computes the parity bits, and writes the parity bits.
- Write TOY code that reads a number from standard input. If the number is negative, the program shall halt. (Note, the only valid negative input is
- Incrementally implement and test
decode.toy
, usingHammingDecoder.java
as a guide.
Testing
We provide several input test files of different sizes for both encode.toy
and decode.toy
, as well as the answers for each. Note that for convenience, the answer files list each input line and just a shortened version of the corresponding outputs.
Encoding
input file | output file |
---|---|
encode0.txt |
encode0-answer.txt |
encode1.txt |
encode1-answer.txt |
encode3.txt |
encode3-answer.txt |
encode16.txt |
encode16-answer.txt |
Decoding
input file | output file |
---|---|
decode0.txt |
decode0-answer.txt |
decode1.txt |
decode1-answer.txt |
decode3.txt |
decode3-answer.txt |
decode4.txt |
decode4-answer.txt |
decode6.txt |
decode6-answer.txt |
decode16.txt |
decode16-answer.txt |
decode128.txt |
decode128-answer.txt |
To test your TOY program using the command-line TOY.java
simulator, type the commands below into the IntelliJ embedded terminal. You should see the following output - one bit (Toy hex word) per line:
java-introcs TOY encode.toy < encode3.txt
0001
0001
0000
0001
0001
0000
0000
0001
0001
0001
0000
0000
0000
0000
0001
0001
0001
0001
0001
0001
0001
FFFF
java-introcs TOY decode.toy < decode4.txt
0001
0001
0000
0001
0001
0001
0000
0001
0001
0001
0000
0001
0001
0001
0000
0001
FFFF
You can then compare the output to the corresponding answer file.
The most complete way to test your TOY programs is to encode and decode all possible inputs. Use the sample inputs encode16.txt
and decode128.txt
to do this.
Complete readme.txt
Your comments in your Toy program should help you answer these questions.
encode.toy
registers
Describe what, if anything, you use each of the registers for in encode.toy
.
decode.toy
registers
Describe what, if anything, you use each of the registers for in decode.toy
.
Submission
Submit to TigerFile
: encode.toy
, decode.toy
, completed readme.txt
and acknowledgments.txt
files.
Grading
Files | Points |
---|---|
encode.toy | 16 |
decode.toy | 22 |
readme.txt | 2 |
Total | 40 |
Leaderboard - Optional
Submit an optimized decode.toy
to the Hamming Leaderboard for class fame and glory. Optimize decode.toy
with the goal of using the fewest (non-zero) words of TOY main memory. Under 40 (in decimal) is very good; under 35 is great; the all-time record is 27. You may work with your partner or by yourself.
Submit to Leaderboard | View Current Leaderboard
Also
- Upload a
nickname.txt
file that contains the name of your group. Please be respectful with the nickname. - Upload a
leaderboard.txt
file that describes your approach.
Enrichment
Error correcting codes are used in CDs. You can drill a 2.5mm hole through it, and it will still play. Only 1/3 of data is music.
Error correcting codes are also widely used on noisy analog channels such as radio links. This enables you to reduce the amount of power needed to transmit signals.
Copyright © 1999–2024, Robert Sedgewick and Kevin Wayne.