Due midnight, Friday, March 5
Tue Feb 23 09:51:30 EST 2010
Base64 encoding provides a way to encode arbitrary binary data into printable ASCII for mailing or shipment to a web browser, then decode it back into exactly its original form. The encoding takes the input 6 bits at a time and maps that into a letter, digit or other character that is guaranteed to travel safely through all systems. This process expands the input by only 33%, so it is acceptably efficient. Decoding reverses the process, taking 8-bit character input back into properly packed 6-bit fields that reproduce the original input exactly.
This assignment is an exercise in writing code to a published standard. The file base64.html contains an excerpt (most of Section 6.8) from RFC 2045, the standard for MIME (Multipurpose Internet Mail Extensions) encoding. Your task is to
Your programs must read from stdin and write to stdout. The programs must not throw exceptions or print any other output. In particular, note the part of the standard that says "Any characters outside of the base64 alphabet are to be ignored in base64-encoded data." Your goal is an encoder that works on all inputs and a decoder that does the right thing on valid inputs like those produced by openssl. Your decoder should accept very long lines, ignore invalid characters, and stop when it encounters one or two = signs or the end of the input. Since there is arguably no true meaning to invalid input, we won't probe too hard at this aspect; just make sure you handle valid input correctly.
Create at least 15 short test files named test.01, test.02, test.03, etc., to be used as inputs. These tests should thoroughly explore critical boundary conditions and other potential trouble spots of the standard and your code. You might find it helpful to read Chapter 6 of The Practice of Programming on testing.
How you generate the tests is up to you, but bear in mind that they must include arbitrary binary data, not just ASCII text. Binary data is easier to generate with a C or Java program than by hand, and of course there's a lot of binary data around. The program od ("octal dump") is useful for looking at binary data; try od -tx1 to see bytes in hex.
My Java encoder and decoder are about 80 lines each, with the encoding and decoding part about 40 lines in each, so if your solutions are a lot bigger, you may be off on the wrong track. Here's a Java help file and Java documentation.
The main complexities of implementation are mapping 3 bytes of input to 4 bytes of output, and vice versa, with some tricky boundary conditions. Keep it simple: this program does not need to be fast or the slightest bit clever. As is true with all programming, trying to be fast and/or clever is often a recipe for disaster.
Talking over the precise meaning of the specification with friends is strongly encouraged; this standard is quite clear and compact compared to most, but still requires careful attention. You must write your code yourself, however, so once you start coding, you may no longer discuss programming issues with other students.
You may exchange compiled Java class files with friends if you like, to ensure that your encoder works with someone else's decoder and vice versa. But only compiled classes; no source, and no test cases. As with previous assignments, I'm not trying to be rigid here, just encouraging you to think about how to test the program.
Prepare good test cases ahead of time. Test early, test often, know what you are testing. Automate the testing as much as you can, with a shell script or a makefile. Part of the job here is to mechanize testing but you will also find it helpful to validate your code on real MIME messages like the ones that encode attachments in mail. Grab those with an editor, put them in a file, decode, and see if the result is acceptable to the target application (Word, Excel, MP3 player, etc.).
You can use the program openssl as a reference implementation:
openssl enc -e -base64 <input >base64output # encode openssl enc -d -base64 <base64input >output # decode
Finally, note that this assignment is also an exercise in bit-whacking. In general, avoid constructs that suggest that you do not understand how to manipulate bits as bits. Don't use decimal numbers for bit patterns or ASCII characters. Thus 0x3F is a bit pattern for selecting the bottom 6 bits, and it's not magic (by my definition); 63 is the decimal equivalent and it has to be mentally translated before you can see the bit patterns, so it's magic. Similarly, 97 is magic while 'A' is not. Shift values of 6, 12 and so on are also not magic in this program, since their meaning is clear in context. Don't be up tight about this, but you really should know proper Java constructs for manipulating bits.
tar cf tests.tar test.*and submit your encode.java, decode.java, and tests.tar with this Dropbox link.
We will test your code by a process somewhat like this:
tar xf tests.tar javac encode.java decode.java for i in your test cases and ours do java encode <$i >x0 # your encoder java decode <x0 >y0 # your decoder cmp $i y0 # compare to original java encode <$i >x1 # your encoder openssl enc -d -base64 <x1 >y1 # our decoder cmp $i y1 # compare to original openssl enc -e -base64 <$i >x2 # our encoder java decode <x2 >y2 # your decoder cmp $i y2 # compare to original doneso you might try a similar script for your own testing. Note that it is not sufficient that your encoder and decoder work together; each must work independently.
Please follow the rules on what to submit. As suggested, we will exercise your programs mechanically, so it's a big help if your submission arrives in the right form, and your programs do only what is asked for. Thanks.