Lexical analyzer & Abstract Syntax
Setup: Make a new directory
as2 for this assignment by unpacking
as2.zip.
Run sml and compile the code using CM.make "sources.cm";
This assignment has two parts. They are almost entirely independent of each other, which means you can work on them in either order.
Part A: Abstract Syntax
Translate (by hand) a short Fun program into abstract syntax constructors.
- Read and understand the following files: absyn.sml test.sml. Notice that in test.sml there are five Fun programs translated by hand into abstract syntax constructors.
- After your CM.make, execute Test.run(); from the sml interactive prompt. This will pretty-print and evaluate all five programs.
- Read the following files, but you're not (yet) responsible for understanding them in detail: heap.sig heap.sml eval.sig eval.sml funpp.sig
- Glance through the following files, but most likely you will never be responsible for understanding them in detail: table.sig table.sml funpp.sml
- Finish myfun.sml by translating the given program into abstract syntax constructors.
Part B: Lexical Analyzer
Build a lexer for the Fun language. In order to do this, you will need to figure out all of the tokens that you need to lex, by reading
The definition of Fun.
You will be building your lexer using ML-LEX. There is online documentation on ML-LEX here. There is also some help in your textbook (Appel, chapter 2).
- Read and understand the following files: sources.cm, errormsg.sml, tokens.sig, tokens.sml, fun.lex, runlex.sml. Only these files are relevant to this part of the assignment.
(Actually, you can get by without understanding the stuff about "('svalue,'pos) token" in tokens.sig. However, do the experiment of writing a little program that just calls Tokens.ARROW(0,0) just to see what it's doing. You can even make this call from the SML/NJ interactive top-level prompt.) (Also, I wouldn't think any less of you if you didn't bother to understand the first four lines of fun.lex; that's just boilerplate for attaching the lexer to the parser.)
- Edit the code in fun.lex. You will have to remove some of the sample code and add a lot of your own.
- The tokens are declared in tokens.sig. Here is a list of symbols that will appear in the source along with the tokens they should be associated with:
-> | ARROW
|
| ! | BANG
|
:= | ASSIGN
|
| )
| RPAREN
|
( | LPAREN
|
| ||
| OR
|
& | AND
|
| = | EQ
|
> | GT
|
| < | LT
|
* | TIMES
|
| - | MINUS
|
+ | PLUS
|
| ; | SEMICOLON
|
, | COMMA
|
| :
| COLON
|
#i | PROJ
| (where i is a nonnegative integer without leading 0's: 0,1,...)
|
Fun keywords should be represented using tokens with the same name. The end-of-file token should be represented using the token EOF.
Each token takes two integers: the line number and column number of the beginning of the token. These are used for error reporting. In the example below, x is on the second line in column 5. Notice, the first row is row 1 and the first column is column 1 (as opposed to 0).
if true then
let x = ...
- Nested comments. This assignment would be a lot easier if the Fun language didn't have nested comments. I recommend that you first get everything working except nested comments. Then add that feature as your time permits.
What to turn in
Submit myfun.sml, fun.lex, mytest.fun as well as a README. In the README, please write,
- Who you got help from, if any, or who you helped with this assignment, or with whom you just discussed and exchanged ideas. Briefly explain the nature of the help or the discussion.
- Explain any design decisions you had to make.
- Briefly explain why mytest.fun is reasonably distinct from test.fun, and highlight some aspects of selected test cases. There's no need to give the full token sequence for all your examples.
- Give any other feedback or comments that you wish to about this assignment.
Upload you submissions to dropbox here.
Caveat: Some of the example code in the ML-Lex documentation is written for an earlier version of SML; for example, the library function "revfold" used in the documentation is now called "foldr".