For this assignment, you will build a lexer for the Fun language. To do
this, you will need to identify all of the tokens that you need to lex.
The definition of Fun is here; it should contain definitions of all of
the tokens you need to lex. Your lexer should not produce tokens for
whitespace.
You will build your lexer using ML-Lex. More online documentation on
ML-Lex is available here, and your textbook (Appel, chapter 2) also
covers it.
Using ML-Lex
Make a new directory as2 for this assignment.
Copy the files from /u/cos320/chap2/ into as2.
This includes the *.sml, *.sig, *.lex, and
*.cm files.
Familiarize yourself with the code that is already there.
Run sml and compile the code using
CM.make();
Edit the code in fun.lex. You will have to
remove some of the sample code and add a lot of your own. Keep the
following information in mind as you develop your lexer:
The set-up code uses one additional feature not mentioned in class: the
%arg (x) declaration in the ML-Lex definitions section of fun.lex. This
declaration makes the lexer take an additional argument, which is an
integer and is available in the lexer actions under the name x. If you use
this feature, we highly recommend renaming x to something more semantically
meaningful. We initially pass 0 to the lexer as the argument. Within a
lexing action, instead of calling (continue ()) to recursively invoke the
lexer, you may call (lex i ()), where i is the new integer argument to the
lexer. You might find this feature useful in developing your lexer, but
you do not have to use it.
The tokens are declared in Tokens.sig. Here is a list of symbols that
will appear in the source along with the tokens they should be associated
with:
Symbol    Token
->        ARROW
!         BANG
:=        ASSIGN
)         RPAREN
(         LPAREN
||        OR
&         AND
=         EQ
>         GT
<         LT
*         TIMES
-         MINUS
+         PLUS
;         SEMICOLON
,         COMMA
:         COLON
#i        PROJ
In #i, i is a non-negative integer without leading 0's (0, 1, 2, ...).
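ML-Lex resolves overlaps among these symbols by longest match: on the input ->, a rule for "->" wins over a rule for "-" because it matches more characters, and := likewise beats :. The pure-SML sketch below is not ML-Lex output and not part of the provided code; it hand-codes that behaviour over the symbol list above by trying multi-character symbols first, and adds a check for the #i form of PROJ that rejects leading zeros. The names symbols, scanSymbols, and projIndex are hypothetical, and token names are returned as plain strings, whereas your real lexer returns the constructors declared in Tokens.sig. In fun.lex itself you would express all of this as regular expressions rather than as code like this.

```sml
(* Hypothetical sketch, not ML-Lex output: the symbol table from the
   handout, with multi-character symbols listed first so that the first
   match found at a position is also the longest one. *)
val symbols : (string * string) list =
  [("->", "ARROW"), (":=", "ASSIGN"), ("||", "OR"),
   ("!", "BANG"), (")", "RPAREN"), ("(", "LPAREN"), ("&", "AND"),
   ("=", "EQ"), (">", "GT"), ("<", "LT"), ("*", "TIMES"),
   ("-", "MINUS"), ("+", "PLUS"), (";", "SEMICOLON"),
   (",", "COMMA"), (":", "COLON")]

(* Scan a string made only of symbols and whitespace, returning token
   names. Whitespace is skipped and produces no token. *)
fun scanSymbols (s : string) : string list =
  let
    val n = String.size s
    (* first symbol in the table that occurs at offset i, if any *)
    fun matchAt i =
      List.find
        (fn (sym, _) =>
           i + String.size sym <= n
           andalso String.substring (s, i, String.size sym) = sym)
        symbols
    fun go i acc =
      if i >= n then rev acc
      else if Char.isSpace (String.sub (s, i)) then go (i + 1) acc
      else
        case matchAt i of
          SOME (sym, tok) => go (i + String.size sym) (tok :: acc)
        | NONE => raise Fail ("no symbol at offset " ^ Int.toString i)
  in
    go 0 []
  end

(* PROJ: the complete token text must be "#" followed by either "0" or a
   nonzero digit and further digits, so "#01" is rejected. *)
fun projIndex (s : string) : int option =
  let
    val digits =
      if String.size s >= 2 andalso String.sub (s, 0) = #"#"
      then String.extract (s, 1, NONE)
      else ""
    val wellFormed =
      digits = "0"
      orelse (digits <> ""
              andalso String.sub (digits, 0) <> #"0"
              andalso CharVector.all Char.isDigit digits)
  in
    if wellFormed then Int.fromString digits else NONE
  end
```

For example, scanSymbols "->-:=:||" evaluates to ["ARROW", "MINUS", "ASSIGN", "COLON", "OR"], and projIndex "#01" evaluates to NONE while projIndex "#12" evaluates to SOME 12.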
Fun keywords should be represented using tokens with the same name.
The end-of-file token should be represented using the token EOF.
Each token takes two integers: the line number and the column number of
the beginning of the token. These are used for error reporting. In the
example below, x is on line 2, column 7. Note that the first line is
line 1 and the first column is column 1 (as opposed to 0).
if true then
  let x = ...
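To double-check the numbering convention, the SML function below (a hypothetical helper, not part of the provided code) computes the 1-based line and column of a 0-based character offset by scanning the text up to that offset. It assumes the example text is "if true then", a newline, and then "  let x = ..." indented by two spaces, so x sits at offset 19. The provided lexer's yypos may start counting from a different offset, so check fun.lex before reusing this arithmetic.

```sml
(* Hypothetical helper: 1-based (line, column) of the character at the
   0-based offset pos in src. *)
fun lineCol (src : string, pos : int) : int * int =
  let
    (* i scans the prefix of src; lineStart is the offset at which the
       current line begins *)
    fun go (i, line, lineStart) =
      if i >= pos then (line, pos - lineStart + 1)
      else if String.sub (src, i) = #"\n" then go (i + 1, line + 1, i + 1)
      else go (i + 1, line, lineStart)
  in
    go (0, 1, 0)
  end

(* the example from the handout, assuming a two-space indent on line 2 *)
val example = "if true then\n  let x = ..."
```

Here lineCol (example, 19) evaluates to (2, 7), matching the position given for x, and lineCol (example, 0) evaluates to (1, 1).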
Note that depending on your code structure, you may not be able to provide
the column number for the EOF token. In this case, just use the line number
and 0.
Use ErrorMsg.error : int -> string -> unit to report errors. It takes
two arguments: the character position, counted from the beginning of the
file, where the error occurred, and the error message to print. It uses
lineNum and linePos to convert that position into a line and column, so
these must be kept up to date. Use the function newLine (supplied in the
user declarations section of fun.lex) to keep lineNum and linePos
properly updated.
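The reason newLine matters is that converting a position into a line and column is only as accurate as the newline offsets recorded so far. The SML structure below, MiniErrorMsg, is a hypothetical and simplified stand-in for ErrorMsg (the provided structure may record positions differently, so read its source rather than relying on this sketch). It assumes 0-based offsets: newLine pushes each newline's offset onto linePos, and locate walks back through that list to turn an offset into a 1-based line and column. If the lexer skips a newline without calling newLine, every later position is reported on the wrong line.

```sml
(* Hypothetical, simplified stand-in for the provided ErrorMsg structure.
   Positions are assumed to be 0-based character offsets. *)
structure MiniErrorMsg =
struct
  val lineNum = ref 1                  (* line currently being scanned, 1-based *)
  val linePos : int list ref = ref []  (* offsets of newlines seen so far, newest first *)

  (* must be called by the lexer on every newline, with its offset *)
  fun newLine pos = (lineNum := !lineNum + 1; linePos := pos :: !linePos)

  (* map an offset to a 1-based (line, column) pair by walking back
     through the recorded newlines until one lies before pos *)
  fun locate pos =
    let
      fun look (nl :: rest, line) =
            if nl < pos then (line, pos - nl) else look (rest, line - 1)
        | look ([], line) = (line, pos + 1)
    in
      look (!linePos, !lineNum)
    end

  (* print "line.col: msg" for the given offset *)
  fun error pos msg =
    let val (line, col) = locate pos
    in print (Int.toString line ^ "." ^ Int.toString col ^ ": " ^ msg ^ "\n") end
end
```

For instance, after MiniErrorMsg.newLine 12 (a newline at offset 12), MiniErrorMsg.locate 19 evaluates to (2, 7), and MiniErrorMsg.error 19 "bad token" prints 2.7: bad token.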
Be careful with your syntax in lex files. Remember that each ML-Lex
definition must end with ";" and each lexing rule must also end with ";".
If you forget a ";", ML-Lex will complain.
To test your code, run RunLex.runlex("test.fun"); in sml. It will output
a sequence of tokens along with the line and column number of each token.
As always, a single test is far from complete. You will want to write
your own test cases and thoroughly test your lexer.
You might also notice that the file fun.sml has been extended. The set of
types now includes reference types, and there are operations to create new
references and to get and set them. As an optional exercise, you can extend
the interpreter from last week to handle these references. There is also an
example test function in test.sml.
Submit all your files, including a README that explains any decisions
you had to make. Use submit 2 <all files>.