COS 320, Spring 2000. Programming Assignment

Programming Assignment 2: Lexical Analyzer

For this assignment, you will build a lexer for the Fun language. To do this, you will need to work out every token that the lexer must recognize. The definition of Fun is here; that document should define all of the tokens you need to lex. Do not produce tokens for whitespace.

You will be building your lexer using ML-LEX. There is more online documentation on ML-LEX here. There is also some help in your textbook (Appel, chapter 2).

Using ml-lex

Make a new directory as2 for this assignment.
Copy the files from /u/cos320/chap2/ into as2. These include *.sml, *.sig, *.lex, and *.cm files.
Please familiarize yourself with the code that is already there.
Run sml and compile the code using CM.make();

Edit the code in fun.lex. You will have to remove some of the sample code and add a lot of your own. Keep the following information in mind as you develop your lexer:

The set-up code uses one additional feature not mentioned in class. It uses the %arg (x) declaration in the lex section of the lexer. This declaration makes the lexer take an additional argument. This additional argument is available in the lexer actions (use the name x). If you use this feature, we highly recommend changing the name "x" to something more semantically meaningful. This additional argument is an integer. We initially pass 0 to the lexer as the argument. Instead of calling (continue ()) to recursively invoke the lexer in a lexing action you may call (lex i ()) where i is an integer argument to the lexer. You might find this feature useful in your development of the lexer, but you do not have to use it.
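As one illustration of what the extra argument can carry, the ML-Lex fragment below uses it as a nesting depth for comments. This is only a sketch under assumptions that do not come from this handout: it supposes comments open with slash-star, close with star-slash, and may nest (check the Fun definition for the real rules), and the names depth and COMMENT are hypothetical. It also omits the line bookkeeping that a real rule for newlines would need.

```sml
(* Sketch only. Assumes nested comments delimited by slash-star and
   star-slash; "depth" and "COMMENT" are made-up names. *)
%%
%arg (depth);
%s COMMENT;
%%
<INITIAL>"/*" => (YYBEGIN COMMENT; lex 1 ());
<COMMENT>"/*" => (lex (depth + 1) ());
<COMMENT>"*/" => (if depth <= 1 then (YYBEGIN INITIAL; lex 0 ()) else lex (depth - 1) ());
<COMMENT>\n   => (continue ());
<COMMENT>.    => (continue ());
```

Here (continue ()) re-enters the lexer with the argument unchanged, while (lex i ()) re-enters it with a new value, which is the distinction described above.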

The tokens are declared in Tokens.sig. Here is a list of symbols that will appear in the source along with the tokens they should be associated with:

->	ARROW
!	BANG
:=	ASSIGN
)	RPAREN
(	LPAREN
||	OR
&	AND
=	EQ
>	GT
<	LT
*	TIMES
-	MINUS
+	PLUS
;	SEMICOLON
,	COMMA
:	COLON
#i (where i is a non-negative integer without leading 0's: 0,1,...)	PROJ
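The "no leading 0's" rule for PROJ can be checked in the regular expression itself or in plain SML. The helper below is a minimal sketch of the second option; projIndex is a hypothetical name, and whether the PROJ token in Tokens.sig actually carries the index is not stated here, so treat it as a way to test your understanding of the rule rather than as part of the required interface.

```sml
(* Hypothetical helper: return SOME i for a lexeme of the form "#i" where i
   is a non-negative integer without leading zeros, and NONE otherwise. *)
fun projIndex (lexeme : string) : int option =
  case String.explode lexeme of
      #"#" :: (digits as (d :: rest)) =>
        if List.all Char.isDigit digits
           (* a leading 0 is allowed only when the number is exactly "0" *)
           andalso (d <> #"0" orelse null rest)
        then Int.fromString (String.implode digits)
        else NONE
    | _ => NONE
```

With this definition, "#0" and "#12" are accepted, while "#01", "#", and "12" are rejected.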

Fun keywords should be represented using tokens with the same name. The end-of-file token should be represented using the token EOF.

Each token takes two integers: the line number and column number of the beginning of the token. These are used for error reporting. In the example below, x is on the second line in column 7. Note that the first row is row 1 and the first column is column 1 (as opposed to 0).

if true then
  let x = ...
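To make the 1-based convention concrete, here is a small stand-alone sketch that recomputes the position of x by scanning the text. It assumes the second line of the example is indented by two spaces (which is what places x in column 7) and that character offsets count from 0; lineCol is a hypothetical name and is not part of the supplied code.

```sml
(* Hypothetical helper: 1-based (line, column) of the character at 0-based
   offset pos in text, found by scanning every character before it. *)
fun lineCol (text : string) (pos : int) : int * int =
  let
    fun scan (i, line, col) =
      if i >= pos then (line, col)
      else if String.sub (text, i) = #"\n" then scan (i + 1, line + 1, 1)
      else scan (i + 1, line, col + 1)
  in
    scan (0, 1, 1)
  end

(* x sits at offset 19 in this string. *)
val xPos = lineCol "if true then\n  let x = ..." 19   (* (2, 7) *)
```

A lexer would not rescan the input like this for every token; the sketch is only meant to pin down which numbers are expected.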

Note that depending on your code structure, you may not be able to provide the column number for the EOF token. In this case, just use the line number and 0.

Use ErrorMsg.error : int -> string -> unit to report errors. It takes two arguments: the character position, counted from the beginning of the file, where the error occurred, and the error message to print. It uses lineNum and linePos to compute the position of the error, so these must be kept up to date. You will need to use the function newLine (supplied in fun.lex, user declarations section) to keep lineNum and linePos properly updated.
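The sketch below shows why a missed newLine call produces wrong positions. It is a stand-in written for illustration, not the course's ErrorMsg module: the structure name PosSketch, the function locate, the argument taken by newLine, and the choice of 0-based offsets are all assumptions, and the supplied code may differ on each of them.

```sml
(* Stand-in for illustration only; NOT the supplied ErrorMsg module. *)
structure PosSketch =
struct
  val lineNum = ref 1     (* number of the current line, 1-based *)
  val linePos = ref [0]   (* start offsets of the lines seen, newest first *)

  (* Assumed form of newLine: pos is the offset of the newline character. *)
  fun newLine (pos : int) : unit =
    (lineNum := !lineNum + 1; linePos := (pos + 1) :: !linePos)

  (* Map a 0-based offset to a 1-based (line, column) pair. *)
  fun locate (pos : int) : int * int =
    let
      fun look (start :: rest, n) =
            if start <= pos then (n, pos - start + 1) else look (rest, n - 1)
        | look ([], _) = (0, 0)
    in
      look (!linePos, !lineNum)
    end

  (* Same curried shape as ErrorMsg.error : int -> string -> unit. *)
  fun error (pos : int) (msg : string) : unit =
    let val (l, c) = locate pos
    in print (Int.toString l ^ "." ^ Int.toString c ^ ": " ^ msg ^ "\n") end
end
```

For the text "if true then" followed by a newline and an indented "let x", calling PosSketch.locate 19 before any newLine call gives (1, 20); after PosSketch.newLine 12 it gives (2, 7).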

Be careful with your syntax in lex files. Remember that each lex definition must end with ";" and each lexing rule must also end with ";". If you forget ";" then ML-Lex will complain. For example, the following is correct:

....
fun make_pos yypos yytext = ...

%%

alpha = [A-Za-z];
....

%%

<INITIAL>fun => (Tokens.FUN(make_pos yypos yytext));

....

To test your code, run RunLex.runlex("test.fun"); at the SML prompt. It will output a sequence of tokens along with the line and column number of each token. As always, a single test is far from complete. You will want to write your own test cases and thoroughly test your lexer.

You might also notice that the file fun.sml has been extended. The set of types now includes reference types, and there are operations to create new references and to get and set references. As an optional exercise, you can extend the interpreter from last week to handle these references. There is also an example test function in test.sml.

Submit all your files, including a README that explains any decisions you had to make. Use submit 2 <all files>.