A programming language is a formal language used to communicate algorithms both from programmer to programmer and from programmer to machine. A formal language consists of:
Some general-purpose programming languages include C, C++, Pascal, and Ada. Some special-purpose languages are TeX, PostScript, Java bytecode, TCP/IP, and perhaps the Win32s API.
To use a programming language effectively we must study and understand it from three perspectives:
Here are three ways of expressing "increment the i-th element of array x" in different programming languages.
(a) x[i] = x[i] + 1;                           [C]
(b) (vector-set! x i (+ (vector-ref x i) 1))   [Scheme]
(c) x[i] = x[i] + 1;                           [Java]
These expressions have approximately the following semantics.
(a) if i in bounds of x
    then x[i] <- (x[i] + 1) mod 2^32
    else who knows?

(b) if x is not a vector then ERROR
    else if i is not an integer then ERROR
    else if i is not in bounds of x then ERROR
    else if x[i] is not an integer then ERROR
    else x[i] <- x[i] + 1

(c) if i is not in bounds of x then ERROR
    else x[i] <- (x[i] + 1) mod 2^32

Despite the apparent similarity of the C and Java expressions, the Java expression semantics is closer to that of Scheme than C.
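The semantics in case (c) can be observed directly. The following is a minimal sketch, assuming a 32-bit Java int array; the class and method names (IncrementSemantics, increment, incrementFails) are hypothetical and chosen only for illustration. It shows the two behaviours described above: an out-of-bounds index raises an error instead of doing "who knows?", and the addition wraps around rather than failing.

```java
final class IncrementSemantics {
    // "Increment the i-th element of array x", exactly as in expression (c).
    static void increment(int[] x, int i) {
        x[i] = x[i] + 1;
    }

    // Returns true when the increment raises Java's bounds ERROR.
    static boolean incrementFails(int[] x, int i) {
        try {
            increment(x, i);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] x = {1, Integer.MAX_VALUE};
        increment(x, 0);   // ordinary case: 1 becomes 2
        increment(x, 1);   // 32-bit arithmetic wraps around to Integer.MIN_VALUE
        System.out.println(x[0]);
        System.out.println(x[1] == Integer.MIN_VALUE);
        System.out.println(incrementFails(x, 2));   // index 2 is out of bounds
    }
}
```

The same source text compiled as C would give no such guarantee for the out-of-bounds case, which is the point of the comparison.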
Now consider expressing "increment x[0] through x[N]". In C we write:
for( i = N; i >= 0; i-- )
    x[i] = x[i] + 1;

In Scheme we write something rather different:
(define natural-foreach
  (lambda (f n)
    (cond ((>= n 0)
           (begin (f n)
                  (natural-foreach f (- n 1)))))))

(define inc-x
  (lambda (i)
    (vector-set! x i (+ (vector-ref x i) 1))))

(natural-foreach inc-x N)

Finally, in Java we probably write something that looks the same (has the same syntax) as in C. C/Java pragmatics suggest the use of iteration, while in Scheme we use recursion.
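To make the pragmatic contrast concrete, here is a rough Java sketch that writes the same task both ways. The names IncrementAll, incrementIterative, naturalForeach, and incrementRecursive are hypothetical; naturalForeach is only an approximation of the Scheme procedure of the same name, using java.util.function.IntConsumer to stand in for the procedure argument f.

```java
import java.util.function.IntConsumer;

final class IncrementAll {
    // The idiomatic C/Java form: a counting loop from N down to 0.
    static void incrementIterative(int[] x, int n) {
        for (int i = n; i >= 0; i--) {
            x[i] = x[i] + 1;
        }
    }

    // Mirrors the Scheme natural-foreach: apply f to n, n-1, ..., 0 by recursion.
    static void naturalForeach(IntConsumer f, int n) {
        if (n >= 0) {
            f.accept(n);
            naturalForeach(f, n - 1);
        }
    }

    // The Scheme style transliterated: pass an "inc-x" procedure to naturalForeach.
    static void incrementRecursive(int[] x, int n) {
        naturalForeach(i -> x[i] = x[i] + 1, n);
    }
}
```

Both methods leave the array in the same state. Java does not guarantee that the recursive call is compiled into a jump, so for a very large N the recursive version can exhaust the stack; this is one reason Java pragmatics favour the loop.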
Most programming language courses survey a variety of programming languages, covering mostly syntax, with only a short time left for semantics. Instead, we will use only Scheme, which will allow us to move on quickly to semantic issues. We will use definitional interpreters and spend a little time looking at pragmatic issues.
This course will NOT teach you:
But it will teach you:
+, a+b, and -a*2 are all identifiers.
Aside: Comments are usually discarded by a language processor during lexical analysis; that is, while the language processor is converting the stream of input characters into a stream of tokens. Scheme's comments begin with a semicolon and extend to the end of the line.
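As an illustration of comments being dropped while characters are turned into tokens, here is a minimal lexer sketch in Java. It is a simplification under stated assumptions, not a full Scheme lexer: the class name SchemeLexer is hypothetical, only parentheses, whitespace, and semicolons are treated as delimiters, and string and character literals are not handled.

```java
import java.util.ArrayList;
import java.util.List;

final class SchemeLexer {
    // Converts a stream of characters into a list of tokens, discarding comments.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ';') {
                // A comment runs from the semicolon to the end of the line; skip it.
                while (i < src.length() && src.charAt(i) != '\n') {
                    i++;
                }
            } else if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '(' || c == ')') {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                // Everything up to the next delimiter forms one token, e.g. a+b.
                int start = i;
                while (i < src.length() && !isDelimiter(src.charAt(i))) {
                    i++;
                }
                tokens.add(src.substring(start, i));
            }
        }
        return tokens;
    }

    // Assumption: these are the only characters that end a token.
    private static boolean isDelimiter(char c) {
        return c == '(' || c == ')' || c == ';' || Character.isWhitespace(c);
    }
}
```

For example, tokenizing the two lines "(+ a+b 1) ; add one" and "(f x)" yields the nine tokens (, +, a+b, 1, ), (, f, x, and ); the comment never reaches the later phases of the language processor.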
Query = { w1 ... wN | w1,...,wN in Word }
      U { NOT q | q in Query }
      U { (q1 AND q2) | q1, q2 in Query }

For defining the context-free syntax of programming languages, we often use a special language that is more concise. It is called BNF (Backus-Naur Form):
Query ::= Word *
        | NOT Query
        | (Query AND Query)

In BNF, terminals, or tokens, are symbols that do not appear on the left of the ::= operator. In the example above, AND, (, ), and NOT are the terminals. Query is the only non-terminal. Well, almost. Word is really a non-terminal that we haven't bothered to define.
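A BNF definition like the one for Query translates almost line by line into a recursive-descent recognizer. The sketch below is one possible rendering in Java, not a definitive implementation. It assumes that a Word is any whitespace-separated token other than NOT, AND, ( and ), since the notes leave Word undefined; the class name QueryRecognizer and its method names are hypothetical.

```java
final class QueryRecognizer {
    private final String[] tokens;
    private int pos = 0;

    private QueryRecognizer(String[] tokens) {
        this.tokens = tokens;
    }

    // Returns true when the whole input is a Query according to the BNF.
    static boolean isQuery(String input) {
        // Put spaces around parentheses so that they become separate tokens.
        String spaced = input.replace("(", " ( ").replace(")", " ) ").trim();
        String[] toks = spaced.isEmpty() ? new String[0] : spaced.split("\\s+");
        QueryRecognizer r = new QueryRecognizer(toks);
        return r.query() && r.pos == toks.length;
    }

    // Query ::= Word * | NOT Query | (Query AND Query)
    private boolean query() {
        if (accept("NOT")) {
            return query();
        }
        if (accept("(")) {
            return query() && accept("AND") && query() && accept(")");
        }
        while (pos < tokens.length && isWord(tokens[pos])) {
            pos++;   // Word *: zero or more words
        }
        return true;
    }

    // Consumes the next token if it is the expected terminal.
    private boolean accept(String terminal) {
        if (pos < tokens.length && tokens[pos].equals(terminal)) {
            pos++;
            return true;
        }
        return false;
    }

    // Assumption: any token that is not a terminal of the grammar is a Word.
    private static boolean isWord(String t) {
        return !t.equals("NOT") && !t.equals("AND") && !t.equals("(") && !t.equals(")");
    }
}
```

Under these assumptions, "cat dog", "NOT cat", and "(cat AND NOT dog)" are accepted, while "cat AND dog" (no parentheses) and "(cat AND" (unclosed) are rejected. Each alternative of the BNF rule becomes one branch of query(), and each occurrence of the non-terminal Query becomes a recursive call.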
BNF can only describe context-free languages. The following set of terms is impossible to describe in BNF.
Kwery = { w1 ... wN | w1,...,wN in Word }
      U { NOT q | q in Kwery }
      U { (q1 AND q1) | q1 in Kwery }

An AND-Kwery requires both its arms be the same. This set of terms is context-sensitive: selecting a Kwery to place in the hole of the term (q1 AND []), where [] denotes a hole, cannot be done without knowing the context surrounding the hole. Specifically, we must know what q1 is, because to get a valid Kwery we can only place q1 in the hole.
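Although no BNF rule can express the "same arms" requirement, a hand-written recognizer can enforce it with an extra check that looks at the context. The Java sketch below is one way to do this, under the assumption that a Word is any whitespace-separated token other than NOT, AND, ( and ); the class name KweryRecognizer and its methods are hypothetical. It parses each arm as usual and then compares the two arms token by token.

```java
import java.util.Arrays;

final class KweryRecognizer {
    // Returns true when the whole input is a Kwery.
    static boolean isKwery(String input) {
        String spaced = input.replace("(", " ( ").replace(")", " ) ").trim();
        String[] t = spaced.isEmpty() ? new String[0] : spaced.split("\\s+");
        return parse(t, 0) == t.length;
    }

    // Parses one Kwery starting at pos; returns the index just past it, or -1.
    private static int parse(String[] t, int pos) {
        if (pos < t.length && t[pos].equals("NOT")) {
            return parse(t, pos + 1);
        }
        if (pos < t.length && t[pos].equals("(")) {
            int leftStart = pos + 1;
            int leftEnd = parse(t, leftStart);
            if (leftEnd < 0 || leftEnd >= t.length || !t[leftEnd].equals("AND")) {
                return -1;
            }
            int rightStart = leftEnd + 1;
            int rightEnd = parse(t, rightStart);
            if (rightEnd < 0 || rightEnd >= t.length || !t[rightEnd].equals(")")) {
                return -1;
            }
            // The context-sensitive step: what fills the hole must equal q1.
            String[] left = Arrays.copyOfRange(t, leftStart, leftEnd);
            String[] right = Arrays.copyOfRange(t, rightStart, rightEnd);
            return Arrays.equals(left, right) ? rightEnd + 1 : -1;
        }
        while (pos < t.length && isWord(t[pos])) {
            pos++;   // a possibly empty sequence of words
        }
        return pos;
    }

    // Assumption: any token that is not a terminal of the grammar is a Word.
    private static boolean isWord(String s) {
        return !s.equals("NOT") && !s.equals("AND") && !s.equals("(") && !s.equals(")");
    }
}
```

With these assumptions, "(cat AND cat)" and "(NOT a AND NOT a)" are accepted, while "(cat AND dog)" is rejected even though it has the right shape. The comparison of the two arms is exactly the part that falls outside what a context-free grammar can describe.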