COS 126 Lecture 15: Finite-State Automata

NOTE: See also Notes on Formal Language

slide 1 - Regular Expression

Check out the languages handout for more details and many sample problems (with solutions). There is a link from the 126 FAQ page.

Here's a recursive definition of what a regular expresions is:

i.) base cases:

ii.) if a and b are regular expressions (for our alphabet), then so are:

Regular expressions define a language (not uniquely.) Given a regular expression, you can ask whether a particular string is in that language. For each of the given examples, I've listed a few of the strings which are in the language and a few that are not.


slide 2 - Formal Languages

For each of the examples given, how would you write the corresponding regular expression?

  1. all bit strings that begin with 0 and end with 1
  2. all bit strings whose number of 0's is a multiple of 5
  3. all bit strings with more 1's than 0's
  4. all bit strings with no consecutive 1's

ANSWERS


slide 3 - Finite State Automata

Finte state automata (FSA) are very simple machines to build. However they can do many interesting things. In all likelihood, every modern appliance (dishwasher, microwave, VCR) has one (or many) FSA machines embedded.

Each FSA has a fixed number of states (hence the 'finite' part of the name.) There is always a start state. One or more states may be designated as 'accept' states. (The start state may be an accept state.) An FSA accepts a given input string if and only if you are 'in' an accept state when you finish reading the string. How do you use an FSA to read a string?

You read the input string one character at a time (left to right) and start in the 'start' state of the FSA. That start state will have an incoming arrow not connected with any other state. (Let's assume the FSA is "deterministic" - I'll explain what this means, shortly.) In a deterministic FSA, there will be exactly one arrow for each possible input character. So, for the binary alphabet we've been using, there will be two arrows coming out of each state - one labeled '0,' the other labeled '1.' For concreteness, let's assume that the first digit of the string is a 0. We'll read this 0 and move to the state that the '0' arrow points to. Now, read the next character in the input string and follow the corresponding arrow to the next state. Continue like this until you have read all the characters in the input string. If you 'end up' in an accept state, we say that the FSA recognizes the input string, or, equivalently, that the input string is part of the language described by the FSA. (This is just like saying that a string is in a language defined by a regular expression.)

If you look at the examples in this lecture slide, you can see the order of the states encountered (or 'traveled through') as we read the input string. The states in this example are numbered. By default, state 0 is the start state. Accept states are indicated by double circles.

You should learn to figure out what language a particular FSA represents. The first example recognizes the language of all strings consisting of one or more 10's concatentated together (10, 1010, 101010,...) The second example recognizes the language with an odd number of 0's. If you're stuck, try out a few sample input strings and look for a pattern. Also, always be sure to check whether the empty string is in the language.


slide 4 - An application

The application presented in this slide is used to remove ``noise'' from digitized video or DVDs. On each arrow, the first number represents a character read in the input string. The second number (after the /) is the output generated. The chart on the right shows the output generated when the FSA is applied to the input string: 01010101 - the string 00011101 is generated. Isolated 0's and 1's within the string are eliminated. Notice that this FSA variant doesn't have an accept state. It does not recognize a language; instead it reads in a bit string and outputs a bit string. This example demonstrates how powerful even a simple FSA can be.

Pay particular attention to the state interpretations chart. If you're asked (on an exam, for example) to draw an FSA representing a given language, it's good to try to formuate the meaning of each state.

In the first example on slide 3, here's the interpretation of each state:

0: nothing read so far
1: a 1 was just read
2: two consective 1's read, OR two consecutive 0's read OR a 0 was read initially
3: a 0 was read after state 1
   i.e., a 0 was read after a 1 was read

How do the state interpretations help you understand the FSA as a whole? State 3 is the accept state, because only strings with the pattern (10)*10 will end up in this state. State 2 has arrows ("transitions" is the correct terminology) for both 0 and 1 returning to it. This corresponds to the fact that once you've reached state 2, you've read input characters which are not in the language represented by the regular expression (10)*10. So, you can never get to the accept state. In state 1, you've read strings of the form (10)*101. For an input string to be in the language, you must read a 0 next, and go to state 3.

Question: what are the interpretations of the states in the second example on slide 3?


slide 5 - C program to Simulate FSAs

This C program simulates FSAs that operate on the binary alphabet: 0 and 1. The C program opens a file representing the FSA. (The file name is passed as a command line argument. See K&R section 5.10) The format of the file is a list of states, one line per state. For each state (line of the file), the destination state is given for each of the two possible inputs, 0 and 1. So, each line in the file will have two numbers. The first number corresponds to the '0 arrow.' The second number corresponds to the '1 arrow.' The value of the numbers is the number of the state the arrow points to. For the example given, the file will look like this:

0 1
2 0
1 2

Then, an input string is read from standard input. To run this program, you would type something like:

a.out FSA_filename 11101010

In each iteration of the while loop, the value of state is updated based on the current state and whether a 0 or a 1 is read. At the end, if the state is 0 (the accept state), Accepted is printed. Otherwise, Rejected is printed. You should try tracing through the code while tracing through the FSA to make sure you understand it.


slide 6 - A language that is not regular

It's good to remember that regular languages can't ``count'', i.e., that can't distinguish between bitstrings that have an equal number of 0's and 1's, and those that don't. A proof by contradiction is given.


slide 7 - Pushdown Automata

Think of a Pushdown Automata (PDA) as an FSA + a stack. The stack stores extra information - it is the "memory" added to the FSA. In an FSA, the next state depended only on the current state and whether the input bit was a zero or a one. In a PDA, the next state will depend on the current state, the input bit, AND the value of the bit on the top of the stack. Also, in addition to 'moving' to the next state, we can also update the stack by either POPping the top item off the stack or PUSHing the current input bit onto the stack.

Each arrow on the PDA has a 3 part value on it instead of just the input bit (like in an FSA.) The first part is the input bit (exactly the same as the FSA.) The second part is the value of the bit on the top of the stack. Instead of an ACCEPT state, we accept if the stack is empty, when we finish reading the string. If we ever try to POP an empty stack, we crash, meaning that the string is not in the language (rejected.)

In the example given, we use the stack to hold the extra 0's or 1's. If, for example, we read 10111..., we'll use the stack to hold the extra 1's until enough 0's come along to POP them off the stack.


slide 8 - Nondeterministic Machines

A defining feature of the FSAs we've looked at so far is that they've all been deterministic. The American Heritage Dictionary defines determinism as "The philosophical doctrine that every event, act, and decision is the inevitable consequence of antecedents that are independent of the human will." In computer science, we use the adjective "deterministic" to describe machines, or automata in which decisions are forced. There is no real choice. In the FSA we've looked at, we never had to choose which state to go to next: we just followed the only arrow (transition) which was designated for the given input bit. In a nondeterministic FSA, instead of one arrow for each input bit, there may be 0 or more arrows. If there are more than one arrow, we need to 'choose' which arrow to follow. How can we interpret nondeterministic FSA?

The basic concept is the same. We read the input string one bit at a time and move to the next state accordingly. If there is no arrow for the input bit, we have 'crashed.' The input string is rejected. If there is only one arrow for the input bit, follow it to the next state (just like in a deterministic FSA.) If there is more than one, just choose one, follow it and continue with the next input bit. Here's the tricky part: we say that the string is recognized by the FSA (accepted) if there is any way to get to an accept state. Let's look at the example given on the lecture slide:

01 -we start in state 0. then, we have the choice of going to state 1 or state 2. If we choose state2, we then stay in state 2 when we read the '1.' (result=rejected.) However, if we choose state 1, we move to state 3 when we read the 1, so the result is ACCEPT. Since there is a way to end up in the accept state, 01 is recognized by this FSA

0111110101 - the sequence of states that leads to the accept state is: 0,1,3,2,2,2,2,0,0,1,3

01000010110 - the sequence of states that leads to the accept state is: 0,1,3,1,1,1,1,3,1,3,2,3

01000 - there is no sequence of states that leads to the accept state. We start in state 0, stay in state 0, and then if we move to state 1, we'll stay there and never leave. If we instead choose to move to state 2, we still won't end up in state 3...


slide 9 - Nondeterminism doesn't help in FSAs

This slide demonstrates a method for converting a nondeterministic FSA to a deterministic one. This is the procedure:

If there are N states, list all the N digit binary numbers. Each binary number represents some combination of the states. 0000 represents none of the states. 0110 represents states 1 and 2 (not 0 and 3.) 1110 represents states 0 and 1 and 2. etc. Each of these binary permutations will represent a state in our deterministic FSA (but we might not use all of them.) For each new state, you need to determine its 'arrows.' That is, you need to determine what the next state will be for a '0' input and a '1' input. To make it easier, we'll use the decimal representation of each of our binary numbers to indicate the number of the corresponding state.

It's not as complicated as it sounds, but it does take a while. Let's start constructing the table in the example:

0001 - in the n-FSA, state 3 had a 0-arrow going to state 1, and a 1-arrow going to state 2. The binary representation of state 1 is 0100. The binary representation of state 2 is 0010. So, the 0-arrow for our state 1 (0001) will go to 4 (0100) and the 1-arrow will go to 2 (0010).

0010 - in the n-FSA, state 2 has a 0-arrow going to states 0 (1000) and 3 (0001), and a 1-arrow going to state 2 (0010). So, in the deterministic FSA, our state 2 (0010) will have a 0-arrow going to state 9 (1001), and a 1-arrow going to state 2. Notice that state 9 represents the combination of states 0 and 3 - (1001).

0011 - this state (3) represents the combination of states 2 and 3. So, we need to look at the transitions (arrows) of both states in the n-FSA. For the 0-arrow, the destinations are states 1 and 0 and 3. For the 1-arrow, the destinations are 2 for both states. In the deterministic FSA, then, for state 3, we'll have a 0-arrow going to state 13 (1101) (wrong on the slide) and the 1-arrow going to state 2 (0010).

If any state doesn't have any arrows for a particular input bit in the n-FSA, we'll make the corresponding arrow go to state 0 in the deterministic FSA.

Notice that each state in the deterministic FSA will have exactly 2 arrows going from it - one for 0 and one for 1. If any states in the deterministic FSA we create don't have any arrows going to it (besides the start state), we can eliminate those states.

One more thing: we haven't specified which states are the start and accept states. The start state will remain the same. If state 0 was the start state, then state 8 will be the new start state (1000). Any states in the new FSA which include an accept state as part of their combinations of states will be an accept state. In our example, if state 3 (in the n-FSA) is the accept state, then the accept states will be 1 (0001), 3 (0011), 5 (0101), 7 (0111), 9 (1001), 11 (1011), 13 (1101), and 15 (1111).

You can probably see that this would take a long time to do if the FSA has many states. The most important thing is that it it can be done at all.


slide 10 - FSAs are equivalent to REs

This slide is an overview of a proof demonstrating that FSAs and REs are equivalent. The proof that NFSAs and FSAs are equivalent is by construction: given any NFSA, we can construct an equivalent FSA. This slide shows that for any RE, we can construct an NFSA. One thing I haven't mentioned yet, which is important for understanding this slide is that in N-FSA you can have transitions between states which correspond to the null character (neither 0 or 1). You have the choice of following these arrows before reading the next character in the input string. The connecting lines in the rules listed are these null-arrows.

For example, for the simple regular expression: 10 + 1

(sorry, again, no graphic yet) the NFSA will be:

             state 1 ---> state 2
           /          1          \0
start state                       accept state 
           \            1        /
            state 3 -------------

OK, let me explain my picture. Coming out of the start state are two arrows (pretend), neither of which as any input bit associated with it. Initially, you can just choose which one to follow. Going the top route, you need to read a 1 followed by a 0 to get to the accept state. Going the bottom direction, you need to read a 1 to get to the accept state. Anything more than this simple example would be too hard to draw.


slides 11 & 12 - Nondeterminism does help in PDAs, but not Turing Machines

The main thing to remember is that non-deterministic FSAs and deterministic FSAs are equivalent. (Given a non-deterministic FSA, you can build a deterministic FSA to accept exactly the same strings, and vice versa.) In cotrast, non-deterministic PDAs are more powerful than deterministic ones; that is, you can recognize a broader class of languages with a nondeterministic PDA. Turing machines can recognize an even more broad class of languages.

Turing machines are the same type of thing that FSAs and PDAs are. You can still think of reading an input string and 'moving' through the states accordingly. Like the PDAs, you have additional information to consider. Instead of a stack (which only allows access to the topmost element), you have a 'tape.' Think of the tape as a long strip of paper onto which you can write. The initial input is written on the tape so that you can read it. Depending on your current state and what you read, you can write a bit, move left or right, and move to a new state.

A Turing Machine has no restrictions on the tape. You can even think of having multiple tapes, if you need them (but you don't.) This type of automata is so powerful that non-determinism doesn't enable it to accept a broader class of languages. (However, non-determinism probably enables a Turing machine to recognize a given language faster. A precursor to the NP-completeness Lecture 24....)